Frameworks for live video processing with CUDA?

Does anyone know of frameworks that allow easy plug and play of (possibly CUDA-accelerated) modules into a live video stream, ideally open source and extensible? OpenCV seems a little too academic and overengineered for this purpose.

I’d like to do things such as

-foreground/background separation: green screen emulation (e.g. place a webcam user in front of an arbitrary background) even though the background isn’t really a green screen.

-motion vector extraction, rough estimation of face or user position

-overlay DirectX or OpenGL rendered content, possibly interacting with the user’s motion (e.g. floating objects possibly animated with a physics package such as Bullet Physics, PhysX).

Think of generating the illusion of a webcam user being aboard the ISS, able to push floating objects such as pencils or transparent water bubbles out of the way. How’s that for some fun with CUDA?
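For the green screen emulation item, one simple starting point is background subtraction against a reference frame captured while nobody is in view. The following is only a CPU-side sketch under that assumption; the `Rgb` struct, the function names and the threshold are hypothetical and not taken from any framework mentioned in this thread. A real version would likely run per pixel in a CUDA kernel and need something more robust against lighting changes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// One 8-bit RGB pixel (hypothetical layout chosen for this sketch).
struct Rgb { std::uint8_t r, g, b; };

// Marks a pixel as foreground (255) when it differs from the stored
// background frame by more than `threshold`, measured as the sum of
// absolute per-channel differences. Everything else is background (0).
std::vector<std::uint8_t> foregroundMask(const std::vector<Rgb>& frame,
                                         const std::vector<Rgb>& background,
                                         int threshold) {
    std::vector<std::uint8_t> mask(frame.size(), 0);
    for (std::size_t i = 0; i < frame.size() && i < background.size(); ++i) {
        int diff = std::abs(frame[i].r - background[i].r) +
                   std::abs(frame[i].g - background[i].g) +
                   std::abs(frame[i].b - background[i].b);
        if (diff > threshold) mask[i] = 255;
    }
    return mask;
}

// Keeps foreground pixels from the live frame and takes every other
// pixel from `backdrop` (the "arbitrary background").
std::vector<Rgb> replaceBackground(const std::vector<Rgb>& frame,
                                   const std::vector<std::uint8_t>& mask,
                                   const std::vector<Rgb>& backdrop) {
    std::vector<Rgb> out(frame);
    for (std::size_t i = 0; i < out.size() && i < mask.size() && i < backdrop.size(); ++i) {
        if (mask[i] == 0) out[i] = backdrop[i];
    }
    return out;
}
```

A hard threshold like this produces noisy edges; smoothing the mask or blending at the boundary would be the obvious next refinement.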

What you propose is perfectly feasible. That is what I do with my webcam input stream, CUDA texture processing and DirectX presentation. But I do it the brute-force way, through DirectX and Windows programming with calls to CUDA, transferring bitmaps back and forth between CPU/display memory and CUDA memory, with some Logitech webcam API magic thrown in. That’s too involved to go into here, but I just wanted to say that what you want to do is possible, since I do it every day, more or less.

Ken Chaffin

Agreed, definitely possible - but looking for open source / free frameworks that do this, I’ve come across nothing…

Qt’s Phonon framework has some experimental patches that allow Effects to be added to a video pipe, but they’re Linux-only… and I don’t know if there are any plans to make that feature official…

DirectShow filters can do what you want, I believe, but I’ve had little to no experience developing DirectShow plugins…

Here at work we have our own camera/video framework, written from scratch, with sources and sinks that make up what we call ‘video pipes’, plus various processors you can attach to them… It seems like a very standard practice, so I’m surprised there are no APIs around that do this properly (might be an idea for a hobby project).

Yeah, but writing a new API from scratch just to fool a few people on chat roulette seems a bit overboard. ;)

Wouldn’t be nearly as much work if you built on top of a video display / camera API (e.g. Qt/Phonon, DirectShow, etc.).

Just a matter of abstracting video sources/sinks, and creating a ‘video pipe’ concept that has a source & sink, and a set of ‘processors’ which get access to each frame before being sent to the sink…

… in theory :P
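The source / processors / sink idea described above can be sketched in a few lines of C++. This is only an illustration of the concept: `Frame`, `Source`, `Processor`, `Sink` and `VideoPipe` are hypothetical names made up here, not the API of Phonon, DirectShow or the in-house framework mentioned earlier. A real source would wrap a camera API and a real sink a display or encoder.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// A minimal frame: dimensions plus raw pixel bytes (hypothetical layout).
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;
};

// A source fills in the next frame and returns false once the stream ends.
using Source = std::function<bool(Frame&)>;
// A processor modifies a frame in place before it reaches the sink.
using Processor = std::function<void(Frame&)>;
// A sink consumes the finished frame (display, file, network, ...).
using Sink = std::function<void(const Frame&)>;

class VideoPipe {
public:
    VideoPipe(Source source, Sink sink)
        : source_(std::move(source)), sink_(std::move(sink)) {}

    // Processors run in the order they were added.
    void addProcessor(Processor p) { processors_.push_back(std::move(p)); }

    // Pulls one frame, runs every processor on it, hands it to the sink.
    // Returns false when the source has no more frames.
    bool step() {
        Frame frame;
        if (!source_(frame)) return false;
        for (auto& process : processors_) process(frame);
        sink_(frame);
        return true;
    }

    // Runs until the source is exhausted; returns the number of frames handled.
    int run() {
        int count = 0;
        while (step()) ++count;
        return count;
    }

private:
    Source source_;
    Sink sink_;
    std::vector<Processor> processors_;
};
```

Using `std::function` keeps the sketch short and lets a processor be a lambda, a functor or a wrapper around a CUDA kernel launch; a plugin system would more likely use abstract base classes loaded from shared libraries.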

I think there are a couple of tricky parts:

a) piping the processed video image back into a driver that emulates a web cam (or provides a video capture source).

b) keeping the image data in CUDA device memory throughout multi-stage processing. It would certainly be inefficient if all data were copied from host to device memory and back for every stage of the processing pipeline.
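For point b), a common pattern is to upload the frame once, bounce it between two working buffers for each stage, and download only the final result. The sketch below shows just the control flow, with plain `std::vector` standing in for device memory; that stand-in, and the names `Stage`, `TransferStats` and `runResident`, are assumptions made for illustration. In real CUDA code the two buffers would be `cudaMalloc` allocations, the two copies would be `cudaMemcpy` calls, and each stage would be a kernel launch.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// A stage reads `in` and writes `out`; both are meant to live on the device.
using Stage = std::function<void(const Buffer& in, Buffer& out)>;

// Counts host<->device copies so the cost of a pipeline is visible.
struct TransferStats {
    int uploads = 0;
    int downloads = 0;
};

// Runs every stage with exactly one upload and one download.
// NOTE: std::vector stands in for device memory in this sketch.
Buffer runResident(const Buffer& hostInput,
                   const std::vector<Stage>& stages,
                   TransferStats& stats) {
    Buffer front = hostInput;      // the single host-to-device copy
    ++stats.uploads;
    Buffer back(front.size());     // second working buffer, same size

    for (const Stage& stage : stages) {
        stage(front, back);        // stage output lands in `back`
        std::swap(front, back);    // it becomes the next stage's input, no copy
    }

    ++stats.downloads;             // the single device-to-host copy
    return front;
}
```

With N stages this costs two transfers per frame instead of 2N, which is the saving point b) is after.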

Video capturing is easy, I’ve used this package before:

I should have mentioned that I do use DirectShow for my video filter graph implementation, which handles frame capture and event notification. In retrospect, I only use the Logitech API for camera control. I have two OrbitCams with controllable tilt, pan and focus. I run both into the PC and can capture simultaneously from both, but most of the time I only use one at a time.

This may not be the best way to do things, as I usually resort to brute-force, straightforward approaches, but I capture each frame of video into a texture, copy the bitmap to the CUDA device, run as many convolution kernels as needed to process that frame into a bitmap output buffer, and then copy the output buffer back to the host, into the DirectX texture, and display the texture on a planar surface in 3D space. I am able to do this at full resolution and full frame rate (well, at least in the 60 FPS range). I am using an NVIDIA Quadro FX 3700 display card, and even though I have 3 Tesla C1060 cards in my machine, I have so far only used one of those at a time. So I do have quite a bit of horsepower, but don’t rule out shuttling big bitmaps back and forth between host and device. It works very well. I use neither the DirectX nor the OpenGL interoperability capabilities; I just brute-force everything with my own bitmap arrays.
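To make the "kernel convolutions" step concrete, here is a plain C++ reference of a 3x3 convolution over a single-channel 8-bit bitmap. It is not my actual code, just a sketch: the function name, the single-channel layout and the clamp-to-edge border handling are assumptions made for the example. On the GPU the two outer loops disappear, because each CUDA thread computes one output pixel.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convolves a row-major, single-channel 8-bit image with a 3x3 kernel.
// Pixels outside the image are taken from the nearest edge pixel.
std::vector<std::uint8_t> convolve3x3(const std::vector<std::uint8_t>& src,
                                      int width, int height,
                                      const std::array<float, 9>& kernel) {
    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int ky = -1; ky <= 1; ++ky) {
                for (int kx = -1; kx <= 1; ++kx) {
                    // True convolution: the kernel is flipped, hence x - kx.
                    int sx = std::min(std::max(x - kx, 0), width - 1);
                    int sy = std::min(std::max(y - ky, 0), height - 1);
                    float weight = kernel[static_cast<std::size_t>((ky + 1) * 3 + (kx + 1))];
                    sum += weight * src[static_cast<std::size_t>(sy * width + sx)];
                }
            }
            // Round to nearest and clamp into the 0..255 range.
            float clamped = std::min(std::max(sum + 0.5f, 0.0f), 255.0f);
            dst[static_cast<std::size_t>(y * width + x)] = static_cast<std::uint8_t>(clamped);
        }
    }
    return dst;
}
```

Because the function writes into a separate output buffer, several passes can be chained by feeding one result into the next call, which mirrors the buffered approach described above.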

One fun thing I have done recently is implement a true 3D version of Conway’s Life cellular automaton, which I run with a texture size of 256x256 and 256 levels of depth, so I display a tomographic representation of 3D objects as a cube of roughly 16 million pixels. It’s pretty fun to watch. I have to admit I really start to bog down in frame rate on this, but still, I am shuttling 256 bitmaps per frame from host to device and back. It’s pretty fun to fly the camera through the 3D cube while it is running, or pause it and fly around in the 3D space. That is why I was saying that what you want to do with interactive, realtime 2D or 3D imagery is very possible.
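For anyone curious what a 3D Life update step looks like, here is a small CPU sketch. The rule is an assumption: the example is written so that the survival and birth ranges are parameters, and a rule often written as 4555 (survive with 4 or 5 live neighbours, birth with exactly 5) is a reasonable one to try. The 26-cell neighbourhood, the wrap-around edges and the names `Rule3D` and `Life3D` are likewise choices made for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Survival and birth ranges for the number of live neighbours (inclusive).
struct Rule3D {
    int surviveMin, surviveMax, birthMin, birthMax;
};

class Life3D {
public:
    // Creates an n x n x n grid with all cells dead.
    explicit Life3D(int n)
        : n_(n), cells_(static_cast<std::size_t>(n) * n * n, 0) {}

    bool alive(int x, int y, int z) const { return cells_[index(x, y, z)] != 0; }
    void set(int x, int y, int z, bool a) { cells_[index(x, y, z)] = a ? 1 : 0; }

    // Advances the whole grid by one generation using a second buffer.
    void step(const Rule3D& rule) {
        std::vector<std::uint8_t> next(cells_.size(), 0);
        for (int z = 0; z < n_; ++z)
            for (int y = 0; y < n_; ++y)
                for (int x = 0; x < n_; ++x) {
                    int count = countNeighbours(x, y, z);
                    bool live = alive(x, y, z);
                    bool keep = live && count >= rule.surviveMin && count <= rule.surviveMax;
                    bool born = !live && count >= rule.birthMin && count <= rule.birthMax;
                    next[index(x, y, z)] = (keep || born) ? 1 : 0;
                }
        cells_.swap(next);
    }

private:
    // Maps coordinates to a flat index, wrapping around at the edges.
    std::size_t index(int x, int y, int z) const {
        x = ((x % n_) + n_) % n_;
        y = ((y % n_) + n_) % n_;
        z = ((z % n_) + n_) % n_;
        return (static_cast<std::size_t>(z) * n_ + y) * n_ + x;
    }

    // Counts live cells among the 26 surrounding cells.
    int countNeighbours(int x, int y, int z) const {
        int count = 0;
        for (int dz = -1; dz <= 1; ++dz)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0 && dz == 0) continue;
                    if (alive(x + dx, y + dy, z + dz)) ++count;
                }
        return count;
    }

    int n_;
    std::vector<std::uint8_t> cells_;
};
```

Each cell only reads the previous generation and writes the next one, so the per-cell body maps directly onto one CUDA thread per cell.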

That is the nice thing about programming, say in C++/C with few APIs; if you can imagine it, you can usually do it. But I’m sure you know that. It does take a lot of time to get these things to work though.

The CUDA parallel kernel architecture lends itself very well to buffered convolution-kernel video image processing. With alpha-channel blending of images and depth-buffered 3D (all via DirectX), overlaying images in many ways is very easy on the host side.
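I let DirectX do the blending, but the arithmetic behind alpha-channel overlay is simple enough to show. This is a sketch of the usual "over" blend with straight (non-premultiplied) alpha onto an opaque frame; the `Rgba` struct and function names are hypothetical, chosen only for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One 8-bit RGBA pixel with straight (non-premultiplied) alpha.
struct Rgba { std::uint8_t r, g, b, a; };

// Blends one channel: alpha 255 shows only `over`, alpha 0 only `under`.
// Integer arithmetic with rounding to the nearest value.
inline std::uint8_t blendChannel(std::uint8_t over, std::uint8_t under, std::uint8_t alpha) {
    int mixed = over * alpha + under * (255 - alpha) + 127;
    return static_cast<std::uint8_t>(mixed / 255);
}

// Composites `layer` on top of `frame` in place. The frame is treated as
// opaque, so its alpha is set to 255 afterwards.
void overlay(std::vector<Rgba>& frame, const std::vector<Rgba>& layer) {
    for (std::size_t i = 0; i < frame.size() && i < layer.size(); ++i) {
        frame[i].r = blendChannel(layer[i].r, frame[i].r, layer[i].a);
        frame[i].g = blendChannel(layer[i].g, frame[i].g, layer[i].a);
        frame[i].b = blendChannel(layer[i].b, frame[i].b, layer[i].a);
        frame[i].a = 255;
    }
}
```

Rendering the floating objects into a layer with transparent background and compositing it over the webcam frame this way is one option; doing the same blend in the graphics API avoids touching the pixels on the CPU at all.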

Now finding an open source framework to encapsulate all of this will be fun.

By the way, I do all of this as a hobby, not as a profession. Just for the fun and challenge.

Ken Chaffin

I remember a project called cudacv or gpucv… I forget which name… but I heard of it recently. You may want to check it out.

I think it is gpucv…