is it possible to do real-time processing of what is being sent to my monitor?

Hello

I am (very) new to GPU computing, so apologies in advance if this question is in the wrong place, incomplete, or otherwise nonsensical.

My basic question is: (how) can I do real-time image processing of what is displayed on my monitor? For example, I have a face-detection program that I would like to run on a movie I am watching and have it draw a box around all the faces it finds as the movie is playing, so that while watching the movie I would see this altered version (one where every face has a box around it). I understand that if I just wanted to see the movie with the boxes I could open the file, process it, save it, and then watch the processed file, but that isn't the goal here.

I've read through a couple of older (mid-2000s) papers on real-time video processing, and they mention that the bottleneck is transferring the image to the graphics card. Since I should already have the image on the card (in some sense), my naive initial idea is to have two cards connected with an SLI bridge: use the first card as "normal" to create the image, but instead of sending it directly to the monitor, send it over the SLI connection to the second card, which would then do the processing and send the finished result to the monitor. Is such a thing possible or even necessary, or is there a better approach altogether?

My goal of processing and manipulating the output of a graphics card seems pretty clear, but I am not sure how to get started. Ideally it would be great if someone has done what I'm talking about and can suggest both the required hardware and reading materials. Or, just as good I suppose, if someone knows that what I would like to do isn't possible and can explain why. I am very new to the whole idea of interacting with graphics cards in any manner, CUDA or otherwise, and fully realize this is likely a case of trying to run before I can crawl. Basically, I am in the market for a new computer, and since this idea has a lot of applications I am looking to explore, I would gladly build a machine with the necessary hardware if only I knew what that was. However, I would very much appreciate some reading material in addition to hardware recommendations.

If anything above is unclear, please let me know and I will attempt to clarify my intent. And if this is entirely the wrong place to ask, please let me know where would be better. I considered the SLI forum, but since my question is more general I thought this would be a better place.

Thank you for your help.

Hello,

Why don't you try a simpler task first? Record a small clip with your webcam and see if you can run your processing on it. Once you are sure that works, you can add the real-time part.

CUDA programs actually don’t have any direct access to the display framebuffer, so you can’t easily capture the raw pixels on the display.

However, CUDA and OpenGL (or DirectX) do have the ability to share chunks of GPU memory, so if you were to decode the video to an OpenGL buffer object, you could also map that object into the CUDA address space and manipulate it there. After modifying the buffers in CUDA, the data would be available for drawing on the screen using OpenGL. I don’t use OpenGL, so this is just my vague understanding from the bits of the documentation I’ve looked at…
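
In case it helps, here is a very rough sketch of what that buffer-sharing path looks like with the CUDA runtime API. Treat it as pseudocode rather than working code, given my OpenGL caveat above: the `invertColors` kernel is just a stand-in for whatever processing you actually want, and it assumes `pbo` is a pixel buffer object that already holds a decoded RGBA frame.

```cpp
// Very rough sketch of CUDA <-> OpenGL buffer sharing with the runtime API.
// Assumes a current OpenGL context and that `pbo` is a pixel buffer object
// already filled with a decoded width x height RGBA frame. The kernel is
// just a placeholder for whatever processing (e.g. face detection) you want.
#include <cuda_gl_interop.h>   // pulls in the GL headers it needs
#include <cuda_runtime.h>

__global__ void invertColors(uchar4* pixels, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        uchar4 p = pixels[i];
        pixels[i] = make_uchar4(255 - p.x, 255 - p.y, 255 - p.z, p.w);
    }
}

void processFrame(GLuint pbo, int width, int height)
{
    // Register the GL buffer with CUDA (in real code, do this once and cache it).
    cudaGraphicsResource* res = 0;
    cudaGraphicsGLRegisterBuffer(&res, pbo, cudaGraphicsRegisterFlagsNone);

    // Map the buffer into the CUDA address space and get a device pointer.
    cudaGraphicsMapResources(1, &res);
    uchar4* devPixels = 0;
    size_t  numBytes  = 0;
    cudaGraphicsResourceGetMappedPointer((void**)&devPixels, &numBytes, res);

    // Do the per-pixel work on the GPU; no copy back to the host is needed.
    int count = width * height;
    invertColors<<<(count + 255) / 256, 256>>>(devPixels, count);

    // Unmap so OpenGL can use the (now modified) buffer for drawing again.
    cudaGraphicsUnmapResources(1, &res);
    cudaGraphicsUnregisterResource(res);
}
```

The same kind of sharing exists for textures via cudaGraphicsGLRegisterImage, but plain buffer objects are probably the simpler place to start.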

As far as we can discern, the SLI cable is used for synchronization between cards, but is not a high speed data transfer link. Large amounts of data still have to be moved over the PCI-Express bus. Fortunately, PCI-Express 2.0 is now very fast compared to what was commonly available in the mid-2000s. It is quite possible to achieve speeds of 5-6 GB/sec between the CPU and the GPU, and probably just as fast device-to-device. For comparison, completely uncompressed, 32-bits per pixel, 1080p video at 30 frames-per-second requires moving 250 MB/sec from the decoder to the display, so moving a couple copies of a video stream between devices should be quite doable. (A larger concern is actually latency and the predictability of that latency. I have little knowledge of that since I don’t work on realtime problems.)
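(For reference, that figure is just arithmetic: 1920 × 1080 pixels × 4 bytes per pixel × 30 frames per second ≈ 249 MB/sec, which I've rounded to 250 MB/sec above.)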

My suggestion would be to first try solving this problem on a single card, since there might be sufficient processing power available to do all the calculation you want between frames. If not, then perhaps going to a second GPU would be required.

Chapter 12 of Rob Farber's new book (Farber 2011) looks like a good place to start.

Here’s a link to a live demo of what’s built in Chapter 12: Realtime interaction.

In that video, he’s interacting with a live feed from a webcam, but a movie file can also serve as the input stream.

Hi

Thank you, seibert, for the information. The fact that CUDA doesn't have access to the framebuffer is very helpful, as is knowing that SLI can't be used as a high-speed transfer mechanism. Based on that, and on what you say about the improved bus speeds, I think your suggestion of just starting with one card is a good one. That helps quite a bit in budgeting for the new machine, which is my most imminent problem. Thanks.

And thank you, nnunn, for the reading suggestion. That video is certainly impressive, and I was unaware of the book, but it appears to cover a great deal that I am interested in. I'll have to pick up a copy.

From both your responses (and from a little research I did over the weekend), it seems that CUDA programs only have access to buffers I create, correct? For instance, and I realize I wasn't clear about this when I wrote my original post, a key part of what I'd like to do involves interaction with an arbitrary third-party window. I'd like to start with the movie example because I already made a non-GPU implementation of face detection and thought a good first project would be to port it to the GPU and run it on arbitrary video in real time. Ultimately, what I'd like to do is more along the lines of applying computer vision to human-computer interaction, in terms of how people interact with third-party apps.

So to carry my initial example a little further, say I wanted a face-detection program that would work on a window containing video from an arbitrary player (for instance, the VLC desktop media player, or a YouTube video played in your favorite browser). In this case, instead of trying to look at the entirety of the display framebuffer (which is off limits), I would have a process or window handle to work from. Does this improve the situation at all, or am I still back to doing some sort of screen capture? If it doesn't help, then I can look into how screen-capturing programs (like FRAPS or Camtasia) work and perhaps use one of them as the input stream for the example nnunn posted.

Thank you both for your help.

Screen-capturing programs rely on Windows providing that information, and using it would mean copying data from the CPU to the GPU. Camtasia can get really slow, especially for real-time feeds; I don't think it is designed for anything real-time.
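
Just to make that CPU-side path concrete, a very rough, untested sketch of grabbing one window's pixels with plain GDI and then copying them to the card might look like the code below. The window handle and all error handling are glossed over, and note that hardware-accelerated video players sometimes draw through an overlay that BitBlt cannot see, so you may only get a black rectangle.

```cpp
// Very rough sketch: grab one window's client area with GDI on the CPU,
// then copy the pixels to the GPU with CUDA. `hwnd` is assumed to be a
// valid window handle; all error checking is omitted.
#include <windows.h>
#include <cuda_runtime.h>
#include <vector>

void captureAndUpload(HWND hwnd)
{
    RECT rc;
    GetClientRect(hwnd, &rc);
    int width  = rc.right - rc.left;
    int height = rc.bottom - rc.top;

    // BitBlt the window contents into an in-memory bitmap (all on the CPU).
    HDC windowDC = GetDC(hwnd);
    HDC memDC    = CreateCompatibleDC(windowDC);
    HBITMAP bmp  = CreateCompatibleBitmap(windowDC, width, height);
    HGDIOBJ old  = SelectObject(memDC, bmp);
    BitBlt(memDC, 0, 0, width, height, windowDC, 0, 0, SRCCOPY);
    SelectObject(memDC, old);   // deselect the bitmap before calling GetDIBits

    // Read the raw 32-bit pixels out of the bitmap into system memory.
    BITMAPINFO bmi;
    ZeroMemory(&bmi, sizeof(bmi));
    bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth       = width;
    bmi.bmiHeader.biHeight      = -height;   // negative => top-down row order
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;
    bmi.bmiHeader.biCompression = BI_RGB;
    std::vector<unsigned char> pixels((size_t)width * height * 4);
    GetDIBits(memDC, bmp, 0, height, &pixels[0], &bmi, DIB_RGB_COLORS);

    // This is the copy I was talking about: host memory -> GPU over PCI-Express.
    unsigned char* devPixels = 0;
    cudaMalloc((void**)&devPixels, pixels.size());
    cudaMemcpy(devPixels, &pixels[0], pixels.size(), cudaMemcpyHostToDevice);

    // ... launch your detection kernels on devPixels here ...

    cudaFree(devPixels);
    DeleteObject(bmp);
    DeleteDC(memDC);
    ReleaseDC(hwnd, windowDC);
}
```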

It might be possible if you write a Windows kernel module (something like a filter driver or mini-port driver on top of the existing graphics driver) that can find out where the screen data lives on the GPU. If you can copy that data to an OpenGL buffer object, that would be very fast (say 140 to 200 GB/s, assuming an intra-device copy). But this looks kind of complicated… I can't even guarantee feasibility; it's just a wild thought! Windows may be very touchy when it comes to graphics drivers, and may not even permit it.

You could also check out how remote-desktop programs like VNC work. That might be of help!

By the way, processing the screen in real time using data already on the GPU sounds like a cool idea! Good luck! I will be watching this thread.

Yes, I think there might be issues with this approach too. However, it may be that I can use it for smaller, proof-of-concept work while I look into the ideas you and others have mentioned. Since my interests really lie in learning/vision/HCI, I want to be able to devote at least some time to that.

Thanks for that idea; I'll have to look into it. Do you have any resources you could suggest for how to do such a thing? I've done a bit of searching and found a couple of things about getting started (like http://yz.mit.edu/wp/getting-started-with-windows-kernel-development/ and http://www.catch22.net/tuts/kernel101 ), but since this is well into uncharted waters for me, if you have any suggestions I'd appreciate them.

That's also a good thought. I was discussing this with a colleague and she suggested checking out the virtualized GPU access that the VM hypervisor folks are apparently working on. So much to learn…

Glad you think so too. I think it has a great deal of potential and will try and report back if I manage to get anything that comes close to working.

Hi Robert,

I think you could use Camtasia, Windows Media Encoder, or CamStudio to produce a screencast and start working on the output video to get a head start.
If you can, get an assistant to research the real-time capture question on Windows. Let him or her write that filter driver or whatever it takes. That way you can keep making progress on your research.

I have little idea about the Windows driver or hypervisor things that you mentioned; I am in the same boat as you.
Wishing you good luck!
I hope you can get a nice PoC out of this.
And please do post your results here; I will be very interested to see them.

Thanks,
Best Regards,
Sarnath
