I’m working with HD (1920x1080) imagery, and you have to understand the I/O bounds to see the problem (and solution) here: 1920x1080 pixels x 3 bytes/pixel (1 byte each for R, G, and B) x 30 frames/sec ≈ 180 MB/sec. No single spinning (non-SSD) hard drive can sustain that rate (much less gigabit Ethernet). To reach realistic rates for what a CameraLink (up to 6 Gb/sec) or HD-SDI (1.5 Gb/sec) input can deliver, you need to pull the whole video stream into RAM first, then move the frames over to the graphics card one at a time to emulate the ‘real’ video stream rate. Solid-state drives have much better I/O rates, and you may be able to emulate an HD source directly by reading the data from one of those instead.
That being said, once I pull the data into host (CPU) memory, I can emulate ‘live’ camera data at 200+ frames/sec.
Hope this answers the question you’re asking. I don’t know whether you’ll have the RAM needed to hold all 500 frames in memory (2 MP x 3 bytes/pixel x 500 frames ≈ 3 GB, which requires a 64-bit OS to give a single process that much address space).
BTW, why bring it back out to the CPU? You can render in OpenGL directly from the card using the CUDA/OpenGL interop. I’m doing simple stuff, so I didn’t need to rewrite my code to accommodate the interop. However, if you’re modifying existing OpenGL code to use CUDA for part of it, I can understand that it may be difficult to change.
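For reference, the interop path looks roughly like this (a sketch only, not runnable without a GL context and a CUDA device; `pbo`, `myKernel`, and the launch configuration are placeholders, and error checking is omitted):

```cuda
#include <cuda_gl_interop.h>

// One-time setup: register an existing GL pixel-buffer object with CUDA.
// 'pbo' is assumed to have been created with glGenBuffers/glBufferData.
cudaGraphicsResource_t res;
cudaGraphicsGLRegisterBuffer(&res, pbo, cudaGraphicsRegisterFlagsWriteDiscard);

// Per frame: write into the PBO from CUDA, then let GL draw from it.
void  *devPtr;
size_t size;
cudaGraphicsMapResources(1, &res, 0);
cudaGraphicsResourceGetMappedPointer(&devPtr, &size, res);
myKernel<<<grid, block>>>((unsigned char *)devPtr /*, ... */);
cudaGraphicsUnmapResources(1, &res, 0);
// Then on the GL side: glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
// glTexSubImage2D(...); and draw a textured quad as usual.
```

The frame never leaves the card, which is the whole point of the interop.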