We need to mosaic and do some image clean-up on thirty video streams (as well as a little bit of object tracking). We are comfortable with RANSAC in CUDA, but we worry whether this architecture will handle the amount of video we are talking about. Tell us if the proposal below, which we are suggesting to our sponsor, is crazy:
- 30 cameras, each a 10-bit black-and-white video stream at 30 FPS from a sensor of about 1 MPixel
- Total aggregate raw bandwidth works out to about 2.4 GB/s
- Matrox tells us to use this PCIe card http://www.matrox.com/video/en/products/developer/hardware/dsx_le4_fh (8 SDI inputs each)
- We would then use GPUDirect to pull all the video into a Tesla K40 (12 GB)
- Processing would be image cleanup, registration, and some simple object recognition
- We only need about 6 frames at a time, because the computation is real-time tracking of objects in the stream, so we have enough onboard RAM
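
As a rough check on the bandwidth and RAM figures above, here is a small Python sketch. It counts pixel payload only (no SDI blanking or PCIe overhead), and the 16-bit rows assume each sample is padded to a 16-bit container, which is a guess about the capture format rather than something Matrox has confirmed. The 1.3 MPixel rows correspond to the 12-bit, 1.3 MP option mentioned under our concerns.

```python
# Rough data-rate and buffer-size check; pixel payload only.
CAMERAS = 30
FPS = 30
FRAMES_BUFFERED = 6  # "about 6 frames at a time"


def aggregate_rate(pixels, bits_per_pixel):
    """Aggregate payload rate in bytes per second for all cameras."""
    return CAMERAS * FPS * pixels * bits_per_pixel / 8


def buffer_size(pixels, bits_per_pixel):
    """Bytes needed to hold FRAMES_BUFFERED frames from every camera."""
    return CAMERAS * FRAMES_BUFFERED * pixels * bits_per_pixel / 8


# (label, pixels per frame, bits stored per pixel).
# The 16-bit container rows are an assumption about how samples are stored.
scenarios = [
    ("1.0 MP, 10-bit packed", 1_000_000, 10),
    ("1.0 MP, 16-bit container", 1_000_000, 16),
    ("1.3 MP, 12-bit packed", 1_300_000, 12),
    ("1.3 MP, 16-bit container", 1_300_000, 16),
]

for label, pixels, bits in scenarios:
    rate_gb = aggregate_rate(pixels, bits) / 1e9
    buf_gb = buffer_size(pixels, bits) / 1e9
    print(f"{label:26s} {rate_gb:6.3f} GB/s   {buf_gb:6.3f} GB buffered")
```

Under these assumptions the aggregate rate ranges from roughly 1.1 GB/s (1 MPixel, 10-bit packed) to roughly 2.3 GB/s (1.3 MPixel in 16-bit containers), so the 2.4 GB/s figure is closest to the largest case. Six buffered frames from every camera stay under 0.5 GB even in that case, which is small next to 12 GB.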
We have a couple of concerns about this (if we cannot use the Tesla K40, we can do an FPGA version):
- Is the Matrox card the best for this?
- Can we believe Matrox that, with 4 of these cards on the same Root Complex, we can read the full raw data from all 30 cameras?
- We really want images that are not 720p or 1080p. This is a black-and-white problem, so 12 bits per pixel and 1.3 MP would be nice
- Does the GPUDirect transfer lock down all the CUDA memory, or can we do computation while the raw video is being double-buffered into CUDA memory?
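
To put a number on the per-card question, here is a minimal Python sketch of how 30 cameras split across cards with 8 inputs each, and what the busiest card would have to move. It again counts pixel payload only, and the 16-bit container in the upper bound is an assumption about the capture format, not a Matrox specification.

```python
import math

CAMERAS = 30
INPUTS_PER_CARD = 8  # "8 SDI inputs each"
FPS = 30

cards_needed = math.ceil(CAMERAS / INPUTS_PER_CARD)

# Fill cards in order; the last card takes whatever cameras remain.
cameras_per_card = [
    min(INPUTS_PER_CARD, CAMERAS - i * INPUTS_PER_CARD)
    for i in range(cards_needed)
]


def card_rate(cameras, pixels, bits_per_pixel):
    """Payload rate in bytes per second for one capture card."""
    return cameras * FPS * pixels * bits_per_pixel / 8


busiest = max(cameras_per_card)
low = card_rate(busiest, 1_000_000, 10)   # 1 MP, 10-bit packed
high = card_rate(busiest, 1_300_000, 16)  # 1.3 MP, assumed 16-bit container

print(f"cards needed: {cards_needed}, cameras per card: {cameras_per_card}")
print(f"busiest card: {low / 1e6:.0f} to {high / 1e6:.0f} MB/s")
```

So each fully loaded card would carry roughly 300 to 624 MB/s of payload under these assumptions, and the question is whether four such streams can share one Root Complex while the GPU is also being fed.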