I am experimenting with using CUDA to speed up image processing tasks in a much larger system. The input image arrives from a camera at a rate of a few hundred MB per second. Right now, we copy the image from the acquisition hardware to host memory, then from host memory to the GPU. This is a LOT of bandwidth that goes to waste.
I guess that asking for a Camera Link input on the card itself is too much, but what about writing the image from the acquisition card directly to the GPU over PCI-Express, NOT through host memory? In theory, PCI-Express supports peer-to-peer (P2P) transfers, which should make this possible.