Hey guys,
I’m looking into moving data from a third-party PCIe card (e.g. a framegrabber, an InfiniBand adapter, or an FPGA) to a CUDA GPU with the lowest possible latency (high throughput would be nice as well). I’d like to know the current state of this issue from NVIDIA’s perspective.
The reason I’m asking is that our chair runs several projects with industrial partners, mostly dealing with control systems. In these systems, external sensors (e.g. a high-speed camera) deliver a high-rate data stream, which has to be processed (e.g. object detection), and the results are then used to manipulate the system (e.g. by moving a robot). Currently users employ either custom ASICs (complex, costly), FPGAs (complex, not well suited for floating-point calculations), or x86-compatible CPUs (often too slow). GPGPUs are an appealing alternative, but latency in these applications is measured in microseconds, not milliseconds, so an additional copy through system memory is undesirable.
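To make the problem concrete, here is a minimal sketch of the data path we are stuck with today. Note that grabber_read() is a placeholder for whatever the vendor driver actually offers, not a real API: the card first lands the frame in host memory, and only then can we push it across PCIe a second time.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

/* Hypothetical vendor call: copy the latest frame into dst. */
extern int grabber_read(void *dst, size_t nbytes);

#define FRAME_BYTES (1920 * 1080 * 2)   /* e.g. one 16-bit HD frame */

int transfer_frame(void *d_frame, cudaStream_t stream)
{
    static void *h_staging = NULL;
    if (!h_staging)
        cudaHostAlloc(&h_staging, FRAME_BYTES, cudaHostAllocDefault);

    /* Hop 1: the vendor driver lands the frame in system memory. */
    grabber_read(h_staging, FRAME_BYTES);

    /* Hop 2: a second PCIe traversal into GPU memory -- this is
       the extra copy (and latency) we would like to eliminate. */
    cudaMemcpyAsync(d_frame, h_staging, FRAME_BYTES,
                    cudaMemcpyHostToDevice, stream);
    return cudaStreamSynchronize(stream) == cudaSuccess ? 0 : -1;
}
```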
In the forum I found several posts (one, two, three, four and recently five), which suggests there is general interest in this topic. GPUDirect seems to be a step in the right direction; alas, I couldn’t find any hint of when it will be available to folks outside NVIDIA or Mellanox.
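For completeness, the closest workaround we can see without true peer-to-peer support is to let the card DMA straight into a CUDA-pinned host buffer, which at least removes one host-side copy, though the detour through system memory remains. Again only a sketch; grabber_set_dma_target() and grabber_wait_frame() are hypothetical vendor calls:

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

/* Hypothetical vendor calls: point the card's DMA engine at a
   user-supplied buffer, and block until a frame has arrived. */
extern int grabber_set_dma_target(void *dst, size_t nbytes);
extern int grabber_wait_frame(void);

void stream_frames(void *d_frame, size_t nbytes, cudaStream_t stream)
{
    void *h_pinned;

    /* Pinned (page-locked) buffer that both the vendor driver and
       the CUDA DMA engine can address. */
    cudaHostAlloc(&h_pinned, nbytes, cudaHostAllocPortable);
    grabber_set_dma_target(h_pinned, nbytes);

    for (;;) {
        grabber_wait_frame();                 /* card DMAs here */
        cudaMemcpyAsync(d_frame, h_pinned, nbytes,
                        cudaMemcpyHostToDevice, stream);
        cudaStreamSynchronize(stream);
        /* ... launch the detection kernel on d_frame ... */
    }
}
```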
So, my question is: will we see new APIs added to CUDA to deal with this issue, and if so, when? It doesn’t have to be nice and shiny; we’re computer architects, after all. We’d be happy to beta-test related software or even take a look at alpha versions if they suit our needs.
Thanks
-Andreas