PCI-e Device to Device Transfers

Hey guys,

I’m looking into moving data from a third party PCI-e card (e.g. a framegrabber, an InfiniBand device or an FPGA) to a CUDA GPU with the lowest possible latency (high throughput would be nice as well). I’d like to know the current state of this issue from NVIDIA’s perspective.

The reason why I’m asking for this is that our chair runs several projects with industry users, mostly dealing with control systems. In these systems external sensors (e.g. a high speed camera) deliver a high data rate stream, which in turn has to be processed (e.g. object detection) and the results are then used to manipulate the system (e.g. by moving a robot). Currently users either employ custom ASICs (complex, costly), FPGAs (complex, not well suited for floating point calculations), or x86 compatible CPUs (often too slow). GPGPUs represent an appealing alternative, but latency in these applications is measured in usec, not msec. Therefore an additional copy to system memory is undesirable.

In the forum I found several posts (one, two, three, four and recently five), which indicate to me that there is general interest in this topic. GPUDirect seems to be a step in the right direction; alas, I couldn't find any hint of when it'll be available to folks outside NVIDIA or Mellanox.

So, my question is: will we see new APIs added to CUDA dealing with this issue, and if yes: when? It doesn’t have to be nice and shiny, we’re computer architects, after all. We’d be happy to beta test related software or even take a look at alpha versions, if they suit our needs.

Thanks
-Andreas

Up

Up

Some Tesla boards offer InfiniBand connectivity (maybe not on all OSes, though), which may be suitable for the mentioned applications.

On the other end of the performance spectrum there's direct read/write access from the GPU to the host's main memory (on NVIDIA Ion etc.) via PCI Express, which may permit writing some low-latency algorithms.

This blog post lists some supported platforms: http://blog.cudachess.org/2009/08/pinned-mapped-memory/
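For what it's worth, a minimal sketch of the mapped pinned memory ("zero-copy") approach looks roughly like the following. It assumes a device that reports canMapHostMemory; the buffer size and the toy kernel are placeholders, and error checking is omitted for brevity:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel that reads and writes host memory directly over PCIe.
__global__ void scale(float *buf, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= factor;
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "Device cannot map host memory\n");
        return 1;
    }
    // Must be set before the CUDA context is created.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const int n = 1 << 20;  // placeholder buffer size
    float *h_buf, *d_buf;
    // Pinned, mapped allocation: visible to both CPU and GPU.
    cudaHostAlloc(&h_buf, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;

    // Device-side alias of the same host buffer.
    cudaHostGetDevicePointer(&d_buf, h_buf, 0);
    // Kernel accesses host memory in place -- no cudaMemcpy step.
    scale<<<(n + 255) / 256, 256>>>(d_buf, n, 2.0f);
    cudaDeviceSynchronize();

    printf("h_buf[0] = %f\n", h_buf[0]);
    cudaFreeHost(h_buf);
    return 0;
}
```

Note this still routes data through host memory, so it's not true device-to-device transfer, but it does remove the explicit staging copy, which may already help at the latencies discussed above.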

Christian