Is there a reference implementation for direct NIC-to-GPU data transfer?

Much of the material on using GPUDirect RDMA with high-speed NICs on the Mellanox and NVIDIA websites seems geared towards distributed computing applications using MPI, where a GPU on one machine needs to communicate with a GPU on a different machine.

I am wondering if there is a minimal example or reference implementation that shows how to transfer data received on a NIC directly to GPU memory. Specifically, using a ConnectX-3 10GbE card and a GPU that supports GPUDirect RDMA, what are the specific steps / function calls needed to achieve such a transfer? I have looked at the document peer_memory_api.txt (shipped with OFED 3.1.1) and the GPUDirect white paper on NVIDIA’s site, but these resources do not seem to contain enough information to develop a working solution.

Thanks,

I would recommend checking the latest ‘perftest’ tool, which has some basic support for this, or the OpenMPI source code, which implements this kind of support.