I am asking if there is any sample code for GPUDirect RDMA I can download. I am trying to use the GPUDirect RDMA beta in a project that accesses data on the GPUs of one computer from the GPUs of another computer. However, the User Manual (http://www.mellanox.com/page/products_dyn?product_family=116&mtag=gpudirect) only provides the MVAPICH2-GDR benchmark, which has no source code yet. Do you know where we can download sample code that uses verbs + CUDA to implement GPUDirect RDMA, so I can build our own project based on it?
Thanks a lot.
Hi Lin Chen,
You can read the source code of the OSU micro-benchmarks with CUDA support to understand how to implement your code: http://mvapich.cse.ohio-state.edu/benchmarks/
MVAPICH2-GDR is free, but not open source. If you need an MPI that is both free and open source, you can use Open MPI as an alternative. Both MVAPICH2-GDR and Open MPI are CUDA-aware and can support GPUDirect RDMA.
Thanks for your reply. Below are some questions that came up after reading and checking the materials you sent me. I also sent these questions to Mr. Scot Schultz, the director of HPC at Mellanox.
To build a simple template for accessing data on the GPUs of one node from the GPUs of another node, we want to run some benchmark or test code on the system to ensure GPUDirect RDMA works on our system, and then create the template code for future use.
- MPI level
Open MPI has supported GPUDirect RDMA since v1.7. I downloaded openmpi-v1.10; however, there is no GPUDirect RDMA sample code in the examples folder. Do you know of any place where we can find more sample code using GPUDirect RDMA + Open MPI? I only found one sample at https://github.com/parallel-forall/code-samples/tree/master/posts/cuda-aware-mpi-example/src And how can we tell that no CPU memory is involved? The current MPI_Send() and MPI_Recv() are essentially a black box that encapsulates the GPUDirect RDMA path.
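For what it's worth, a CUDA-aware MPI program looks like an ordinary one: the device pointer is passed straight to MPI_Send()/MPI_Recv(). Below is a minimal sketch under the assumption of a CUDA-aware Open MPI build and one GPU per rank; whether the transfer actually goes over GPUDirect RDMA (rather than being staged through host memory) depends on the transport, the message size, and the runtime configuration, so this only demonstrates the API shape, not the data path.

```c
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *d_buf;                         /* device memory, not host memory */
    cudaMalloc((void **)&d_buf, n * sizeof(float));

    if (rank == 0) {
        cudaMemset(d_buf, 0, n * sizeof(float));
        /* Pass the device pointer directly to MPI: a CUDA-aware MPI
         * detects it and can move the data GPU-to-GPU. */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

As I understand it, on the openib BTL you can ask Open MPI to enable GPUDirect RDMA explicitly (e.g. via the `btl_openib_want_cuda_gdr` MCA parameter) and use verbose MCA output to see whether it was actually enabled; that may partially answer the "how do we tell no CPU memory is involved" question.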
MVAPICH2 uses the gdrcopy library you showed in your reply (https://github.com/NVIDIA/gdrcopy), which uses APIs like cuPointerSetAttribute(). However, the NVIDIA documentation does not give much detail or sample code demonstrating them. Do you know if there is any CUDA sample code using these APIs?
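In case it helps, here is a minimal driver-API sketch of the cuPointerSetAttribute() call mentioned above. The GPUDirect RDMA documentation recommends setting CU_POINTER_ATTRIBUTE_SYNC_MEMOPS on an allocation before registering it with the NIC; this sketch only shows that one call (no RDMA, error handling trimmed) and assumes a machine with at least one CUDA device.

```c
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr dptr;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuMemAlloc(&dptr, 1 << 20);

    /* Force synchronous memory operations on this allocation so that
     * RDMA reads/writes by the NIC stay consistent with in-flight
     * CUDA work touching the same buffer. */
    unsigned int flag = 1;
    CUresult rc = cuPointerSetAttribute(&flag,
                                        CU_POINTER_ATTRIBUTE_SYNC_MEMOPS,
                                        dptr);
    printf("cuPointerSetAttribute: %s\n",
           rc == CUDA_SUCCESS ? "ok" : "failed");

    cuMemFree(dptr);
    cuCtxDestroy(ctx);
    return 0;
}
```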
- CUDA+IB verbs level
I am not able to open the link you gave me (git://git.openfabrics.org/~grockah/perftest.git). Could you please show me another link if possible?
- IB tools
There is an article benchmarking GPUDirect RDMA using ibv_ud_pingpong and ibv_rdma_bw from libibverbs-1.1 and perftest-1.3 (https://devblogs.nvidia.com/parallelforall/benchmarking-gpudirect-rdma-on-modern-server-platforms/#platforms). The author, Davide Rossetti, is an NVIDIA developer. However, he did not mention how to run the tests. As I understand it, ibv_ud_pingpong needs a server IP, and I am not sure how to use it to test the RDMA connection between GPUs or between host and GPU. Also, the newest perftest no longer provides ibv_rdma_bw. I did find the author's e-mail. Do you have any experience benchmarking GPUDirect RDMA using those IB tools? If so, could you please show us how?
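One possible route, in case it is useful: recent perftest releases can be built with CUDA support, and the bandwidth tests then take a flag to place the buffers in GPU memory. A rough sketch of a two-node run is below; it assumes perftest was built with CUDA support and the nv_peer_mem kernel module is loaded on both nodes, and the exact flag spelling and HCA name (`mlx5_0` here) vary by version and system.

```shell
# Server node: ib_write_bw waits for a client connection.
ib_write_bw -d mlx5_0 --use_cuda

# Client node: connect to the server by IP; data is read from / written
# to GPU memory on both sides, so the reported bandwidth reflects the
# GPUDirect RDMA path.
ib_write_bw -d mlx5_0 --use_cuda <server-ip>
```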
Appreciate your help.
Hi Lin Chen,
Also, here are some other places that might interest you:
The GDRCopy code is a good example of how to use the GPUDirect RDMA API: https://github.com/NVIDIA/gdrcopy
If you are looking for a CUDA + IB verbs level example, the ib_send_bw and ib_write_bw tests in perftest could serve as one. A copy of perftest can be found here: git://git.openfabrics.org/~grockah/perftest.git
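At the verbs level, the essential step those tests perform is registering GPU memory with ibv_reg_mr(): with the nv_peer_mem module loaded, the verbs stack pins the device pages through the GPUDirect RDMA peer-memory interface, and the resulting lkey/rkey are used in work requests exactly as with host memory. A minimal registration sketch follows; it assumes one HCA and one GPU, omits the QP setup and actual RDMA operations, and trims most error handling.

```c
#include <infiniband/verbs.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    /* Open the first available HCA and allocate a protection domain. */
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no IB devices\n"); return 1; }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Allocate device memory with CUDA. */
    const size_t len = 1 << 20;
    void *d_buf;
    cudaMalloc(&d_buf, len);

    /* Register the *device* pointer. With nv_peer_mem loaded, this pins
     * the GPU pages for the NIC; without it, the call fails just as it
     * would for any unmappable address. */
    struct ibv_mr *mr = ibv_reg_mr(pd, d_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr on GPU memory"); return 1; }
    printf("registered GPU buffer, rkey=0x%x\n", mr->rkey);

    ibv_dereg_mr(mr);
    cudaFree(d_buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

From here, the usual connection setup (QP creation, exchanging rkeys and buffer addresses out of band, posting RDMA_WRITE/RDMA_READ work requests) is identical to the host-memory case, which is why the perftest sources are a reasonable template.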