I am asking if there is any sample code for GPUDirect RDMA I can download. I am trying to use the GPUDirect RDMA beta in a project that accesses data on the GPUs of one computer from the GPUs of another computer. However, the User Manual (http://www.mellanox.com/page/products_dyn?product_family=116&mtag=gpudirect) only provides the MVAPICH2-GDR benchmark, which has no source code yet. Do you know where we can download sample code that uses verbs + CUDA to implement GPUDirect RDMA, so I can build our own project based on it?
Thanks a lot.
Hi Lin Chen,
You can read the source code of the OSU micro-benchmarks with CUDA support to understand how to implement your code: http://mvapich.cse.ohio-state.edu/benchmarks/
MVAPICH2-GDR is free, but not open source. If you need an MPI that is both free and open source, you can use Open MPI as an alternative. Both MVAPICH2-GDR and Open MPI are CUDA-aware and can support GPUDirect RDMA.
Thanks for your reply. Below are some questions that came up after reading and checking the materials you sent me. I also sent these questions to Mr. Scot Schultz, the director of HPC at Mellanox.
To build a simple template for accessing data on the GPUs of one node from the GPUs of another node, we want to run some benchmark or test code on the system to ensure GPUDirect RDMA works on our system, and then create the template code for future use.
- MPI level
Open MPI has supported GPUDirect RDMA since v1.7. I downloaded openmpi-v1.10; however, there is no GPUDirect RDMA sample code in the examples folder. Do you know of any place where we can find more sample code using GPUDirect RDMA + Open MPI? I only found one sample at https://github.com/parallel-forall/code-samples/tree/master/posts/cuda-aware-mpi-example/src And how can we tell that no CPU memory is involved? The current MPI_Send() and MPI_Recv() are essentially a black box that encapsulates the GPUDirect RDMA path.
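For what it's worth, a CUDA-aware MPI program looks like an ordinary one: the device pointer is passed straight to MPI_Send()/MPI_Recv(). Below is a minimal sketch under the assumption of a CUDA-aware Open MPI build and one GPU per rank; whether the transfer actually goes over GPUDirect RDMA (rather than being staged through host memory) depends on the transport, the message size, and the runtime configuration, so this only demonstrates the API shape, not the data path.

```c
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *d_buf;                         /* device memory, not host memory */
    cudaMalloc((void **)&d_buf, n * sizeof(float));

    if (rank == 0) {
        cudaMemset(d_buf, 0, n * sizeof(float));
        /* Pass the device pointer directly to MPI: a CUDA-aware MPI
         * detects it and can move the data GPU-to-GPU. */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

As I understand it, on the openib BTL you can ask Open MPI to enable GPUDirect RDMA explicitly (e.g. via the `btl_openib_want_cuda_gdr` MCA parameter) and use verbose MCA output to see whether it was actually enabled; that may partially answer the "how do we tell no CPU memory is involved" question.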
MVAPICH2 uses the gdrcopy library you showed in your reply (https://github.com/NVIDIA/gdrcopy), which uses APIs like cuPointerSetAttribute(). However, the NVIDIA documentation does not give much detail or sample code demonstrating them. Do you know if there is any CUDA sample code using these APIs?
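In case it helps, here is a minimal driver-API sketch of the cuPointerSetAttribute() call mentioned above. The GPUDirect RDMA documentation recommends setting CU_POINTER_ATTRIBUTE_SYNC_MEMOPS on an allocation before registering it with the NIC; this sketch only shows that one call (no RDMA, error handling trimmed) and assumes a machine with at least one CUDA device.

```c
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr dptr;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuMemAlloc(&dptr, 1 << 20);

    /* Force synchronous memory operations on this allocation so that
     * RDMA reads/writes by the NIC stay consistent with in-flight
     * CUDA work touching the same buffer. */
    unsigned int flag = 1;
    CUresult rc = cuPointerSetAttribute(&flag,
                                        CU_POINTER_ATTRIBUTE_SYNC_MEMOPS,
                                        dptr);
    printf("cuPointerSetAttribute: %s\n",
           rc == CUDA_SUCCESS ? "ok" : "failed");

    cuMemFree(dptr);
    cuCtxDestroy(ctx);
    return 0;
}
```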
- CUDA+IB verbs level
I am not able to open the link you gave me (git://git.openfabrics.org/~grockah/perftest.git). Could you please show me another link if possible?
- IB tools
There is an article benchmarking GPUDirect RDMA using ibv_ud_pingpong and ibv_rdma_bw from libibverbs-1.1 and perftest-1.3 (https://devblogs.nvidia.com/parallelforall/benchmarking-gpudirect-rdma-on-modern-server-platforms/#platforms). The author, Davide Rossetti, is an NVIDIA developer. However, he did not mention how to run the tests. As I understand it, ibv_ud_pingpong needs a server IP, and I am not sure how to use it to test the RDMA connection between GPUs or between host and GPU. Also, the newest perftest no longer provides ibv_rdma_bw. I did find the author's e-mail. Do you have any experience benchmarking GPUDirect RDMA using those IB tools? If so, could you please show us how?
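One possible route, in case it is useful: recent perftest releases can be built with CUDA support, and the bandwidth tests then take a flag to place the buffers in GPU memory. A rough sketch of a two-node run is below; it assumes perftest was built with CUDA support and the nv_peer_mem kernel module is loaded on both nodes, and the exact flag spelling and HCA name (`mlx5_0` here) vary by version and system.

```shell
# Server node: ib_write_bw waits for a client connection.
ib_write_bw -d mlx5_0 --use_cuda

# Client node: connect to the server by IP; data is read from / written
# to GPU memory on both sides, so the reported bandwidth reflects the
# GPUDirect RDMA path.
ib_write_bw -d mlx5_0 --use_cuda <server-ip>
```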
Appreciate your help.
Hi Lin Chen,
Also, here are some other places that might interest you:
The GDRCopy code is a good example of how to use the GPUDirect RDMA API: https://github.com/NVIDIA/gdrcopy
If you are looking for a CUDA + IB verbs level example, the ib_send_bw and ib_write_bw tests in perftest could serve as one. A copy of perftest can be found here: git://git.openfabrics.org/~grockah/perftest.git
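At the verbs level, the essential step those tests perform is registering GPU memory with ibv_reg_mr(): with the nv_peer_mem module loaded, the verbs stack pins the device pages through the GPUDirect RDMA peer-memory interface, and the resulting lkey/rkey are used in work requests exactly as with host memory. A minimal registration sketch follows; it assumes one HCA and one GPU, omits the QP setup and actual RDMA operations, and trims most error handling.

```c
#include <infiniband/verbs.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    /* Open the first available HCA and allocate a protection domain. */
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no IB devices\n"); return 1; }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Allocate device memory with CUDA. */
    const size_t len = 1 << 20;
    void *d_buf;
    cudaMalloc(&d_buf, len);

    /* Register the *device* pointer. With nv_peer_mem loaded, this pins
     * the GPU pages for the NIC; without it, the call fails just as it
     * would for any unmappable address. */
    struct ibv_mr *mr = ibv_reg_mr(pd, d_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr on GPU memory"); return 1; }
    printf("registered GPU buffer, rkey=0x%x\n", mr->rkey);

    ibv_dereg_mr(mr);
    cudaFree(d_buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

From here, the usual connection setup (QP creation, exchanging rkeys and buffer addresses out of band, posting RDMA_WRITE/RDMA_READ work requests) is identical to the host-memory case, which is why the perftest sources are a reasonable template.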