How can I test peer-to-peer RDMA PCIe bandwidth between a single MLNX CX5 NIC and a CUDA-capable GPU?

Both GPU and NIC are in the same PCIe topology:

(nvidia-smi topo -m output: header row GPU0 / mlx5_0 / mlx5_1 / CPU Affinity / NUMA Affinity; the connectivity matrix rows were lost in formatting)


I am running opensm, MLNX_OFED, nv_peer_mem, and perftest, but the test can never seem to register a memory region (MR) for the GPU buffer with the NIC.
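For completeness, these are the sanity checks I understand are needed before perftest can register GPU memory (module and device names are from my setup and may differ on yours):

```shell
# The nv_peer_mem kernel module must be loaded, otherwise
# registering a cuMemAlloc'd buffer as an MR fails.
lsmod | grep nv_peer_mem

# Load it if missing (module ships with the nv_peer_mem package):
sudo modprobe nv_peer_mem

# Confirm the ConnectX-5 port is visible and Active:
ibstat mlx5_0
```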

Note that I am not testing from GPU to GPU over IB, I only have 1 host, 1 NIC, and 1 GPU. I only want to test RDMA capabilities from NIC to GPU.
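My understanding is that a single-host test is possible by pointing the perftest client at the server over localhost, with both sides on the same NIC. This is a sketch of what I am attempting (the --use_cuda flag spelling varies across perftest versions, so treat the exact form as an assumption):

```shell
# Terminal 1: server, with its buffer allocated in GPU memory
ib_write_bw -d mlx5_0 --use_cuda=0 --report_gbits

# Terminal 2: client on the same host, connecting via localhost
ib_write_bw -d mlx5_0 --use_cuda=0 --report_gbits localhost
```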

When I run the perftest tool ib_write_bw, the IB resources fail to be allocated on the NIC:

* Waiting for client to connect... *

initializing CUDA
Listing all CUDA devices in system:
CUDA device 0: PCIe address is 8E:00
Picking device No. 0
[pid = 39169, dev = 0] device name = [Graphics Device]
creating CUDA Ctx
making it the current CUDA Ctx
cuMemAlloc() of a 16777216 bytes GPU buffer
allocated GPU buffer address at 00007f8136000000 pointer=0x7f8136000000
Couldn't allocate MR
failed to create mr
Failed to create MR
Couldn't create IB resources
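In case the build matters: my understanding is that --use_cuda is only available when perftest is configured against the CUDA headers, so I built it roughly like this (repository URL and CUDA path are assumptions for a default install):

```shell
git clone https://github.com/linux-rdma/perftest.git
cd perftest
./autogen.sh
# CUDA_H_PATH points configure at cuda.h and enables GPU buffers
./configure CUDA_H_PATH=/usr/local/cuda/include/cuda.h
make -j
```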

Any help would be greatly appreciated.

Hi Danny,

Please refer to the GPUDirect User Manual:

In section #3, you will find examples for benchmark tests using MVAPICH2 / OpenMPI.
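As a rough example (assuming the OSU micro-benchmarks were built against a CUDA-aware MPI; adjust the path to your install), a device-to-device bandwidth run looks like:

```shell
# Two ranks on one host; "D D" places both the send and the
# receive buffer in GPU (device) memory.
mpirun -np 2 ./osu_bw D D
```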

Let me know if you have any questions.



Hi Chen,

Thank you for the link. However, the User Manual appears to be a bit dated. Do you have any guidelines for CUDA 10.1/11, MLNX_OFED 5.0-, or openmpi-1.10.7?

I run into many issues when following those instructions with the latest software versions.

Additionally, I believe the benchmarks highlighted in this manual involve a multi-host system, but I only have a single CX5 card and want to test GPUDirect to a single GPU in the same PCIe hierarchy.

Currently I have an Ethernet cable plugged in as a loopback on the CX5.
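With the cable looped back, I am checking that the port actually reports an active link before benchmarking (device names as in my topology above):

```shell
# Map RDMA devices to their net interfaces and link state
ibdev2netdev

# Port state and rate for the looped-back port
ibstat mlx5_0 | grep -E "State|Rate"
```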

Hi Chen,

Are you able to provide any updated instructions? Additionally, it would appear the instructions also need to be modified for testing a single NIC against a single GPU in the same PCIe topology.