How to use GPU Direct RDMA with InfiniBand ConnectX-4?

I am having trouble setting up GPU Direct on my local machines. Here is the local software and hardware:

  • GPU Tesla P100-SXM2
  • Adapter (MLNX)
    5e:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
    5e:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
  • Cuda compilation tools, release 10.1, V10.1.243
  • Ubuntu 20.04.3 LTS (GNU/Linux 5.4.0-167-generic x86_64)

I tested the RDMA connection by using ibping, and it works fine.

--- anton-j0.(none) (Lid 2) ibping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 1030 ms
rtt min/avg/max = 0.005/0.103/900.020 ms

However, when I tried to get GPU Direct RDMA running, nv_peer_mem would not install, and the GitHub demo indicates that it requires ConnectX-5 or later to work.

I looked for other approaches that are compatible with ConnectX-4 but haven't found anything useful yet. I checked the forum, and someone got a ConnectX-3 Pro working with GPU Direct RDMA. Could someone give me some guidance on getting GPU Direct RDMA working on ConnectX-4?

Hello, and thank you for writing to us.
This problem can have several possible causes.
GPU Direct RDMA is supported with any NVIDIA ConnectX-4 (or later) InfiniBand adapter card, so your ConnectX-4 adapters should be compatible with GPU Direct RDMA.
I would advise opening a case in our Enterprise Service Portal.
This will allow us to dig deeper into the issue and help.
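
Before opening a case, it may help to confirm whether a GPU peer-memory kernel module is loaded at all. The following is only a minimal sketch: the helper name is hypothetical, and the two module names it looks for (nv_peer_mem and nvidia_peermem) are assumptions about what your driver stack provides, so adjust them to your installation.

```python
# Hypothetical helper, not part of any NVIDIA tool.
# Assumption: GPU peer-memory support shows up in /proc/modules under one of
# these names; change the tuple if your driver stack uses a different module.
PEER_MEM_MODULES = ("nv_peer_mem", "nvidia_peermem")


def loaded_peer_mem_modules(proc_modules_text):
    """Return the peer-memory module names present in /proc/modules-style text."""
    # Each non-empty line of /proc/modules begins with the module name.
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in PEER_MEM_MODULES if name in loaded]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            found = loaded_peer_mem_modules(f.read())
    except OSError:
        # /proc/modules is unavailable (for example, on a non-Linux system).
        found = []
    print("peer-memory modules loaded:", ", ".join(found) if found else "none")
```

If the script reports none, the module was most likely never built or loaded, which would match the failed nv_peer_mem installation described above. The output is useful to attach to the support case.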

Thanks and have a great day!
Ilan.