Hello, we are trying to connect two GPUs located on two different servers via RDMA over InfiniBand. The GPUs are NVIDIA RTX 6000 Ada and the InfiniBand adapters are NVIDIA ConnectX-6.
Our server has the configuration shown in the attached image: the GPU is connected in slot 2 (although it physically occupies slots 1 and 2) and the ConnectX-6 is in slot 3.
Looking at the connection between the InfiniBand adapter and the GPU (terminal command nvidia-smi topo -m), you can see that the connection type is NODE.
Terminal output:
nvidia-smi topo -m
        GPU0    NIC0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    0,2,4,6,8,10    0               N/A
NIC0    NODE     X

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:
  NIC0: mlx5_0
According to the web page NVIDIA Configuration | Juniper Networks, this causes poor performance, but due to the layout of our server it is not possible to move the GPU or the ConnectX.
We have written two Python scripts, one for the sending server and one for the receiving server.
The code for the server that sends data is the following:
Sender code.txt (3.2 KB)
and the receiver code:
receiver code.txt (2.6 KB)
The receiver’s code follows the same structure, but the changes the sender makes to the message are never reflected on the receiver’s side. Is it possible to establish the connection between them despite having a NODE connection type?
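In case it is useful, below is a minimal sketch of an alternative sanity check for GPU-to-GPU transfer between the two servers, using PyTorch's NCCL backend over the same InfiniBand link. This is not the attached sender/receiver code, and it assumes PyTorch with CUDA support is installed on both machines:

# Minimal sketch (not the attached scripts): sanity-check GPU-to-GPU transfer
# between the two servers with PyTorch's NCCL backend, which can use the
# ConnectX-6 / InfiniBand link underneath.
import os

import torch
import torch.distributed as dist


def main():
    # RANK=0 on the sending server, RANK=1 on the receiving server.
    # MASTER_ADDR / MASTER_PORT must point to one of the two machines.
    rank = int(os.environ["RANK"])
    dist.init_process_group(backend="nccl", rank=rank, world_size=2)
    torch.cuda.set_device(0)  # single RTX 6000 Ada per server

    tensor = torch.zeros(1024, device="cuda")
    if rank == 0:
        tensor.fill_(42.0)
        dist.send(tensor, dst=1)   # point-to-point send over the link
    else:
        dist.recv(tensor, src=0)
        # If the transfer works, this prints 42.0 instead of 0.0
        print("received, first element =", tensor[0].item())

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Launching it with RANK=0 on the sender and RANK=1 on the receiver, and with NCCL_IB_HCA=mlx5_0 and NCCL_DEBUG=INFO set, the log shows whether NCCL selects the NET/IB transport and whether GPUDirect RDMA (GDRDMA) is used; NCCL_NET_GDR_LEVEL controls up to which topology distance GPUDirect RDMA is attempted.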
On the other hand, we are not sure whether the nvidia-peermem kernel module is loaded correctly, and whether this may be affecting the transfer.
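A small sketch of how one could check this, by looking at /proc/modules (assuming the module name nvidia_peermem used by current drivers, or nv_peer_mem from the older out-of-tree nv_peer_memory package):

# Check whether a GPUDirect RDMA peer-memory kernel module is loaded.
from pathlib import Path


def peermem_loaded() -> bool:
    # /proc/modules lists one loaded module per line, module name first.
    loaded = {line.split()[0] for line in Path("/proc/modules").read_text().splitlines()}
    return bool(loaded & {"nvidia_peermem", "nv_peer_mem"})


if __name__ == "__main__":
    print("GPUDirect RDMA peer-memory module loaded:", peermem_loaded())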
Thank you very much