We are looking to use RDMA over InfiniBand for high-speed, low-latency data transfer between separate compute nodes for real-time analytics during manufacturing. We will be using zero-copy RDMA mostly for moving data from the sensors to the analysis nodes and then back to the manufacturing devices. The analysis runtime will be in C#, with CUDA functions loaded via C++ shared libraries (a rough sketch of that interop boundary follows the hardware list). We will be running the following NVIDIA InfiniBand hardware on Ubuntu 22.04 LTS:
- NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition - 96GB GDDR7
- NVIDIA MCX75310AAS-NEAT ConnectX-7 adapter card
- MMA4Z00-NS400 Compatible 400GBASE-SR4 OSFP Flat Top PAM4 850nm 50m DOM MPO-12/APC MMF InfiniBand NDR Optical Transceiver Module
- MFP7E10-N030 Compatible 30m (98ft) MTP®-12 APC (Female) to MTP®-12 APC (Female), 8 Fibers, Multimode, Magenta
- MQM9700-NS2F - Managed - NVIDIA Quantum-2 QM9700 400G InfiniBand Switch
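For context on the C#/CUDA split mentioned above, the CUDA side would be exposed to the C# runtime through plain extern "C" entry points in a shared library, roughly like this. The function name, kernel, and buffer layout are placeholders of ours, not an existing API:

```cpp
// analyze.cu - minimal sketch of the C++/CUDA shared-library boundary
#include <cstddef>
#include <cuda_runtime.h>

// Trivial stand-in for the real analytics kernel.
__global__ void scale_kernel(const float* in, float* out, std::size_t n)
{
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

extern "C" {

// Exported with C linkage so the C# runtime can bind it via [DllImport].
// Takes device pointers that the RDMA/NCCL layer has already filled.
int analyze_frame(const float* d_in, float* d_out, std::size_t n)
{
    const int threads = 256;
    const int blocks = static_cast<int>((n + threads - 1) / threads);
    scale_kernel<<<blocks, threads>>>(d_in, d_out, n);
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : -1;
}

}  // extern "C"
```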
While looking through the RDMA documentation, I found multiple different libraries, so I am curious what the most current and straightforward way to handle RDMA programming would be. Specific questions:
- Do the RTX PRO 6000 and Ubuntu 24.04 LTS support DMA-BUF for GPUDirect RDMA, as opposed to the legacy nvidia-peermem module? (The first sketch below shows the registration flow we have in mind.)
- Which kernel modules and packages need to be installed for DMA-BUF-based RDMA with NCCL on Ubuntu?
- How much latency would be added by a single library routine that allocates, locks (pins), transfers, and frees memory as one unit, invoked as a separate call for each transfer? (The second sketch below shows what we mean.)
- Can GDRCopy be used as a drop-in replacement for cudaMalloc in existing CUDA C++ code to reduce the latency of moving data from GPU to CPU and back? (The third sketch below shows how we currently understand its usage.)
- What latency should we expect when using GDRCopy to move data from the GPU to the CPU, do some processing on the CPU, and send the result back to the GPU for further processing and transfer via NCCL over RDMA?
- Do these modules work with a real-time (PREEMPT_RT) Linux kernel?
- Will we continue to receive NVIDIA driver updates if we stay on an Ubuntu release under extended long-term support (ESM)?
- If we have two GPUs in the analysis system (both the exact model listed above), would it be better to reserve one GPU exclusively for this pipeline to reduce latency?
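To make the DMA-BUF question concrete, this is roughly the registration path we have in mind: ask the CUDA driver for a dma-buf file descriptor covering a device allocation with cuMemGetHandleForAddressRange, then register that fd with ibv_reg_dmabuf_mr instead of going through nvidia-peermem. This is only a sketch of our understanding; PD/QP setup, alignment checks, and most error handling are omitted, and the helper name is our own placeholder:

```cpp
#include <cstdio>
#include <cuda.h>
#include <infiniband/verbs.h>

// Sketch: register a cuMemAlloc'd/cudaMalloc'd region with the HCA via
// DMA-BUF instead of nvidia-peermem. Requires a driver/kernel combination
// with DMA-BUF support; d_ptr and bytes must be suitably aligned.
ibv_mr* register_gpu_buffer_dmabuf(ibv_pd* pd, CUdeviceptr d_ptr, size_t bytes)
{
    // Export a dma-buf fd for the device address range.
    int dmabuf_fd = -1;
    CUresult rc = cuMemGetHandleForAddressRange(
        &dmabuf_fd, d_ptr, bytes, CU_MEM_RANGE_HANDLE_TYPE_DMA_BUF_FD, 0);
    if (rc != CUDA_SUCCESS) {
        std::fprintf(stderr, "cuMemGetHandleForAddressRange failed: %d\n",
                     static_cast<int>(rc));
        return nullptr;
    }

    // Hand the fd to rdma-core; the HCA can then DMA directly to/from
    // GPU memory for RDMA reads and writes.
    return ibv_reg_dmabuf_mr(pd, /*offset=*/0, bytes, /*iova=*/d_ptr,
                             dmabuf_fd,
                             IBV_ACCESS_LOCAL_WRITE |
                             IBV_ACCESS_REMOTE_WRITE |
                             IBV_ACCESS_REMOTE_READ);
}
```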
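For the all-in-one library question, this is the kind of routine we mean: it allocates, registers (pins), stages, and frees on every call, so we assume the memory-registration cost lands on the critical path of each transfer rather than being paid once up front. The helper name is hypothetical and the actual send/completion plumbing is elided:

```cpp
#include <cstdlib>
#include <cstring>
#include <infiniband/verbs.h>

// Hypothetical all-in-one helper: allocate, pin/register, stage the payload,
// then tear everything down again. Called once per transfer, so ibv_reg_mr /
// ibv_dereg_mr (plus malloc/free) run every time.
bool transfer_once(ibv_pd* pd, const void* payload, size_t bytes)
{
    void* buf = std::malloc(bytes);                        // allocate
    if (!buf) return false;

    ibv_mr* mr = ibv_reg_mr(pd, buf, bytes,                // lock (pin) + register
                            IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ);
    if (!mr) { std::free(buf); return false; }

    std::memcpy(buf, payload, bytes);                      // stage the data
    // ... build ibv_sge / ibv_send_wr, ibv_post_send(), poll the CQ ...

    ibv_dereg_mr(mr);                                      // unpin
    std::free(buf);                                        // free
    return true;
}
```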
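On the GDRCopy questions, our current understanding is that GDRCopy is not an allocator and therefore not a literal drop-in for cudaMalloc: the buffer is still allocated with cudaMalloc/cuMemAlloc, and GDRCopy then pins it and maps it into CPU address space so small, latency-critical copies can bypass cudaMemcpy. A sketch of that usage as we understand it, with GPU-page alignment and most error handling omitted and a placeholder function name:

```cpp
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>
#include <gdrapi.h>

// Sketch: GDRCopy sits on top of an ordinary device allocation; it does not
// replace cudaMalloc. It gives the CPU a BAR1 mapping of the GPU buffer so
// small, latency-sensitive copies avoid the cudaMemcpy path.
int gdrcopy_roundtrip(const float* host_in, float* host_out, size_t bytes)
{
    void* d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return -1;   // normal CUDA alloc

    gdr_t g = gdr_open();                                      // open /dev/gdrdrv
    gdr_mh_t mh;
    gdr_pin_buffer(g, (unsigned long)(uintptr_t)d_buf, bytes, 0, 0, &mh);

    void* bar_ptr = nullptr;
    gdr_map(g, mh, &bar_ptr, bytes);                           // CPU-visible mapping

    gdr_copy_to_mapping(mh, bar_ptr, host_in, bytes);          // CPU -> GPU (low latency)
    // ... launch kernels / NCCL transfer on d_buf here ...
    gdr_copy_from_mapping(mh, host_out, bar_ptr, bytes);       // GPU -> CPU

    gdr_unmap(g, mh, bar_ptr, bytes);
    gdr_unpin_buffer(g, mh);
    gdr_close(g);
    cudaFree(d_buf);
    return 0;
}
```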