RDMA Questions

irakatz51 · December 18, 2023, 12:12pm

Hello NVIDIA developers,

I want to test RDMA on my GPUs. Before I do some experiments, I want to ask 2 questions.

1. I have an NVIDIA A16 GPU and a NIC, both connected via PCIe. Can I perform RDMA between them?
2. I currently have an open-source kernel module installed to control my GPU. But based on the manual (1. Overview — GPUDirect RDMA 12.3 documentation), I must modify the module to perform RDMA. Are there any examples available?
3. If I successfully modify the kernel module and my GPU is available for RDMA, can you provide some CUDA application examples that I can use to test RDMA?

Sincerely,
irakatz

xiaofengl · December 19, 2023, 1:55am

1. It depends on your system. The GPU and the HCA need to be under the same PCIe root, and you should disable the IOMMU and PCIe ACS (ACSCtl).

2. I don’t think you need to modify it, but it is better to use CUDA 11.4 or later. There is a kernel driver for GDR, nv_peer_mem.ko, for that.

3. GDR is simple: allocate GPU memory with cuMemAlloc, then register it as an RDMA memory region (MR) with ibv_reg_mr.

There is also a CUDA manual for GDR.
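
The "same PCIe root" condition from point 1 can be checked roughly from sysfs. The following is a minimal sketch, not an official tool: it assumes a Linux host where `/sys/bus/pci/devices/<BDF>` resolves to a path such as `/sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0`, and the device addresses in the usage note are hypothetical placeholders. It only compares the upstream PCI hierarchy of two devices and says nothing about IOMMU or ACS settings.

```python
import os
import re

# A host bridge directory in sysfs looks like "pci0000:3a" (domain:bus).
HOST_BRIDGE = re.compile(r"pci[0-9a-fA-F]{4}:[0-9a-fA-F]{2}")


def resolve(bdf):
    """Resolve a PCI address (e.g. '0000:3b:00.0') to its real sysfs path."""
    return os.path.realpath(f"/sys/bus/pci/devices/{bdf}")


def pci_chain(resolved_path):
    """Return the PCI hierarchy components, starting at the host bridge."""
    parts = resolved_path.strip("/").split("/")
    for i, part in enumerate(parts):
        if HOST_BRIDGE.fullmatch(part):
            return parts[i:]
    return []  # not a resolved PCI device path


def common_upstream(path_a, path_b):
    """Return the hierarchy components shared by both devices, root first."""
    shared = []
    for a, b in zip(pci_chain(path_a), pci_chain(path_b)):
        if a != b:
            break
        shared.append(a)
    return shared
```

For example, `common_upstream(resolve("0000:3b:00.0"), resolve("0000:3c:00.0"))` returns an empty list when the two devices sit under different host bridges, and a non-empty list (host bridge first, then any shared bridges or switches) when they share an upstream path.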

irakatz51 · December 19, 2023, 2:35pm

Thanks for your reply.

I have now reinstalled the open-source kernel module (GitHub - NVIDIA/open-gpu-kernel-modules: NVIDIA Linux open GPU kernel module source, version 525.147.05; my CUDA version is 12.0). During the installation, I found a compiled “nvidia-peermem.ko”, but it was not installed.

However, when I manually run sudo insmod nvidia-peermem.ko, it says insmod: ERROR: could not insert module nvidia-peermem.ko: Invalid parameters

How can I solve this problem?

xiaofengl · December 20, 2023, 1:33am

You can try modprobe.

There should also be a systemd service for it, nvidia-peermem or similar; you can check with “systemctl list-units --type=service”.

Also, if you installed MOFED, it provides another module, nv_peer_mem.ko, which does the same job as nvidia-peermem. Using either one is OK.
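
To see which of the two peer-memory modules is actually loaded, one option is to look at `/proc/modules`. This is a small sketch under a few assumptions: a Linux host, and the kernel listing `nvidia-peermem.ko` as `nvidia_peermem` (hyphens in module names appear as underscores once loaded). The helper names are made up for illustration.

```python
# Loaded-module names as they appear in /proc/modules.
PEER_MODULES = ("nvidia_peermem", "nv_peer_mem")


def loaded_peer_modules(proc_modules_text):
    """Return the GPUDirect peer-memory modules found in /proc/modules text."""
    loaded = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first field of each line is the module name.
        if fields and fields[0] in PEER_MODULES:
            loaded.append(fields[0])
    return loaded


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's list of loaded modules."""
    with open(path) as f:
        return f.read()
```

Calling `loaded_peer_modules(read_proc_modules())` returns an empty list if neither module is loaded, and one name in the usual case where only one of them is in use.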

system · January 3, 2024, 1:33am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
