Description:
We’re encountering an issue when attempting to load the nvidia-peermem kernel module on a system configured for GPUDirect RDMA. The error appears during modprobe, which returns "Invalid argument".
System Configuration:
GPU: NVIDIA L40S
CUDA Version: 12.9
NVIDIA Driver: 575.57.08
OS: Ubuntu 24.04 LTS
Kernel: 6.8.0-60-generic
NIC: Mellanox ConnectX-6
nvidia-peermem: Using bundled version from driver 575
IOMMU: Disabled (intel_iommu=off iommu=off)
Steps to Reproduce:
- Boot into Ubuntu 24.04 with the NVIDIA driver properly installed.
- Run:
sudo modprobe nvidia_peermem --verbose
insmod /lib/modules/6.8.0-60-generic/updates/dkms/nvidia-peermem.ko.zst
modprobe: ERROR: could not insert 'nvidia_peermem': Invalid argument
There is no new output in dmesg.
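For anyone trying to triage this, here is a minimal sketch of the checks that seem relevant. It assumes nvidia-peermem needs the nvidia and ib_core modules to be loaded first; that is our assumption, and the `depends` line printed by modinfo is the authoritative source. The helper name missing_modules is made up for illustration.

```shell
#!/bin/sh
# missing_modules (hypothetical helper): reads /proc/modules-style text on
# stdin and prints each module named in the arguments that is NOT listed.
missing_modules() {
    loaded=$(awk '{print $1}')
    for m in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF "$m" || printf '%s\n' "$m"
    done
}

# Assumed prerequisites: nvidia and ib_core (verify against modinfo's "depends").
if [ -r /proc/modules ]; then
    missing_modules nvidia ib_core < /proc/modules
fi

# Compare the module's vermagic with the running kernel, and list its dependencies.
uname -r
modinfo nvidia-peermem 2>/dev/null | grep -E '^(filename|depends|vermagic)' || true

# Confirm the boot parameters actually in effect (e.g. the IOMMU settings).
[ -r /proc/cmdline ] && cat /proc/cmdline
```

Any module name printed by the first check is not currently loaded. A vermagic that does not start with the `uname -r` value would mean the module was built for a different kernel.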
We would greatly appreciate any advice or clarification from NVIDIA or from other users who have gotten GPUDirect RDMA working on similar setups.