While following section 2.3 of the installation guide (GPUNetIO Programming Guide - NVIDIA Docs) on Ubuntu 22.04, I ran:
sudo modprobe nvidia-peermem
and got the following error:
modprobe: FATAL: Module nvidia-peermem not found in directory /lib/modules/5.15.0-89-generic
I have a BlueField-2 and installed DOCA using sdkmanager.
Hello @amirluckach,
Thank you for posting your query in our community.
If the NVIDIA GPU driver was installed before MLNX_OFED, the GPU driver must be uninstalled and reinstalled so that nvidia-peermem is compiled against the RDMA APIs provided by MLNX_OFED. Please refer to the following link: 1. Overview — GPUDirect RDMA 12.3 documentation
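A minimal, read-only sketch for checking the result after reinstalling the driver. It is not part of the GPUNetIO or GPUDirect RDMA guides, and the helper name peermem_status is hypothetical. It reports whether nvidia-peermem is loaded (listed in /proc/modules as nvidia_peermem), available for the running kernel (found by modinfo), or missing, which is the situation behind the modprobe error above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the NVIDIA guides): classify the state of the
# nvidia-peermem kernel module for the running kernel. Changes nothing.
peermem_status() {
  if grep -q '^nvidia_peermem ' /proc/modules 2>/dev/null; then
    echo loaded      # module is currently in the running kernel
  elif modinfo nvidia-peermem >/dev/null 2>&1; then
    echo available   # built for this kernel but not loaded yet
  else
    echo missing     # the case behind "Module nvidia-peermem not found"
  fi
}

case "$(peermem_status)" in
  loaded)    echo "nvidia-peermem is loaded" ;;
  available) echo "nvidia-peermem exists; load it with: sudo modprobe nvidia-peermem" ;;
  missing)   echo "nvidia-peermem not found for kernel $(uname -r)" ;;
esac
```

Running it before and after the driver reinstall shows whether the module was actually built for the running kernel.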
Thanks,
Bhargavi
Hi Bhargavi,
Thanks, I solved the problem by running sudo apt install nvidia-driver-470. However, when I continue the installation and run make in the following sequence:
# Install GDRCopy
sudo apt install -y check kmod
git clone https://github.com/NVIDIA/gdrcopy /opt/mellanox/gdrcopy
cd /opt/mellanox/gdrcopy
make
I get the following error:
copybw.cpp:30:10: fatal error: cuda.h: No such file or directory
30 | #include <cuda.h>
Please advise.
Amir
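A likely cause of this error is that only the GPU driver (nvidia-driver-470) is installed, while cuda.h ships with the CUDA toolkit rather than the driver. The following is a minimal sketch, not taken from the guide, that looks for include/cuda.h in a few assumed locations (CUDA_HOME, /usr/local/cuda, /usr/local/cuda-*) and, if found, prints a make command that passes that directory through the CUDA make variable described in the GDRCopy README. The helper name find_cuda_home and the candidate paths are hypothetical; adjust them to your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first CUDA toolkit directory that contains
# include/cuda.h, the header copybw.cpp failed to find. Paths are assumptions.
find_cuda_home() {
  local dir
  for dir in "${CUDA_HOME:-}" /usr/local/cuda /usr/local/cuda-*; do
    # Skip an empty CUDA_HOME and any unmatched glob pattern.
    if [ -n "$dir" ] && [ -f "$dir/include/cuda.h" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1   # no toolkit headers found in the candidate locations
}

if cuda_home="$(find_cuda_home)"; then
  echo "Found cuda.h under $cuda_home"
  echo "Build with: cd /opt/mellanox/gdrcopy && make CUDA=$cuda_home"
else
  echo "cuda.h not found; the CUDA toolkit (not just the driver) appears to be missing"
fi
```

If nothing is found, installing the CUDA toolkit first and then rerunning make is the usual next step.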