Hi! I am developing a loadable kernel driver based on jetson-rdma-picoevb, using GPUDirect. The driver was tested with R35.2.1, but I’m migrating to R36.3 (JetPack 6).
The GPUDirect documentation has a section on the changes in CUDA 12.2 that reports the deprecation of nv_peer_mem and introduces the new module nvidia_peermem.
I searched my system and can see the module in the nvidia-oot folder. As I understand it, nv_peer_mem should be in this directory and there should be an nvidia_peermem somewhere, but it does not exist yet.
When I tried to build my driver, it failed with a fatal error:
linux/nv-p2p.h: No such file or directory
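To narrow down this kind of error, it can help to check whether the header exists anywhere on the system at all. The following is only a minimal sketch: the helper name `find_files` is hypothetical, and the search roots in the usage comment are assumptions that may not match where the R36.3 headers or sources are actually installed.

```python
import os


def find_files(name, roots):
    """Return the sorted paths of every file called `name` under the given roots."""
    matches = []
    for root in roots:
        # os.walk silently yields nothing for a root that does not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Example call (the search roots are assumptions, adjust them as needed):
# print(find_files("nv-p2p.h", ["/usr/src", "/lib/modules"]))
```

If the list comes back empty, the header is simply not installed under those roots, as opposed to being present but missing from the include path of the build.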
I tried following the “Using Nvidia Peermem” section, but I don’t understand it. It contains this note:
Note: If the NVIDIA GPU driver is installed before MLNX_OFED, the GPU driver must be uninstalled and installed again to make sure nvidia-peermem is compiled with the RDMA APIs that are provided by MLNX_OFED.
This is not clear to me. In some topics I found the statement “nvidia-jetpack contains all drivers and essential softwares”. So I uninstalled it with “apt autoremove --purge nvidia-jetpack” and then installed MLNX_OFED successfully. After that, I installed nvidia-jetpack again. However, there is still no nvidia-peermem module.
However, when I manually installed nvidia-driver-550, the nvidia-peermem module was installed with it. Obviously I can’t use that one, because it’s not compiled for ARM.
What am I doing wrong? How can I configure GPUDirect with this JetPack version? Do I have to downgrade JetPack?
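One way to check which architecture a module file was built for is to read the `e_machine` field of its ELF header. This is a minimal sketch, not an official tool: the function name `elf_machine` is hypothetical, it assumes the `.ko` file is uncompressed, and it only names the two machine values relevant here (62 for x86-64 and 183 for AArch64, as defined in the ELF specification).

```python
import struct

# e_machine values from the ELF specification (only the two relevant here).
ELF_MACHINES = {62: "x86-64", 183: "AArch64"}


def elf_machine(path):
    """Return the target architecture recorded in an ELF file's header."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an uncompressed ELF file")
    # EI_DATA (byte 5 of e_ident): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_ident and e_type.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


# Example call (the path is a placeholder, not a real location):
# print(elf_machine("/path/to/nvidia-peermem.ko"))
```

Note that a matching architecture alone does not mean the module is usable on Jetson; this only confirms or rules out the architecture mismatch.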