GPUDirect RDMA - Module can not be insert into kernel

ggrutzeck · September 14, 2022, 4:08pm

This is a follow-up to the topic “PCIe DMA driver can not be loaded”.
I did a fresh install of JetPack 5.0.2 on the Jetson Orin.
The file /etc/nv_tegra_release has the following content: # R35 (release), REVISION: 1.0, GCID: 31346300, BOARD: t186ref, EABI: aarch64, DATE: Thu Aug 25 18:41:45 UTC 2022.
I built my custom kernel module, which uses direct DMA transfers from the PCIe card into the memory space of the GPU (GPUDirect RDMA).
However, the module cannot be inserted; the kernel reports the following errors:

[  473.515741] my_dma: module verification failed: signature and/or required key missing - tainting kernel
[  473.525670] my_dma: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[  473.533324] my_dma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[  473.540224] my_dma: disagrees about version of symbol nvidia_p2p_get_pages
[  473.547323] my_dma: Unknown symbol nvidia_p2p_get_pages (err -22)
[  473.553652] my_dma: disagrees about version of symbol nvidia_p2p_put_pages
[  473.560750] my_dma: Unknown symbol nvidia_p2p_put_pages (err -22)
[  473.567050] my_dma: disagrees about version of symbol nvidia_p2p_dma_map_pages
[  473.574510] my_dma: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[  473.581172] my_dma: disagrees about version of symbol nvidia_p2p_free_page_table
[  473.588813] my_dma: Unknown symbol nvidia_p2p_free_page_table (err -22)
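
For readers who have not seen this message before: the kernel prints “disagrees about version of symbol” when it is built with CONFIG_MODVERSIONS and the checksum (CRC) recorded in the module for an imported symbol differs from the CRC of the symbol that is actually exported on the running system. The -22 in the “Unknown symbol” lines is EINVAL. As a rough sketch, the affected symbols can be pulled out of a kernel log like this (the function name `mismatched_symbols` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print every symbol named in a
# "disagrees about version of symbol" kernel log line, once each.
# Reads kernel log text on stdin.
mismatched_symbols() {
    sed -n 's/.*disagrees about version of symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Typical use on the target (not executed here):
#   sudo dmesg | mismatched_symbols
```

Every symbol it lists for the log above belongs to the nvidia_p2p_* API, so the mismatch seems to be limited to that one interface.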

The very same errors are produced when I try to insert the example kernel module from NVIDIA/jetson-rdma-picoevb on GitHub (a minimal HW-based demo of GPUDirect RDMA on NVIDIA Jetson AGX Xavier running L4T).

@vandev noticed in the other topic that the header files of the toolchain on the device did not match the header files in public_sources.tbz2 for JetPack 5.0.1 DP. This is no longer the case for JetPack 5.0.2, but it is still not possible to load the kernel modules.

@kayccc, as you mentioned in the other topic, we should open a new topic if the issue is still present in JetPack 5.0.2, and it is.

DigPat · September 16, 2022, 7:45am

I have the same issue when trying to adopt gdrcopy (GitHub - NVIDIA/gdrcopy: A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology).

$ sudo insmod gdrdrv.ko
insmod: ERROR: could not insert module gdrdrv.ko: Invalid parameters

...
[ 3963.739146] gdrdrv: disagrees about version of symbol nvidia_p2p_get_pages
[ 3963.739384] gdrdrv: Unknown symbol nvidia_p2p_get_pages (err -22)
[ 3963.739593] gdrdrv: disagrees about version of symbol nvidia_p2p_put_pages
[ 3963.739808] gdrdrv: Unknown symbol nvidia_p2p_put_pages (err -22)
[ 3963.740025] gdrdrv: disagrees about version of symbol nvidia_p2p_free_page_table
[ 3963.740254] gdrdrv: Unknown symbol nvidia_p2p_free_page_table (err -22)
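
A possible next step is to compare the CRCs the module was built against with the CRCs of a kernel build that is supposed to match the target. The sketch below assumes two text files: one with `0xCRC symbol` lines, which is what `modprobe --dump-modversions gdrdrv.ko` prints, and one in the `Module.symvers` format of a kernel build (CRC and symbol name in the first two columns). The function name `crc_mismatches` and the file names in the usage comment are only illustrative.

```shell
#!/bin/sh
# crc_mismatches WANTED PROVIDED
#   WANTED:   "0xCRC symbol" lines, i.e. what the module expects
#   PROVIDED: Module.symvers-style lines (CRC in column 1, symbol in column 2)
# Prints "symbol wanted_crc provided_crc" for each symbol present in both
# files whose CRCs differ. Symbols missing from PROVIDED are skipped.
crc_mismatches() {
    awk '
        FILENAME == ARGV[1] { provided[$2] = $1; next }   # load PROVIDED first
        ($2 in provided) && provided[$2] != $1 {
            print $2, $1, provided[$2]
        }
    ' "$2" "$1"
}

# Typical use (paths are placeholders, not executed here):
#   modprobe --dump-modversions gdrdrv.ko > wanted.txt
#   crc_mismatches wanted.txt /path/to/kernel_out/Module.symvers
```

If this prints nothing for the nvidia_p2p_* symbols, the module agrees with that kernel build, which would suggest that the symbols on the running system are exported by something other than what the build expected.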

DigPat · September 21, 2022, 9:30am

Now I have also tried downgrading to R34.1.1 and JetPack 5.0.1.
It has exactly the same problem.

WayneWWW · September 21, 2022, 9:37am

Hi,

How did you rebuild this ko file? Did you use the same toolchain as for the original kernel?

DigPat · September 21, 2022, 11:41am

I used the toolchain on the Jetson.
I have successfully built other kernel modules this way, but they did not depend on other modules…
So in this case, must I cross-compile with the Bootlin toolchain (gcc 9.3) because this module depends on a built-in module?

DigPat · September 22, 2022, 9:11am

I now did the following:
I flashed the 35.1 release and installed JetPack.

I compiled the kernel on my PC as described here:
https://docs.nvidia.com/jetson/archives/r35.1/DeveloperGuide/text/SD/Kernel/KernelCustomization.html
with the Driver Package (BSP) Sources and Bootlin Toolchain gcc 9.3 from:
https://developer.nvidia.com/embedded/jetson-linux

I successfully cross compiled my kernel module linked to the built kernel with:

export CROSS_COMPILE_AARCH64_PATH=~/jetson/l4t-gcc/
export CROSS_COMPILE_AARCH64=~/jetson/l4t-gcc/bin/aarch64-buildroot-linux-gnu-
export TEGRA_KERNEL_DIR=~/jetson/kernel/35.1/Linux_for_Tegra/source/public/kernel/
export CROSS_COMPILE=~/jetson/l4t-gcc/bin/aarch64-buildroot-linux-gnu-

make ARCH=arm64 -C $TEGRA_KERNEL_DIR../kernel_out M=$PWD

But I still get the “disagrees about version of symbol” error on the Jetson AGX Orin :-(
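
One cause that can be ruled out quickly is a module built for a different kernel release than the one running on the target. That case normally produces a “version magic” error rather than the message above, so this is only a sanity check. A minimal sketch; the function name `same_release` is hypothetical, and the release strings in the comments are sample values, not taken from this system:

```shell
#!/bin/sh
# same_release VERMAGIC RELEASE
# Succeeds if the vermagic string of a module starts with the given kernel
# release. A vermagic string looks like
#   "<release> SMP preempt mod_unload modversions aarch64"
same_release() {
    built_for=${1%% *}          # keep everything before the first space
    [ "$built_for" = "$2" ]
}

# Typical use on the target (not executed here):
#   same_release "$(modinfo -F vermagic gdrdrv.ko)" "$(uname -r)" \
#       && echo "release matches" || echo "release differs"
```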

What am I missing?

WayneWWW · September 22, 2022, 9:45am

Hi,

I think this driver has not been validated on JetPack 5 before, and its dependency has a problem too.

For example, nvidia_p2p_get_pages does not seem to exist.

DigPat · September 22, 2022, 12:17pm

Some more observations. One problem seems to be that nvidia-p2p is not loaded. When I try to load this module manually, it fails with “exports duplicate symbol”, the symbol being owned by the module nvidia. As an experiment, I unloaded the nvidia module. It is used by the graphical system, so that must be disabled first.

sudo systemctl set-default multi-user.target
sudo reboot
*LOGIN AFTER REBOOT*
sudo modprobe -r nvidia
sudo modprobe nvidia-p2p
sudo insmod gdrdrv.ko

And hey, I can load my module! Even the module built locally on the Jetson can be loaded. I have no means to actually verify p2p functionality at this stage.
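
To see which loaded module currently provides the nvidia_p2p symbols, /proc/kallsyms can be consulted: each line holds an address, a type letter, the symbol name and, for symbols that come from a loadable module, the module name in square brackets. A small sketch (the function name `symbol_owner` is hypothetical, and printing "vmlinux" for built-in symbols is just a convention chosen here):

```shell
#!/bin/sh
# symbol_owner SYMBOL
# Reads /proc/kallsyms-style text on stdin and prints the name of the module
# that defines SYMBOL, or "vmlinux" if the line carries no module name.
symbol_owner() {
    awk -v sym="$1" '
        $3 == sym {
            owner = ($4 == "" ? "vmlinux" : $4)
            gsub(/\[|\]/, "", owner)      # strip the surrounding brackets
            print owner
        }
    '
}

# Typical use on the target (not executed here):
#   symbol_owner nvidia_p2p_get_pages < /proc/kallsyms
```

Running this before and after the module swap should show the owner change from the nvidia module to the p2p module.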

WARNING! Doing this seems to kill the DisplayPort output and you can only access it with ssh even after reboot!
You can restore the system DisplayPort output with:

sudo systemctl set-default graphical.target
sudo reboot

AastaLLL · September 22, 2022, 11:52pm

Hi,

We are checking this issue with our internal team.
Will share more information with you later.

Thanks

dipenp · September 23, 2022, 12:25am

For jetson-rdma-picoevb, how are you compiling the kernel module? I mean for iGPU or dGPU.

AastaLLL · October 3, 2022, 7:58am

Hi,

We can find the nvidia_p2p_get_pages symbol in the kernel_src.tbz2 of r35.1.
Could you please check it again?

$ grep -ir nvidia_p2p_get_pages
kernel/nvidia/drivers/nv-p2p/nvidia-p2p.c:int nvidia_p2p_get_pages(u64 vaddr, u64 size,
kernel/nvidia/drivers/nv-p2p/nvidia-p2p.c:EXPORT_SYMBOL(nvidia_p2p_get_pages);
kernel/nvidia/include/linux/nv-p2p.h:int nvidia_p2p_get_pages(u64 vaddr, u64 size,
kernel/nvidia/include/linux/nv-p2p.h: *   Map the pages retrieved using nvidia_p2p_get_pages and

Thanks.

AastaLLL · October 3, 2022, 8:47am

Hi,

We just got some feedback from our internal team that nv-p2p.ko and nvidia.ko cannot be used together.
Do you want to use them at the same time?

Thanks.

DigPat · October 4, 2022, 6:56am

I want to do p2p and have display output simultaneously. I would assume I need both for that. Am I correct?

WayneWWW · October 4, 2022, 7:03am

nvidia.ko, nvidia-modeset and nvgpu.ko are responsible for the display to work fine.

DigPat · October 4, 2022, 12:52pm

Is nvidia.ko required to run programs that use CUDA?

ggrutzeck · October 4, 2022, 1:13pm

jetson-rdma-picoevb was built on the Jetson itself, using the script for the Jetson's iGPU.

AastaLLL · October 5, 2022, 2:16am

Hi, both

The same symbols are defined in both nvidia.ko and nv-p2p.ko.
So they cannot be added to the kernel at the same time.

nvidia.ko was only loaded for the dGPU use case.
That’s why we did not expect it to be loaded when we designed nv-p2p.ko.
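
The overlap can be made visible by comparing the export tables of the two .ko files. The sketch below assumes that each exported symbol shows up in `nm` output as a `__ksymtab_<name>` entry, which is how kernel modules usually record their exports; the function names and file names are only illustrative.

```shell
#!/bin/sh
# exported_symbols: read `nm` output on stdin and print the names of the
# symbols the module exports (taken from its __ksymtab_<name> entries).
exported_symbols() {
    awk '$NF ~ /^__ksymtab_/ { sub(/^__ksymtab_/, "", $NF); print $NF }' |
        LC_ALL=C sort -u
}

# common_exports NM_A NM_B
# Prints the symbols exported by both modules, given two files that each
# hold the `nm` output of one module.
common_exports() {
    a=$(mktemp) && b=$(mktemp) || return 1
    exported_symbols < "$1" > "$a"
    exported_symbols < "$2" > "$b"
    LC_ALL=C comm -12 "$a" "$b"       # lines present in both sorted lists
    rm -f "$a" "$b"
}

# Typical use (file names are placeholders, not executed here):
#   nm nvidia.ko     > nvidia.nm
#   nm nvidia-p2p.ko > p2p.nm
#   common_exports nvidia.nm p2p.nm
```

Any name printed here is exported twice, which is what makes the kernel refuse the second module with “exports duplicate symbol”.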

We are double-checking if nvidia.ko is required for Orin’s functionality.
Could you also test if it works by only adding the nv-p2p.ko into the kernel?

Thanks.

AastaLLL · October 5, 2022, 2:56am

Hi,

Starting with Orin, nvidia.ko is used for display.
Removing it may also affect functionality that requires the graphics driver (e.g. Argus).

Thanks.

DigPat · October 5, 2022, 7:09am

I have already verified that loading only nv-p2p.ko works. See (GPUDirect RDMA - Module can not be insert into kernel - #10 by DigPat)

I’m not sure I understand… but let’s start with my goal.
I have a card with an onboard FPGA connected to the PCIe slot on the Orin. I want to do peer-to-peer data transfers into iGPU memory allocated with CUDA, using the functions defined in nv-p2p.h.

Is this possible?
What module should I load?
Can I also have display output at the same time?

vandev · October 6, 2022, 9:38pm

Thanks for the suggestion. At least I can load my PCIe device driver kernel module now.

Looking forward to hearing from NVIDIA about a fix.
