PCIe DMA driver can not be loaded

ggrutzeck · June 27, 2022, 2:27pm

Dear all,

we developed a custom PCIe DMA driver that accesses GPU memory directly for DMA transfers.
On the Jetson Xavier AGX the driver works as expected, but on the Jetson Orin AGX it cannot be loaded into the kernel.
Loading it produces the following errors in the kernel log:

[  633.909968] my_dma: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[  633.910199] my_dma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[  633.910423] my_dma: disagrees about version of symbol nvidia_p2p_get_pages
[  633.910622] my_dma: Unknown symbol nvidia_p2p_get_pages (err -22)
[  633.910814] my_dma: disagrees about version of symbol nvidia_p2p_put_pages
[  633.911028] my_dma: Unknown symbol nvidia_p2p_put_pages (err -22)
[  633.911218] my_dma: disagrees about version of symbol nvidia_p2p_dma_map_pages
[  633.911431] my_dma: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[  633.911623] my_dma: disagrees about version of symbol nvidia_p2p_free_page_table
[  633.911833] my_dma: Unknown symbol nvidia_p2p_free_page_table (err -22)

The Orin module runs the newest version provided by the SDK Manager (L4T 34.1.1). Of course, the kernel module is built locally against the running kernel.
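
For context, "disagrees about version of symbol" is the message the module loader prints when the symbol CRC recorded in the module (CONFIG_MODVERSIONS) does not match the CRC of the exported symbol, and err -22 corresponds to EINVAL. The log can be condensed per symbol with a small script. This is a minimal sketch that assumes the two message shapes quoted above; the function name `summarize` and the report layout are illustrative only.

```python
import errno
import re
from collections import defaultdict

# The two message shapes that appear in the kernel log excerpt.
VERSION_RE = re.compile(r"\] (\S+): disagrees about version of symbol (\S+)")
UNKNOWN_RE = re.compile(r"\] (\S+): Unknown symbol (\S+) \(err (-?\d+)\)")


def summarize(log_text):
    """Return {symbol: {"module", "crc_mismatch", "errno"}} for a dmesg excerpt."""
    report = defaultdict(
        lambda: {"module": None, "crc_mismatch": False, "errno": None}
    )
    for line in log_text.splitlines():
        match = VERSION_RE.search(line)
        if match:
            entry = report[match.group(2)]
            entry["module"] = match.group(1)
            entry["crc_mismatch"] = True
            continue
        match = UNKNOWN_RE.search(line)
        if match:
            entry = report[match.group(2)]
            entry["module"] = match.group(1)
            # The kernel logs negative errno values; map e.g. -22 to "EINVAL".
            code = -int(match.group(3))
            entry["errno"] = errno.errorcode.get(code, str(code))
    return dict(report)
```

Feeding it the output of `dmesg` separates symbols that exist but fail the version check (CRC mismatch, EINVAL) from symbols that are not exported at all (ENOENT).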

What could be the problem on the Jetson Orin with the custom driver?
Or is this mode not yet supported on the Jetson Orin?

Best regards,
Gerrit

kayccc · June 27, 2022, 11:37pm

Is there any change (from stock SW) in the SW configuration between Xavier and Orin, in particular w.r.t. disabling the SMMU etc.?
Also, have you tried the same BSP version + your SW stack on both Xavier and Orin?

ggrutzeck · July 5, 2022, 2:37pm

The problem seems to be related to the SW version.
I installed the latest version (JetPack DP 5.0.1) via the SDK Manager on an Xavier AGX module and tried to load the module.
I got the following errors:

[  230.312480] my_dma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[  230.312762] my_dma: Unknown symbol nvidia_p2p_get_pages (err -2)
[  230.312940] my_dma: Unknown symbol nvidia_p2p_put_pages (err -2)
[  230.313121] my_dma: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[  230.313290] my_dma: Unknown symbol nvidia_p2p_free_page_table (err -2)

I also cloned the reference implementation (NVIDIA/jetson-rdma-picoevb on GitHub: minimal HW-based demo of GPUDirect RDMA on NVIDIA Jetson AGX Xavier running L4T) and got the following errors:

[  894.912977] picoevb_rdma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[  894.913252] picoevb_rdma: Unknown symbol nvidia_p2p_get_pages (err -2)
[  894.913438] picoevb_rdma: Unknown symbol nvidia_p2p_put_pages (err -2)
[  894.913607] picoevb_rdma: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[  894.913799] picoevb_rdma: Unknown symbol nvidia_p2p_free_page_table (err -2)

Loading the reference implementation built without any CUDA-related function calls (via the build script ./build-for-any-no-cuda-native.sh) works without problems.

ggrutzeck · July 5, 2022, 2:51pm

On the Jetson Xavier AGX the problem can be solved by first loading the kernel module nvidia-p2p, which provides the missing symbols.

But on the Jetson Orin AGX the module nvidia-p2p cannot be loaded, due to the following error:

[   61.086913] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
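
To confirm which loaded module already provides a symbol, the symbol table in /proc/kallsyms can be searched. The sketch below assumes the usual kallsyms line layout (address, type, name, and an optional `[module]` column); the helper name `symbol_owners` is hypothetical.

```python
def symbol_owners(kallsyms_text, symbols):
    """Map each requested symbol to the set of owners found in kallsyms-style text.

    Symbols that are built into the kernel image carry no [module] column
    and are reported as "vmlinux".
    """
    wanted = set(symbols)
    owners = {name: set() for name in wanted}
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Expected layout: <address> <type> <name> [<module>]
        if len(fields) < 3 or fields[2] not in wanted:
            continue
        owner = fields[3].strip("[]") if len(fields) > 3 else "vmlinux"
        owners[fields[2]].add(owner)
    return owners
```

On the device this could be called as `symbol_owners(open("/proc/kallsyms").read(), ["nvidia_p2p_dma_map_pages"])`. A result of `{"nvidia"}` would match the "owned by nvidia" part of the error message.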

marc.andre · August 18, 2022, 2:49pm

I see the same problem. The nvidia and nvidia-p2p modules both try to export the same symbol.

hcook · August 18, 2022, 7:04pm

I’m seeing the same problem as described in all of these posts when attempting to run the NVIDIA/jetson-rdma-picoevb demo from GitHub (minimal HW-based demo of GPUDirect RDMA on NVIDIA Jetson AGX Xavier running L4T) on a Jetson AGX Orin with JetPack 5.0.2 and the SMMU disabled.

vandev · August 19, 2022, 2:15pm

I have the same problem. The nvidia kernel module already exports the symbols that nvidia-p2p tries to export, so nvidia-p2p fails to load with an “exports duplicate symbol” error. I did not have this problem in L4T 34.1, where nvidia-p2p could be loaded without any conflict with the nvidia kernel module. This seems to be an issue in L4T 35.1.

NB: I am testing on an Orin AGX Development Kit with the standard 35.1 kernel installed via SDK Manager.

vandev · August 19, 2022, 3:48pm

I looked up the L4T Driver Package Source for R35.1 from here:

https://developer.nvidia.com/embedded/l4t/r35_release_v1.0/sources/public_sources.tbz2

It seems the “nvidia” kernel module now builds the nv-p2p functionality itself and bundles it into one single kernel module (nvidia.ko), so there should be no need to manually insert nvidia-p2p.ko. The problem is that an older version of the “nv-p2p.h” header is used: the header available on the Orin device (toolchain) is not the same header that was used to build the “nvidia” kernel module. See below:

First few lines of “nv-p2p.h” from L4T Driver Package Source (used for building nvidia.ko, presumably):


/*
 * SPDX-FileCopyrightText: Copyright (c) 2011-2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT

First few lines of “nv-p2p.h” available on the Orin device (toolchain available to build on device, located under /usr/src/linux-headers-5.10.104-tegra-ubuntu20.04_aarch64/nvidia/include/linux/nv-p2p.h):

/*
 * Copyright (c) 2018-2019, NVIDIA Corporation.  All rights reserved.
 *

This explains why the kernel module I build on the Orin (a driver for a PCIe device) cannot be loaded and fails with “disagrees about version of symbol nvidia_p2p_dma_unmap_pages”, the same error that @ggrutzeck sees. An older version of the symbols is built and exported by the “nvidia” kernel module, but the on-device toolchain only provides the most recent nv-p2p.h header.

Possible solutions are:

1. Rebuild the nvidia kernel module without the “nv-p2p.c” symbols. I assume this is a dead end, because there must have been a reason the kernel module includes some version of nv-p2p.c.
2. Rebuild the nvidia kernel module with the most recent nv-p2p.h.
3. Update the toolchain header (nv-p2p.h) on the Orin device and build my kernel module with the same header that was used for the nvidia kernel module. Then there should no longer be a need to insmod the nvidia-p2p kernel module.
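
Whichever option is chosen, the result can be checked by comparing the symbol CRCs that the custom module expects with the CRCs that the exporting module provides. The sketch below assumes both are available as text with one `<crc> <symbol>` pair per line, which is the layout printed by `modprobe --dump-modversions` and used in the first two columns of a Module.symvers file. The function names and where the two texts come from are assumptions, not part of the L4T tooling.

```python
def load_crcs(text):
    """Parse lines of the form '<crc> <symbol> ...' into {symbol: crc}."""
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a hex CRC.
        if len(fields) >= 2 and fields[0].startswith("0x"):
            crcs[fields[1]] = int(fields[0], 16)
    return crcs


def crc_mismatches(module_text, exporter_text):
    """Return {symbol: (module_crc, exporter_crc)} where the two CRCs differ.

    Symbols that appear on only one side are ignored; only symbols present
    in both listings are compared.
    """
    expected = load_crcs(module_text)
    exported = load_crcs(exporter_text)
    return {
        name: (crc, exported[name])
        for name, crc in expected.items()
        if name in exported and exported[name] != crc
    }
```

An empty result for the nvidia_p2p_* symbols would indicate that the module was built against matching headers. Any entry points to a symbol that would still trigger the “disagrees about version” error.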

@kayccc , thoughts?

kayccc · August 31, 2022, 1:25am

Sorry for the late response.
To resolve the PCIe DMA driver issue you encountered, please try the JetPack 5.0.2 GA release, then file your own issue separately.
I will have the team follow up.

I’m closing this issue now. Thanks
