PCIe DMA driver cannot be loaded

Dear all,

We developed a custom PCIe DMA driver that accesses GPU memory for direct transfers.
On the Jetson Xavier AGX the driver works as expected, but on the Jetson Orin AGX it cannot be loaded into the kernel.
Loading it produces the following errors in the kernel log:

[  633.909968] my_dma: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[  633.910199] my_dma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[  633.910423] my_dma: disagrees about version of symbol nvidia_p2p_get_pages
[  633.910622] my_dma: Unknown symbol nvidia_p2p_get_pages (err -22)
[  633.910814] my_dma: disagrees about version of symbol nvidia_p2p_put_pages
[  633.911028] my_dma: Unknown symbol nvidia_p2p_put_pages (err -22)
[  633.911218] my_dma: disagrees about version of symbol nvidia_p2p_dma_map_pages
[  633.911431] my_dma: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[  633.911623] my_dma: disagrees about version of symbol nvidia_p2p_free_page_table
[  633.911833] my_dma: Unknown symbol nvidia_p2p_free_page_table (err -22)

The Orin module runs the newest version provided by the SDK Manager (L4T 34.1.1). Of course, the kernel module is built locally against the running kernel.

What could be the problem with the custom driver on the Jetson Orin?
Or is this mode not yet supported on the Jetson Orin?

Best regards,
Gerrit

Is there any change (from stock SW) in the SW configuration between Xavier and Orin, in particular with respect to disabling the SMMU, etc.?
Also, have you tried the same BSP version with your SW stack on both Xavier and Orin?

The problem seems to be related to the SW version.
I installed the latest version (JetPack DP 5.0.1) via the SDK Manager on an Xavier AGX module and tried to load the module.
I got the following errors:

[  230.312480] my_dma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[  230.312762] my_dma: Unknown symbol nvidia_p2p_get_pages (err -2)
[  230.312940] my_dma: Unknown symbol nvidia_p2p_put_pages (err -2)
[  230.313121] my_dma: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[  230.313290] my_dma: Unknown symbol nvidia_p2p_free_page_table (err -2)

I also cloned the reference implementation (NVIDIA/jetson-rdma-picoevb on GitHub, a minimal HW-based demo of GPUDirect RDMA on NVIDIA Jetson AGX Xavier running L4T) and got the following errors:

[  894.912977] picoevb_rdma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[  894.913252] picoevb_rdma: Unknown symbol nvidia_p2p_get_pages (err -2)
[  894.913438] picoevb_rdma: Unknown symbol nvidia_p2p_put_pages (err -2)
[  894.913607] picoevb_rdma: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[  894.913799] picoevb_rdma: Unknown symbol nvidia_p2p_free_page_table (err -2)

The reference implementation built without any CUDA-related function calls (via the build script ./build-for-any-no-cuda-native.sh) loads into the kernel without problems.
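
For anyone comparing the logs in this thread: as far as I know, the two messages point to different failures. "disagrees about version of symbol" (followed by err -22, i.e. -EINVAL) means the symbol is exported but its CRC does not match the one the module was built against, while "Unknown symbol ... (err -2)" (-ENOENT) means no loaded module exports the symbol at all. Below is a minimal Python sketch that sorts a saved dmesg dump accordingly. The function name `classify` and the result keys are my own choices for illustration, not part of any NVIDIA tool.

```python
import re
from collections import defaultdict

# Patterns for the two module-loader messages quoted in this thread.
VERSION_MISMATCH = re.compile(r"(\w+): disagrees about version of symbol (\w+)")
UNKNOWN_SYMBOL = re.compile(r"(\w+): Unknown symbol (\w+) \(err (-?\d+)\)")


def classify(log_text):
    """Group failing symbols per module by the kind of failure.

    Returns {module: {"crc_mismatch": set, "not_exported": set}}.
    """
    result = defaultdict(lambda: {"crc_mismatch": set(), "not_exported": set()})
    for line in log_text.splitlines():
        match = VERSION_MISMATCH.search(line)
        if match:
            result[match.group(1)]["crc_mismatch"].add(match.group(2))
            continue
        match = UNKNOWN_SYMBOL.search(line)
        # err -2 is -ENOENT: the symbol was not found at all.
        # err -22 lines only repeat a preceding version mismatch, so skip them.
        if match and int(match.group(3)) == -2:
            result[match.group(1)]["not_exported"].add(match.group(2))
    return dict(result)
```

Called on the first log in this thread, it should put all five nvidia_p2p_* symbols under "crc_mismatch" for my_dma; called on the JetPack 5.0.1 logs, it should put them under "not_exported".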


On the Jetson Xavier AGX the problem can be solved by loading the kernel module nvidia-p2p, which provides the missing symbols.

But on the Jetson Orin AGX the module nvidia-p2p cannot be loaded; it fails with the following error:

[   61.086913] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)

I see the same problem. The nvidia and nvidia-p2p modules both try to export the same symbol.
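
One way to check which loaded module currently provides the nvidia_p2p_* symbols is to look them up in /proc/kallsyms, where each module symbol is followed by the module name in square brackets. A minimal Python sketch, assuming that standard layout (address, type, name, optional [module]); the helper name `symbol_owners` is hypothetical. Note that kallsyms lists all symbols, not only exported ones, so this only shows where a symbol is defined.

```python
def symbol_owners(kallsyms_text, prefix="nvidia_p2p_"):
    """Map each symbol starting with `prefix` to the set of modules defining it."""
    owners = {}
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Expected fields: address, type, name, and optionally "[module]".
        if len(fields) < 3 or not fields[2].startswith(prefix):
            continue
        module = fields[3].strip("[]") if len(fields) > 3 else "(built-in)"
        owners.setdefault(fields[2], set()).add(module)
    return owners
```

On the device this could be called as `symbol_owners(open("/proc/kallsyms").read())`. If the symbols show up under nvidia before nvidia-p2p is inserted, that would match the "exports duplicate symbol" message.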


I’m seeing the same problem as described in all of these posts when attempting to run the reference implementation (NVIDIA/jetson-rdma-picoevb on GitHub) on a Jetson AGX Orin with JetPack 5.0.2 and the SMMU disabled.

I have the same problem. The nvidia kernel module already exports the symbols that nvidia-p2p tries to export, so nvidia-p2p fails to load with the “exports duplicate symbol” error. I did not have this problem in L4T 34.1, where nvidia-p2p loaded without any conflict with the nvidia kernel module. This seems to be an issue in L4T 35.1.

NB: I am testing on an Orin AGX Development Kit with the standard 35.1 kernel installed via the SDK Manager.

I looked up the L4T Driver Package Source for R35.1 from here:

https://developer.nvidia.com/embedded/l4t/r35_release_v1.0/sources/public_sources.tbz2

It seems the “nvidia” kernel module now builds the nv-p2p functionality itself and bundles it into one single kernel module (nvidia.ko), so there should be no need to insert nvidia-p2p.ko manually. The problem is that an older version of the “nv-p2p.h” header is used: the header available on the Orin device (toolchain) is not the same header that was used to build the “nvidia” kernel module. See below:

First few lines of “nv-p2p.h” from L4T Driver Package Source (used for building nvidia.ko, presumably):


/*
 * SPDX-FileCopyrightText: Copyright (c) 2011-2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: MIT

First few lines of “nv-p2p.h” available on the Orin device (toolchain available to build on device, located under /usr/src/linux-headers-5.10.104-tegra-ubuntu20.04_aarch64/nvidia/include/linux/nv-p2p.h):

/*
 * Copyright (c) 2018-2019, NVIDIA Corporation.  All rights reserved.
 *

This explains why, when I build my kernel module (a driver for a PCIe device) on the Orin, the generated module cannot be loaded and fails with “disagrees about version of symbol nvidia_p2p_dma_unmap_pages”. This is the same error that @ggrutzeck sees. The cause is that the “nvidia” kernel module is built with, and exports, an older version of the symbols, while the on-device toolchain only provides the most recent nv-p2p.h header.
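
To confirm a mismatch like this, the CRCs a module expects can be compared with the CRCs the providing build exports. The sketch below is only an illustration under some assumptions: the kernel uses CONFIG_MODVERSIONS, the expected CRCs come from `modprobe --dump-modversions my_dma.ko`, and the exported CRCs come from the Module.symvers of the build that produced nvidia.ko (which may not be available on the device). Both inputs are assumed to list the CRC in the first column and the symbol name in the second. The helper names are hypothetical.

```python
def parse_crcs(text):
    """Parse lines of the form '0xCRC<whitespace>symbol ...' into {symbol: crc}."""
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("0x"):
            crcs[fields[1]] = int(fields[0], 16)
    return crcs


def crc_mismatches(expected_text, exported_text, prefix="nvidia_p2p_"):
    """Return {symbol: (expected_crc, exported_crc)} for symbols that differ.

    exported_crc is None when the symbol is not exported at all.
    """
    expected = parse_crcs(expected_text)
    exported = parse_crcs(exported_text)
    return {
        sym: (crc, exported.get(sym))
        for sym, crc in expected.items()
        if sym.startswith(prefix) and exported.get(sym) != crc
    }
```

An empty result for the nvidia_p2p_* symbols would suggest the headers agree; any entry would point to a symbol whose prototype differs between the two builds.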

Possible solutions are:

  1. Rebuild the nvidia kernel module without the “nv-p2p.c” symbols. I assume this is a dead end, because there must have been a reason the kernel module includes some version of nv-p2p.c.
  2. Rebuild the nvidia kernel module with the most recent nv-p2p.h.
  3. Update the toolchain header (nv-p2p.h) on the Orin device and build my kernel module with the same header that was used for the nvidia kernel module. Then there should be no need to insmod the nvidia-p2p kernel module anymore.

@kayccc , thoughts?


Sorry for the late response.
To resolve the PCIe DMA driver issues you encountered, please try the JetPack 5.0.2 GA release. If the problem persists, please file your own issue separately.
I will have the team follow up.

I’m closing this issue now. Thanks.