We developed a custom PCIe DMA driver that accesses GPU memory for direct transfers.
On the Jetson Xavier AGX the driver works as expected, but it cannot be loaded into the kernel on the Jetson Orin AGX.
It produces the following errors in the kernel log:
[ 633.909968] my_dma: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[ 633.910199] my_dma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[ 633.910423] my_dma: disagrees about version of symbol nvidia_p2p_get_pages
[ 633.910622] my_dma: Unknown symbol nvidia_p2p_get_pages (err -22)
[ 633.910814] my_dma: disagrees about version of symbol nvidia_p2p_put_pages
[ 633.911028] my_dma: Unknown symbol nvidia_p2p_put_pages (err -22)
[ 633.911218] my_dma: disagrees about version of symbol nvidia_p2p_dma_map_pages
[ 633.911431] my_dma: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[ 633.911623] my_dma: disagrees about version of symbol nvidia_p2p_free_page_table
[ 633.911833] my_dma: Unknown symbol nvidia_p2p_free_page_table (err -22)
The Orin module runs the newest version provided by the SDK Manager (L4T 34.1.1). Of course, the kernel module is built locally against the running kernel.
What could be the problem with the custom driver on the Jetson Orin?
Or is this mode not yet supported on the Jetson Orin?
Is there any change from the stock software configuration between Xavier and Orin, particularly with respect to disabling the SMMU, etc.?
Also, have you tried the same BSP version and your software stack on both Xavier and Orin?
The problem seems to be related to the software version.
I installed the latest version (JetPack DP 5.0.1) via the SDK Manager on an Xavier AGX module and tried to load the driver.
I got the following errors:
[ 230.312480] my_dma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[ 230.312762] my_dma: Unknown symbol nvidia_p2p_get_pages (err -2)
[ 230.312940] my_dma: Unknown symbol nvidia_p2p_put_pages (err -2)
[ 230.313121] my_dma: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[ 230.313290] my_dma: Unknown symbol nvidia_p2p_free_page_table (err -2)
[ 894.912977] picoevb_rdma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[ 894.913252] picoevb_rdma: Unknown symbol nvidia_p2p_get_pages (err -2)
[ 894.913438] picoevb_rdma: Unknown symbol nvidia_p2p_put_pages (err -2)
[ 894.913607] picoevb_rdma: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[ 894.913799] picoevb_rdma: Unknown symbol nvidia_p2p_free_page_table (err -2)
Loading the reference implementation built without any CUDA-related function calls (via the build script ./build-for-any-no-cuda-native.sh) works without problems.
I have the same problem. The nvidia kernel module already exports the symbols that nvidia-p2p tries to export, so nvidia-p2p fails to load with an “exports duplicate symbol” error. I did not have this problem in L4T 34.1: there, nvidia-p2p loaded without any conflict with the nvidia kernel module. This seems to be an issue in L4T 35.1.
NB: I am testing on the Orin AGX Developer Kit with the standard 35.1 kernel installed via SDK Manager.
It seems the “nvidia” kernel module now builds the nv-p2p functionality and bundles it into a single kernel module (nvidia.ko), so there should be no need to insert nvidia-p2p.ko manually. The problem is that nvidia.ko was built with an older version of the “nv-p2p.h” header: the header available on the Orin device (on-device toolchain) is not the same header that was used to build the “nvidia” kernel module. See below:
First few lines of “nv-p2p.h” from L4T Driver Package Source (used for building nvidia.ko, presumably):
/*
* SPDX-FileCopyrightText: Copyright (c) 2011-2016 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: MIT
First few lines of “nv-p2p.h” available on the Orin device (toolchain available to build on device, located under /usr/src/linux-headers-5.10.104-tegra-ubuntu20.04_aarch64/nvidia/include/linux/nv-p2p.h):
/*
* Copyright (c) 2018-2019, NVIDIA Corporation. All rights reserved.
*
This explains why, when I build my kernel module (a driver for a PCIe device) on Orin, the resulting module cannot be loaded and fails with “disagrees about version of symbol nvidia_p2p_dma_unmap_pages”, the same error @ggrutzeck sees. The “nvidia” kernel module is built with, and exports, the older versions of these symbols, but the on-device toolchain only provides the newer nv-p2p.h header, so the symbol version CRCs my module is built against do not match the ones exported by nvidia.ko.
Possible solutions are:
Rebuild the nvidia kernel module without the “nv-p2p.c” symbols. I assume this is a dead end, because there must be a reason the kernel module includes some version of nv-p2p.c.
Rebuild the nvidia kernel module with the most recent nv-p2p.h.
Update the toolchain header (nv-p2p.h) on the Orin device and build my kernel module with the same header that was used for the nvidia kernel module. Then there should be no need to insmod the nvidia-p2p kernel module anymore.
Sorry for the late response.
To resolve the PCIe DMA driver issues you encountered, please try the JetPack 5.0.2 GA release, and file your own issue separately.
I will have the team follow up.