I received a Jetson Orin Developer Kit on which someone had already installed JetPack and CUDA.
$ sudo apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Version: 5.1-b147
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 5.1-b147), nvidia-jetpack-dev (= 5.1-b147)
Homepage: http://developer.nvidia.com/jetson
$ uname -a
Linux eol-agx 5.10.104-tegra #1 SMP PREEMPT Tue Jan 24 15:09:44 PST 2023 aarch64 aarch64 aarch64 GNU/Linux
Our code was working well, so I didn’t want to change anything until I started getting “Orin does not support RDMA” errors. So I first built jetson-rdma-picoevb from the NVIDIA/jetson-rdma-picoevb repository on GitHub (Minimal HW-based demo of GPUDirect RDMA on NVIDIA Jetson AGX Xavier running L4T). Specifically:
$ sudo apt install build-essential bc
$ cd jetson-rdma-picoevb/kernel-module/
$ ./build-for-jetson-drive-igpu-native.sh
$ sudo insmod /lib/modules/5.10.120-tegra/kernel/drivers/nv-p2p/nvidia-p2p.ko
$ sudo insmod ./picoevb-rdma.ko
I get this error at insmod nvidia-p2p.ko:
insmod: ERROR: could not insert module /lib/modules/5.10.104-tegra/kernel/drivers/nv-p2p/nvidia-p2p.ko: Invalid module format
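One thing worth ruling out with an “Invalid module format” error is a kernel release mismatch: the insmod command above names 5.10.120-tegra, while uname and the error message report 5.10.104-tegra. The sketch below compares the kernel release recorded in a module's vermagic string with the running kernel. `kernel_matches` is a hypothetical helper name, and a matching release is only a necessary condition, not proof that the module will load.

```shell
#!/bin/sh
# kernel_matches is a hypothetical helper, not part of JetPack or the repo.
# $1: a vermagic string, e.g. the output of `modinfo -F vermagic foo.ko`
# $2: a kernel release,  e.g. the output of `uname -r`
kernel_matches() {
    # The first space-separated field of vermagic is the kernel release
    # the module was built against.
    built_for=${1%% *}
    [ "$built_for" = "$2" ]
}

# Usage on the Jetson (paths as in the post):
#   kernel_matches "$(modinfo -F vermagic ./picoevb-rdma.ko)" "$(uname -r)" \
#       && echo "release matches" || echo "release mismatch"
```

If the releases match, the detailed reason for the failure is normally in dmesg rather than in the insmod message itself.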
I followed the “solution” too, following which I changed all occurrences of
nvidia_p2p_cap_persistent_pages
nvidia_p2p_init_mapping
nvidia_p2p_destroy_mapping
nvidia_p2p_get_pages
nvidia_p2p_free_page_table
nvidia_p2p_put_pages
nvidia_p2p_dma_map_pages
nvidia_p2p_dma_unmap_pages
nvidia_p2p_free_dma_mapping
nvidia_p2p_register_rsync_driver
nvidia_p2p_unregister_rsync_driver
nvidia_p2p_get_rsync_registers
nvidia_p2p_put_rsync_registers
to
nvidia_p2p_cap_persistent_pages_old
nvidia_p2p_init_mapping_old
nvidia_p2p_destroy_mapping_old
nvidia_p2p_get_pages_old
nvidia_p2p_free_page_table_old
nvidia_p2p_put_pages_old
nvidia_p2p_dma_map_pages_old
nvidia_p2p_dma_unmap_pages_old
nvidia_p2p_free_dma_mapping_old
nvidia_p2p_register_rsync_driver_old
nvidia_p2p_unregister_rsync_driver_old
nvidia_p2p_get_rsync_registers_old
nvidia_p2p_put_rsync_registers_old
respectively, in nv-p2p.c and nv-p2p.h. (My doubt is about the first one, nvidia_p2p_cap_persistent_pages_old: should it be renamed at all, given that it is an int value rather than a function?)
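For reference, the rename described above can be done mechanically rather than by hand. This is a minimal sketch, assuming GNU sed (for `\b` and `-i`); `rename_p2p_symbols` is a hypothetical helper name. It applies exactly the list from this post, including nvidia_p2p_cap_persistent_pages, so it does not settle whether that int should be renamed.

```shell
#!/bin/sh
# rename_p2p_symbols is a hypothetical helper; it edits the given files in place.
# Requires GNU sed: \b (word boundary) and -i are GNU extensions.
rename_p2p_symbols() {
    for sym in \
        nvidia_p2p_cap_persistent_pages \
        nvidia_p2p_init_mapping \
        nvidia_p2p_destroy_mapping \
        nvidia_p2p_get_pages \
        nvidia_p2p_free_page_table \
        nvidia_p2p_put_pages \
        nvidia_p2p_dma_map_pages \
        nvidia_p2p_dma_unmap_pages \
        nvidia_p2p_free_dma_mapping \
        nvidia_p2p_register_rsync_driver \
        nvidia_p2p_unregister_rsync_driver \
        nvidia_p2p_get_rsync_registers \
        nvidia_p2p_put_rsync_registers
    do
        # The trailing \b stops a second run from producing *_old_old,
        # because "_" counts as a word character.
        sed -i "s/\b${sym}\b/${sym}_old/g" "$@"
    done
}

# Usage (work on copies first):
#   rename_p2p_symbols nv-p2p.c nv-p2p.h
```

Running `grep -n nvidia_p2p_ nv-p2p.c nv-p2p.h` afterwards shows which identifiers were and were not touched.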
I ran make and replaced the /lib/modules/5.10.104-tegra/extra/opensrc-eisp/nvidia.ko file.
After that, I tried to insmod /nvidia-p2p.ko again, but it's the same “Invalid module format” error, and dmesg shows:
[ 4468.458651] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[10767.592543] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[10781.789977] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[11219.215248] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[11651.348943] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[12790.267525] picoevb_rdma: module verification failed: signature and/or required key missing - tainting kernel
[12790.277973] picoevb_rdma: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[12790.286175] picoevb_rdma: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[12790.293600] picoevb_rdma: disagrees about version of symbol nvidia_p2p_get_pages
[12790.301225] picoevb_rdma: Unknown symbol nvidia_p2p_get_pages (err -22)
[12790.308057] picoevb_rdma: disagrees about version of symbol nvidia_p2p_put_pages
[12790.315675] picoevb_rdma: Unknown symbol nvidia_p2p_put_pages (err -22)
[12790.322505] picoevb_rdma: disagrees about version of symbol nvidia_p2p_dma_map_pages
[12790.330477] picoevb_rdma: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[12790.337671] picoevb_rdma: disagrees about version of symbol nvidia_p2p_free_page_table
[12790.345828] picoevb_rdma: Unknown symbol nvidia_p2p_free_page_table (err -22)
[13270.242122] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[13460.444084] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[13622.548524] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[13842.213783] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[13887.854074] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[14359.759132] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
[16656.209245] nvidia_p2p: exports duplicate symbol nvidia_p2p_dma_map_pages (owned by nvidia)
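The log above is easier to read once repeated messages are collapsed. The sketch below strips the timestamp prefix and counts identical lines; `summarize_dmesg` is a hypothetical helper name that reads log text on stdin. As far as I understand it, “exports duplicate symbol X (owned by nvidia)” means the already loaded nvidia module exports X, so the kernel refuses a second module that exports the same name.

```shell
#!/bin/sh
# summarize_dmesg is a hypothetical helper: reads kernel log lines on stdin,
# removes the leading "[ seconds.micros]" timestamp, and prints each distinct
# message once with its repeat count, most frequent first.
summarize_dmesg() {
    sed 's/^\[ *[0-9.]*] *//' | sort | uniq -c | sort -rn
}

# Usage on the Jetson:
#   sudo dmesg | grep -E 'nvidia_p2p|picoevb_rdma' | summarize_dmesg
```

To see which loaded module currently provides a given symbol, `grep nvidia_p2p_dma_map_pages /proc/kallsyms` (as root) lists the owning module in square brackets.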
Here is the log from when I modified and rebuilt nvidia.ko:
ManualChangeAndKernelInsertion.log (257.1 KB)
What am I doing wrong? Please help.