Nvidia-fs could not be loaded: several "Unknown symbol" errors

I have temporary access to a Dell XE9680 server, on which I want to test the capabilities of a GDS+NVMe setup.

The server came with CUDA 12.2 preinstalled, but not nvidia-fs. After installing it, I got these errors in dmesg (even after a reboot):

[ 1406.272561] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.272585] nvidia_fs: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[ 1406.272936] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.272951] nvidia_fs: Unknown symbol nvidia_p2p_get_pages (err -2)
[ 1406.273262] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.273276] nvidia_fs: Unknown symbol nvidia_p2p_put_pages (err -2)
[ 1406.273585] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.273599] nvidia_fs: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[ 1406.273910] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.273923] nvidia_fs: Unknown symbol nvidia_p2p_free_dma_mapping (err -2)
[ 1406.274228] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.274241] nvidia_fs: Unknown symbol nvidia_p2p_free_page_table (err -2)

The system runs Ubuntu 20.04 with kernel 5.13.0-39, NVIDIA driver 535.104.12, and Mellanox OFED driver 5.8-1.0.1, all pre-installed.

What could be the issue?

Thanks,

I also tried compiling nvidia-fs from git, and I get these warnings:

WARNING: modpost: "nvidia_p2p_free_page_table" [/mnt/data/src/gds-nvidia-fs/src/nvidia-fs.ko] undefined!
WARNING: modpost: "nvidia_p2p_free_dma_mapping" [/mnt/data/src/gds-nvidia-fs/src/nvidia-fs.ko] undefined!
WARNING: modpost: "nvidia_p2p_dma_map_pages" [/mnt/data/src/gds-nvidia-fs/src/nvidia-fs.ko] undefined!
WARNING: modpost: "nvidia_p2p_put_pages" [/mnt/data/src/gds-nvidia-fs/src/nvidia-fs.ko] undefined!
WARNING: modpost: "nvidia_p2p_get_pages" [/mnt/data/src/gds-nvidia-fs/src/nvidia-fs.ko] undefined!
WARNING: modpost: "nvidia_p2p_dma_unmap_pages" [/mnt/data/src/gds-nvidia-fs/src/nvidia-fs.ko] undefined!

If I run nm on nvidia.ko and filter for nvidia_p2p, I can see the symbols in the kernel object:

nm -a /lib/modules/5.13.0-39-generic/kernel/drivers/video/nvidia.ko | egrep nvidia_p2p
0000000066079585 A __crc_nvidia_p2p_cap_persistent_pages
00000000180f4b6a A __crc_nvidia_p2p_destroy_mapping
00000000ed64cf57 A __crc_nvidia_p2p_dma_map_pages
00000000dde5120a A __crc_nvidia_p2p_dma_unmap_pages
00000000f1b8045a A __crc_nvidia_p2p_free_dma_mapping
00000000f42ca687 A __crc_nvidia_p2p_free_page_table
000000005b3f3e79 A __crc_nvidia_p2p_get_pages
0000000036049d8e A __crc_nvidia_p2p_get_rsync_registers
0000000045bb0ad7 A __crc_nvidia_p2p_init_mapping
00000000642487ac A __crc_nvidia_p2p_put_pages
00000000136dfe04 A __crc_nvidia_p2p_put_rsync_registers
0000000067041c16 A __crc_nvidia_p2p_register_rsync_driver
00000000918d9b9c A __crc_nvidia_p2p_unregister_rsync_driver
0000000000000000 r __kstrtab_nvidia_p2p_cap_persistent_pages
000000000000003a r __kstrtab_nvidia_p2p_destroy_mapping
000000000000009e r __kstrtab_nvidia_p2p_dma_map_pages
00000000000000b8 r __kstrtab_nvidia_p2p_dma_unmap_pages
00000000000000d4 r __kstrtab_nvidia_p2p_free_dma_mapping
000000000000006c r __kstrtab_nvidia_p2p_free_page_table
0000000000000056 r __kstrtab_nvidia_p2p_get_pages
0000000000000137 r __kstrtab_nvidia_p2p_get_rsync_registers
0000000000000021 r __kstrtab_nvidia_p2p_init_mapping
0000000000000088 r __kstrtab_nvidia_p2p_put_pages
0000000000000157 r __kstrtab_nvidia_p2p_put_rsync_registers
00000000000000f1 r __kstrtab_nvidia_p2p_register_rsync_driver
0000000000000113 r __kstrtab_nvidia_p2p_unregister_rsync_driver
0000000000000020 r __kstrtabns_nvidia_p2p_cap_persistent_pages
0000000000000055 r __kstrtabns_nvidia_p2p_destroy_mapping
00000000000000b7 r __kstrtabns_nvidia_p2p_dma_map_pages
00000000000000d3 r __kstrtabns_nvidia_p2p_dma_unmap_pages
00000000000000f0 r __kstrtabns_nvidia_p2p_free_dma_mapping
0000000000000087 r __kstrtabns_nvidia_p2p_free_page_table
000000000000006b r __kstrtabns_nvidia_p2p_get_pages
0000000000000156 r __kstrtabns_nvidia_p2p_get_rsync_registers
0000000000000039 r __kstrtabns_nvidia_p2p_init_mapping
000000000000009d r __kstrtabns_nvidia_p2p_put_pages
0000000000000176 r __kstrtabns_nvidia_p2p_put_rsync_registers
0000000000000112 r __kstrtabns_nvidia_p2p_register_rsync_driver
0000000000000136 r __kstrtabns_nvidia_p2p_unregister_rsync_driver
000000000000030c r __ksymtab_nvidia_p2p_cap_persistent_pages
0000000000000318 r __ksymtab_nvidia_p2p_destroy_mapping
0000000000000324 r __ksymtab_nvidia_p2p_dma_map_pages
0000000000000330 r __ksymtab_nvidia_p2p_dma_unmap_pages
000000000000033c r __ksymtab_nvidia_p2p_free_dma_mapping
0000000000000348 r __ksymtab_nvidia_p2p_free_page_table
0000000000000354 r __ksymtab_nvidia_p2p_get_pages
0000000000000360 r __ksymtab_nvidia_p2p_get_rsync_registers
000000000000036c r __ksymtab_nvidia_p2p_init_mapping
0000000000000378 r __ksymtab_nvidia_p2p_put_pages
0000000000000384 r __ksymtab_nvidia_p2p_put_rsync_registers
0000000000000390 r __ksymtab_nvidia_p2p_register_rsync_driver
000000000000039c r __ksymtab_nvidia_p2p_unregister_rsync_driver
0000000000000700 D nvidia_p2p_cap_persistent_pages
000000000000a910 T nvidia_p2p_destroy_mapping
000000000000b670 T nvidia_p2p_dma_map_pages
000000000000aec0 T nvidia_p2p_dma_unmap_pages
000000000000afe0 T nvidia_p2p_free_dma_mapping
000000000000a920 T nvidia_p2p_free_page_table
000000000000aff0 T nvidia_p2p_get_pages
000000000000ac10 T nvidia_p2p_get_rsync_registers
000000000000a900 T nvidia_p2p_init_mapping
00000000000002a0 r nvidia_p2p_page_size_mappings
0000000000000030 B nvidia_p2p_page_t_cache
000000000000b550 T nvidia_p2p_put_pages
000000000000aba0 T nvidia_p2p_put_rsync_registers
000000000000ab50 T nvidia_p2p_register_rsync_driver
000000000000af30 T nvidia_p2p_unregister_rsync_driver

I double-checked that nvidia.ko is loaded.
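
For what it's worth, the paired dmesg messages are probably one failure rather than two. As far as I understand the module loader, when a module that uses GPL-only symbols tries to bind to a symbol owned by a proprietary module, the kernel logs the "uses symbols from proprietary module" line and then treats that symbol as unresolved. That would explain "Unknown symbol ... (err -2)" even though nm shows the symbols in nvidia.ko. Below is a small sketch for summarizing such a log. The function name summarize_nvfs_dmesg is made up for illustration, and it only matches the message formats shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin, prints each
# unresolved nvidia_fs symbol once (sorted), then a one-line note if the
# GPL-only / proprietary-module conflict message is present.
summarize_nvfs_dmesg() {
    log=$(cat)

    # Extract the symbol name from "nvidia_fs: Unknown symbol NAME (err -2)".
    printf '%s\n' "$log" \
        | sed -n 's/.*nvidia_fs: Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' \
        | sort -u

    # Flag the licensing conflict that the loader reports before each symbol.
    if printf '%s\n' "$log" | grep -q 'uses symbols from proprietary module nvidia'; then
        echo 'conflict: GPL-only user linked against proprietary nvidia.ko'
    fi
}
```

On a live system this could be fed with: dmesg | summarize_nvfs_dmesg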

Is this similar to "GPUDirect RDMA - Module can not be insert into kernel" (#27 by AastaLLL)?

Starting with version 2.17.5 of the nvidia-fs.ko kernel driver, the proprietary nvidia.ko is not supported with GDS.

Please install the NVIDIA open RM driver instead.
Starting with CUDA toolkit 12.2.2, GDS kernel driver package nvidia-gds version 12.2.2-1 (provided by nvidia-fs-dkms 2.17.5-1) and above is only supported with the NVIDIA open kernel driver.

Follow the instructions in Removing CUDA Toolkit and Driver to remove existing NVIDIA driver packages and then follow instructions in NVIDIA Open GPU Kernel Modules to install NVIDIA open kernel driver packages.
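
To see which flavor of nvidia.ko is currently installed, one option is to look at the license string the module declares. This is a minimal sketch, assuming the proprietary build reports NVIDIA and the open kernel modules report Dual MIT/GPL (worth verifying against your own driver version). nvidia_flavor is a hypothetical helper name, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Hypothetical helper: classify an NVIDIA kernel module from the license
# string that modinfo prints for it.
# Assumption: "NVIDIA" means the proprietary build and any string
# containing "GPL" (e.g. "Dual MIT/GPL") means the open kernel modules.
nvidia_flavor() {
    case "$1" in
        *GPL*)  echo open ;;
        NVIDIA) echo proprietary ;;
        *)      echo unknown ;;
    esac
}

# Intended use on the target machine (left commented out so the snippet
# can be sourced on a host without the driver):
#   nvidia_flavor "$(modinfo -F license nvidia)"
```

If this prints proprietary while nvidia-fs 2.17.5 or later is installed, that would match the "uses symbols from proprietary module nvidia" message in the dmesg output.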

Thanks, I did install the open source driver, but I don’t have full control over the system to meet the other requirements. Because of this the problem persisted. I’ll try to arrange the full set of requirements if I have another opportunity with that system.

Alternatively, you can keep the proprietary driver: revert the nvidia-fs source from 2.17.5 to 2.17.4 and rebuild nvidia-fs, and it works fine.
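
Both routes in this thread hinge on whether the installed nvidia-fs is at or above 2.17.5. Here is a rough sketch of that check, assuming GNU sort with -V (version sort) is available, as it is on Ubuntu. The function names are illustrative, and the version string has to be supplied by hand, for example from what dkms status lists for nvidia-fs.

```shell
#!/bin/sh
# Hypothetical helpers for comparing dotted version strings.

# version_ge A B: succeeds (exit 0) when version A >= version B.
# Relies on GNU sort -V; the smaller of the two sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# needs_open_driver V: succeeds when nvidia-fs version V is 2.17.5 or
# newer, i.e. the range this thread says requires the open kernel driver.
needs_open_driver() {
    version_ge "$1" 2.17.5
}

# Example:
#   if needs_open_driver 2.17.5; then echo "open driver required"; fi
```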
