I have temporary access to a Dell XE9680 server on which I want to test a GPUDirect Storage (GDS) + NVMe setup.
The server came with CUDA 12.2 preinstalled, but not nvidia-fs. After installing nvidia-fs, I get these errors in dmesg (even after a reboot):
[ 1406.272561] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.272585] nvidia_fs: Unknown symbol nvidia_p2p_dma_unmap_pages (err -2)
[ 1406.272936] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.272951] nvidia_fs: Unknown symbol nvidia_p2p_get_pages (err -2)
[ 1406.273262] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.273276] nvidia_fs: Unknown symbol nvidia_p2p_put_pages (err -2)
[ 1406.273585] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.273599] nvidia_fs: Unknown symbol nvidia_p2p_dma_map_pages (err -2)
[ 1406.273910] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.273923] nvidia_fs: Unknown symbol nvidia_p2p_free_dma_mapping (err -2)
[ 1406.274228] nvidia_fs: module using GPL-only symbols uses symbols from proprietary module nvidia.
[ 1406.274241] nvidia_fs: Unknown symbol nvidia_p2p_free_page_table (err -2)
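Each failure above comes as a pair. First the loader refuses to let nvidia_fs, which uses GPL-only symbols, link against a symbol owned by nvidia.ko, because nvidia.ko carries a proprietary license. Then it reports that symbol as unknown (err -2 is -ENOENT). So the symbols are refused, not missing. The following is a minimal sketch that pulls the refused symbol names out of dmesg output. It assumes the lines look exactly like the ones above, and the function name unresolved_symbols is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read dmesg-style text on stdin and print each symbol
# that nvidia_fs failed to resolve, one per line, sorted and de-duplicated.
# Assumes lines of the form:
#   [ 1406.272585] nvidia_fs: Unknown symbol <name> (err -2)
unresolved_symbols() {
  sed -n 's/.*nvidia_fs: Unknown symbol \([A-Za-z0-9_]*\) (err -[0-9]*).*/\1/p' \
    | LC_ALL=C sort -u
}
```

Usage: `sudo dmesg | unresolved_symbols`. On this system it should list the six nvidia_p2p_* functions from the log.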
The system runs Ubuntu 20.04 with kernel 5.13.0-39, NVIDIA driver 535.104.12, and Mellanox OFED 5.8-1.0.1, all preinstalled.
Running nm -a on the driver module, filtered with egrep nvidia_p2p, shows that the symbols are present in the kernel object:
nm -a /lib/modules/5.13.0-39-generic/kernel/drivers/video/nvidia.ko | egrep nvidia_p2p
0000000066079585 A __crc_nvidia_p2p_cap_persistent_pages
00000000180f4b6a A __crc_nvidia_p2p_destroy_mapping
00000000ed64cf57 A __crc_nvidia_p2p_dma_map_pages
00000000dde5120a A __crc_nvidia_p2p_dma_unmap_pages
00000000f1b8045a A __crc_nvidia_p2p_free_dma_mapping
00000000f42ca687 A __crc_nvidia_p2p_free_page_table
000000005b3f3e79 A __crc_nvidia_p2p_get_pages
0000000036049d8e A __crc_nvidia_p2p_get_rsync_registers
0000000045bb0ad7 A __crc_nvidia_p2p_init_mapping
00000000642487ac A __crc_nvidia_p2p_put_pages
00000000136dfe04 A __crc_nvidia_p2p_put_rsync_registers
0000000067041c16 A __crc_nvidia_p2p_register_rsync_driver
00000000918d9b9c A __crc_nvidia_p2p_unregister_rsync_driver
0000000000000000 r __kstrtab_nvidia_p2p_cap_persistent_pages
000000000000003a r __kstrtab_nvidia_p2p_destroy_mapping
000000000000009e r __kstrtab_nvidia_p2p_dma_map_pages
00000000000000b8 r __kstrtab_nvidia_p2p_dma_unmap_pages
00000000000000d4 r __kstrtab_nvidia_p2p_free_dma_mapping
000000000000006c r __kstrtab_nvidia_p2p_free_page_table
0000000000000056 r __kstrtab_nvidia_p2p_get_pages
0000000000000137 r __kstrtab_nvidia_p2p_get_rsync_registers
0000000000000021 r __kstrtab_nvidia_p2p_init_mapping
0000000000000088 r __kstrtab_nvidia_p2p_put_pages
0000000000000157 r __kstrtab_nvidia_p2p_put_rsync_registers
00000000000000f1 r __kstrtab_nvidia_p2p_register_rsync_driver
0000000000000113 r __kstrtab_nvidia_p2p_unregister_rsync_driver
0000000000000020 r __kstrtabns_nvidia_p2p_cap_persistent_pages
0000000000000055 r __kstrtabns_nvidia_p2p_destroy_mapping
00000000000000b7 r __kstrtabns_nvidia_p2p_dma_map_pages
00000000000000d3 r __kstrtabns_nvidia_p2p_dma_unmap_pages
00000000000000f0 r __kstrtabns_nvidia_p2p_free_dma_mapping
0000000000000087 r __kstrtabns_nvidia_p2p_free_page_table
000000000000006b r __kstrtabns_nvidia_p2p_get_pages
0000000000000156 r __kstrtabns_nvidia_p2p_get_rsync_registers
0000000000000039 r __kstrtabns_nvidia_p2p_init_mapping
000000000000009d r __kstrtabns_nvidia_p2p_put_pages
0000000000000176 r __kstrtabns_nvidia_p2p_put_rsync_registers
0000000000000112 r __kstrtabns_nvidia_p2p_register_rsync_driver
0000000000000136 r __kstrtabns_nvidia_p2p_unregister_rsync_driver
000000000000030c r __ksymtab_nvidia_p2p_cap_persistent_pages
0000000000000318 r __ksymtab_nvidia_p2p_destroy_mapping
0000000000000324 r __ksymtab_nvidia_p2p_dma_map_pages
0000000000000330 r __ksymtab_nvidia_p2p_dma_unmap_pages
000000000000033c r __ksymtab_nvidia_p2p_free_dma_mapping
0000000000000348 r __ksymtab_nvidia_p2p_free_page_table
0000000000000354 r __ksymtab_nvidia_p2p_get_pages
0000000000000360 r __ksymtab_nvidia_p2p_get_rsync_registers
000000000000036c r __ksymtab_nvidia_p2p_init_mapping
0000000000000378 r __ksymtab_nvidia_p2p_put_pages
0000000000000384 r __ksymtab_nvidia_p2p_put_rsync_registers
0000000000000390 r __ksymtab_nvidia_p2p_register_rsync_driver
000000000000039c r __ksymtab_nvidia_p2p_unregister_rsync_driver
0000000000000700 D nvidia_p2p_cap_persistent_pages
000000000000a910 T nvidia_p2p_destroy_mapping
000000000000b670 T nvidia_p2p_dma_map_pages
000000000000aec0 T nvidia_p2p_dma_unmap_pages
000000000000afe0 T nvidia_p2p_free_dma_mapping
000000000000a920 T nvidia_p2p_free_page_table
000000000000aff0 T nvidia_p2p_get_pages
000000000000ac10 T nvidia_p2p_get_rsync_registers
000000000000a900 T nvidia_p2p_init_mapping
00000000000002a0 r nvidia_p2p_page_size_mappings
0000000000000030 B nvidia_p2p_page_t_cache
000000000000b550 T nvidia_p2p_put_pages
000000000000aba0 T nvidia_p2p_put_rsync_registers
000000000000ab50 T nvidia_p2p_register_rsync_driver
000000000000af30 T nvidia_p2p_unregister_rsync_driver
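In this listing, a symbol is exported when it has a matching __ksymtab_<name> entry, and every nvidia_p2p_* function that nvidia_fs asked for has one. The following is a minimal sketch that extracts the exported names from nm -a output. The function name exported_symbols is hypothetical. As far as I know, the __ksymtab_ name alone does not tell EXPORT_SYMBOL apart from EXPORT_SYMBOL_GPL, so this only shows that an export exists:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nm -a` output of a .ko on stdin and print the
# names of exported symbols (those with a __ksymtab_<name> entry).
# Note: this does not distinguish EXPORT_SYMBOL from EXPORT_SYMBOL_GPL.
exported_symbols() {
  awk '$3 ~ /^__ksymtab_/ { sub(/^__ksymtab_/, "", $3); print $3 }' \
    | LC_ALL=C sort -u
}
```

Usage: `nm -a /lib/modules/5.13.0-39-generic/kernel/drivers/video/nvidia.ko | exported_symbols | grep '^nvidia_p2p_'`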
Starting with version 2.17.5 of the nvidia-fs.ko kernel driver, the proprietary nvidia.ko is not supported with GDS.
Please install the NVIDIA open RM driver instead.
Starting with CUDA Toolkit 12.2.2, the GDS kernel driver package nvidia-gds version 12.2.2-1 and later (provided by nvidia-fs-dkms 2.17.5-1) is supported only with the NVIDIA open kernel driver.
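You can check which driver flavor is installed by looking at each module's license string. The kernel uses that string to decide whether a module taints it as proprietary. The following is a minimal sketch. It assumes modinfo is available and that the modules are named nvidia and nvidia_fs, as in the dmesg output above. The GPL-compatible list mirrors license_is_gpl_compatible() in the kernel's include/linux/license.h. The function names are hypothetical. As far as I know, the open kernel modules declare "Dual MIT/GPL" and the proprietary nvidia.ko declares "NVIDIA", but verify this on your own system:

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors the license strings accepted by the kernel's
# license_is_gpl_compatible(); any other string marks a module proprietary.
license_is_gpl_compatible() {
  case "$1" in
    "GPL" | "GPL v2" | "GPL and additional rights" | \
    "Dual BSD/GPL" | "Dual MIT/GPL" | "Dual MPL/GPL") return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print a module's license and how the kernel treats it.
report_module_license() {
  local mod="$1" lic
  if ! lic=$(modinfo -F license "$mod" 2>/dev/null) || [ -z "$lic" ]; then
    echo "$mod: no license found (is the module installed?)"
    return 1
  fi
  if license_is_gpl_compatible "$lic"; then
    echo "$mod: \"$lic\" (GPL-compatible)"
  else
    echo "$mod: \"$lic\" (proprietary; modules using GPL-only symbols cannot link to it)"
  fi
}
```

Usage: `report_module_license nvidia; report_module_license nvidia_fs`. With the open kernel driver in place, the first line should report a GPL-compatible license.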
Thanks. I did install the open-source driver, but I don’t have full control over the system, so I couldn’t meet the other requirements and the problem persisted. I’ll try to arrange the full set of requirements if I get another chance with that system.