CUDA 13.0 won't work with open-source drivers (NVIDIA H100 NVL)

Hi,
I am currently trying to set up a lab with 4 H100 NVL GPUs to use GPUDirect Storage with local and remote NVMe drives.
I have followed the CUDA and NVIDIA driver installation guides:

$ nvidia-smi
Thu Sep  4 22:33:04 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 NVL                On  |   00000000:03:00.0 Off |                    0 |
| N/A   39C    P0             60W /  400W |       0MiB /  95830MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 NVL                On  |   00000000:0B:00.0 Off |                    0 |
| N/A   38C    P0             61W /  400W |       0MiB /  95830MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 NVL                On  |   00000000:61:00.0 Off |                    0 |
| N/A   39C    P0             62W /  400W |       0MiB /  95830MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 NVL                On  |   00000000:69:00.0 Off |                    0 |
| N/A   40C    P0             61W /  400W |       0MiB /  95830MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

But running anything that requires CUDA results in an error:

$ ./build/Samples/1_Utilities/deviceQuery/deviceQuery
./build/Samples/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 3
-> initialization error
Result = FAIL
$ ./tools/gdscheck -p
 cuInit Failed, error CUDA_ERROR_NOT_INITIALIZED
 cuFile initialization failed
 Platform verification error :
CUDA Driver API error

Switching to the proprietary NVIDIA drivers actually solves this issue!
However, I then cannot use GDS: nvidia-fs has been open source since version 2.17.5 (I am currently using 2.26) and hence cannot interface with the symbols exported by the proprietary kernel modules.

I am therefore reaching out here in the hope of finding a solution.

$ dmesg | grep nvidia
[   10.297171] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   10.300673] nvidia 0000:61:00.0: enabling device (0000 -> 0002)
[   10.311994] nvidia 0000:69:00.0: enabling device (0000 -> 0002)
[   10.330656] nvidia 0000:03:00.0: enabling device (0000 -> 0002)
[   10.351994] nvidia 0000:0b:00.0: enabling device (0000 -> 0002)
[   10.487855] Backport based on https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git eb6cb58
[   10.488744] compat.git: https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git
[   10.664830] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  580.82.07  Release Build  (dvs-builder@U22-I3-B07-03-2)  Wed Aug 27 18:06:05 UTC 2025
[   21.293809] nvidia_fs: no symbol version for nvidia_p2p_dma_unmap_pages
[   21.300609] [drm] [nvidia-drm] [GPU ID 0x00006100] Loading driver
[   21.309358] nvidia_fs: Initializing nvfs driver module
[   21.309863] nvidia_fs: registered correctly with major number 511
[   23.681891] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:61:00.0 on minor 1
[   23.681915] nvidia 0000:61:00.0: [drm] No compatible format found
[   23.681919] nvidia 0000:61:00.0: [drm] Cannot find any crtc or sizes
[   23.681942] [drm] [nvidia-drm] [GPU ID 0x00006900] Loading driver
[   25.917517] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:69:00.0 on minor 2
[   25.917539] nvidia 0000:69:00.0: [drm] No compatible format found
[   25.917542] nvidia 0000:69:00.0: [drm] Cannot find any crtc or sizes
[   25.917573] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[   28.156453] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 3
[   28.156476] nvidia 0000:03:00.0: [drm] No compatible format found
[   28.156478] nvidia 0000:03:00.0: [drm] Cannot find any crtc or sizes
[   28.156508] [drm] [nvidia-drm] [GPU ID 0x00000b00] Loading driver
[   30.398077] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:0b:00.0 on minor 4
[   30.398090] nvidia 0000:0b:00.0: [drm] No compatible format found
[   30.398092] nvidia 0000:0b:00.0: [drm] Cannot find any crtc or sizes
$ lsmod | grep nvidia
nvidia_uvm           2158592  0
nvidia_peermem         16384  0
ib_uverbs             200704  2 nvidia_peermem,mlx5_ib
nvidia_fs             274432  0
nvidia_drm            139264  0
nvidia_modeset       1744896  1 nvidia_drm
video                  77824  1 nvidia_modeset
nvidia              14368768  21 nvidia_uvm,nvidia_peermem,nvidia_fs,nvidia_modeset
ecc                    45056  1 nvidia
$ dmesg | grep -i iommu
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0-79-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro amd_iommu=off
[    0.559183] Kernel command line: BOOT_IMAGE=/vmlinuz-6.8.0-79-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro amd_iommu=off
[    6.013241] iommu: Default domain type: Translated
[    6.013241] iommu: DMA domain TLB invalidation policy: lazy mode
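
For anyone comparing setups, the driver flavor can be read off the kernel module banner, like the nvidia-modeset line in the dmesg output above. The sketch below is only an illustration: `driver_flavor` is a hypothetical helper name, and it assumes that the open kernel modules put the words "Open Kernel" in their banner, as the 580.82.07 build here does.

```shell
#!/bin/sh
# Hypothetical helper: classify an NVIDIA kernel module banner line.
# Assumption: open kernel modules contain "Open Kernel" in the banner,
# as in the nvidia-modeset line from the dmesg output in this thread.
driver_flavor() {
    case "$1" in
        *"Open Kernel"*) echo open ;;
        *NVIDIA*)        echo proprietary ;;
        *)               echo unknown ;;
    esac
}

# Banner text copied from the dmesg output above.
driver_flavor "nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  580.82.07"

# On a live system the same check could be fed from the driver's proc file,
# which exists only while the nvidia module is loaded:
#   driver_flavor "$(head -n 1 /proc/driver/nvidia/version)"
```

With the banner from this thread, the helper reports `open`, which matches the setup described in the post.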

For those facing the same issue, try disabling kernel ASLR by adding `nokaslr` to the kernel command line in your GRUB options.
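
A minimal sketch of how that option could be added on an Ubuntu-style system follows. It assumes that the kernel command line is set through `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, that `update-grub` is available, and that GNU sed is installed. `add_kernel_param` is a hypothetical helper, not part of any NVIDIA or GRUB tool. The runnable part only edits a scratch file; the real-system commands are left as comments.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT line of a GRUB defaults file, only once.
# Assumes GNU sed (for -i) and a parameter without "/" characters.
add_kernel_param() {
    file=$1
    param=$2
    # Skip the edit if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qw -- "$param"; then
        return 0
    fi
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"/" "$file"
}

# Dry run on a scratch file, so nothing on the system is modified.
scratch=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=off"\n' > "$scratch"
add_kernel_param "$scratch" nokaslr
cat "$scratch"
rm -f "$scratch"

# On the real machine (as root), the equivalent would be:
#   add_kernel_param /etc/default/grub nokaslr
#   update-grub && reboot
```

After the reboot, `cat /proc/cmdline` should list `nokaslr` next to the existing options. If the options on your machine live in `GRUB_CMDLINE_LINUX` instead, the variable name in the helper needs to be adjusted.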

Hi,
Did you manage to get GDS with the open-source driver working? If so, was setting the GRUB option the only fix needed? Thanks!