CUDA 12.8 with driver 570.124.06 on HGX B200: code=3 (cudaErrorInitializationError)

nvidia-smi
Wed Apr 23 20:39:55 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA B200                    Off |   00000000:17:00.0 Off |                    0 |
| N/A   32C    P0            142W / 1000W |       1MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA B200                    Off |   00000000:3D:00.0 Off |                    0 |
| N/A   37C    P0            143W / 1000W |       1MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA B200                    Off |   00000000:5F:00.0 Off |                    0 |
| N/A   38C    P0            146W / 1000W |       1MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA B200                    Off |   00000000:70:00.0 Off |                    0 |
| N/A   32C    P0            140W / 1000W |       1MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA B200                    Off |   00000000:97:00.0 Off |                    0 |
| N/A   32C    P0            139W / 1000W |       1MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA B200                    Off |   00000000:BA:00.0 Off |                    0 |
| N/A   39C    P0            143W / 1000W |       1MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA B200                    Off |   00000000:DC:00.0 Off |                    0 |
| N/A   39C    P0            144W / 1000W |       1MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA B200                    Off |   00000000:ED:00.0 Off |                    0 |
| N/A   34C    P0            140W / 1000W |       1MiB / 183359MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

Fabric Manager is installed at the same version as the driver, and it is running:

nvidia-smi -q -i 0 | grep -i -A 2 Fabric
        GPU Fabric GUID                   : 0x8a55d791d007e9f6
    Inforom Version
        Image Version                     : G525.0220.00.03
--
    Fabric
        State                             : Completed
        Status                            : Success

Running deviceQuery from the CUDA Samples, I see:

build/Samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 3
-> initialization error
Result = FAIL
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.2 LTS
Release:        24.04
Codename:       noble

Is CUDA 12.8 incompatible with the driver I am using?
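
For anyone comparing the two version numbers above: the `CUDA Version` field in the `nvidia-smi` header is the newest CUDA release the installed driver supports, while `nvcc --version` reports the installed toolkit. Here the driver reports 12.8 and the toolkit is 12.8, so the pair looks compatible on paper and the initialization error probably has another cause. A minimal sketch of that comparison follows; the function names are hypothetical helpers, not part of any NVIDIA tool, and `sort -V` assumes GNU coreutils.

```shell
# Hypothetical helpers for comparing driver and toolkit CUDA versions.

# Read `nvidia-smi` output on stdin and print the "CUDA Version" value.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Read `nvcc --version` output on stdin and print the toolkit release.
toolkit_cuda_version() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed when version $1 is less than or equal to version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Only run the live check when both tools are present on this machine.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
    drv=$(nvidia-smi | driver_cuda_version)
    tk=$(nvcc --version | toolkit_cuda_version)
    if version_le "$tk" "$drv"; then
        echo "toolkit $tk is within the driver's supported CUDA version $drv"
    else
        echo "toolkit $tk is newer than the driver's supported CUDA version $drv"
    fi
fi
```

If the first message is printed and deviceQuery still fails, a driver/toolkit version mismatch is unlikely to be the explanation.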

I’m running into similar issues with Blackwell Pros. Did you end up figuring this out?

I haven’t solved it on Ubuntu 24.04 yet.
I am currently using Ubuntu 22.04 with the 6.5 kernel, and it works well:

NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy

I had the same issue on an HGX B200 8-GPU system, with this driver version and with others.
What solved it for me was adding “quiet splash nokaslr” to the kernel boot parameters.
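
To make the boot-parameter step concrete, here is a minimal sketch for checking whether the running kernel was booted with `nokaslr`, with the usual Ubuntu steps for adding it shown as comments. It assumes a stock Ubuntu install that boots through GRUB and reads `/etc/default/grub`; `has_kernel_param` is a hypothetical helper name. On stock Ubuntu `quiet` and `splash` only change what is shown during boot, so `nokaslr` (which disables kernel address space layout randomization) is presumably the parameter that matters, though the thread does not confirm this.

```shell
# has_kernel_param PARAM [FILE]
# Succeed when PARAM appears as a whole word in FILE
# (FILE defaults to /proc/cmdline, the running kernel's command line).
has_kernel_param() {
    local param="$1"
    local file="${2:-/proc/cmdline}"
    [ -r "$file" ] || return 2
    # Put each parameter on its own line, then match the full line exactly.
    tr -s '[:space:]' '\n' < "$file" | grep -qxF -- "$param"
}

if has_kernel_param nokaslr; then
    echo "nokaslr is active on the running kernel"
else
    echo "nokaslr is not set on the running kernel"
fi

# To add it on Ubuntu with GRUB (assumption: default bootloader setup):
#   1. Edit /etc/default/grub and set, for example:
#        GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nokaslr"
#   2. Regenerate the GRUB configuration:
#        sudo update-grub
#   3. Reboot, rerun this check, then rerun ./deviceQuery.
```

Note that disabling KASLR removes a kernel hardening measure, so it is worth confirming that it actually resolves the error on your system before leaving it enabled.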
