cudaGetDeviceCount returned 3 -> initialization error, CUDA 13.0, RHEL 9, HGX B200

JeffreyWong20 · January 9, 2026, 5:50pm

Hello,

I’m currently facing an issue with setting up a B200 cluster and would like to ask for some guidance.

I’m using a system with HGX 8xB200, with the following software versions:

OS: RHEL 9 (Red Hat Enterprise Linux 9)
NVIDIA Driver: 580.105.08
NVIDIA Fabric Manager: 580.105.08
CUDA Toolkit: cuda_13.0

Here is some output verifying the installation of the above:

nvidia-smi

[Screenshot: nvidia-smi output]

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:58:59_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0

sudo systemctl status nvidia-fabricmanager

[Screenshot: output of systemctl status nvidia-fabricmanager]

However, I encountered an initialization error when running ./deviceQuery:

./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 3
-> initialization error
Result = FAIL

Additionally, to start nvidia-fabricmanager I have to load the ib_umad module manually; otherwise the service fails to start. Is this normal? Is there a way to make ib_umad load automatically at boot?

Any insight or recommendations would be greatly appreciated.
Thank you!

JeffreyWong20 · January 12, 2026, 9:27pm

Hi,

We managed to resolve the issue with the following steps:

Lowering the CUDA version to 12.8
Disabling KASLR: grubby --update-kernel=ALL --args=nokaslr
Installing doca_ofed so that ib_umad loads automatically
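
For anyone repeating these steps, the kernel-side changes can be checked after a reboot with a couple of small shell functions. This is a minimal sketch and not part of the original post: the function names has_nokaslr and module_loaded are hypothetical, and it assumes the standard Linux /proc/cmdline and /proc/modules files. Each function takes an optional file path so it can also be run against a saved copy.

```shell
# Hypothetical helpers for illustration; not part of any NVIDIA tooling.

# Succeeds if the kernel command line contains the nokaslr argument.
# $1: file to inspect (defaults to /proc/cmdline)
has_nokaslr() {
    grep -qw -- nokaslr "${1:-/proc/cmdline}"
}

# Succeeds if the named kernel module is currently loaded.
# $1: module name, $2: file to inspect (defaults to /proc/modules)
module_loaded() {
    grep -q -- "^$1 " "${2:-/proc/modules}"
}
```

Typical use would be `has_nokaslr && echo "nokaslr is set"` and `module_loaded ib_umad && echo "ib_umad is loaded"`. The grubby change only shows up in /proc/cmdline once the machine has been rebooted. If installing doca_ofed is not an option, a commonly used alternative on systemd-based systems is a one-line file under /etc/modules-load.d/ containing the module name ib_umad; that is a different approach from the one used in this thread.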
