Hello,
I’m currently facing an issue with setting up a B200 cluster and would like to ask for some guidance.
I’m using a system with HGX 8xB200, with the following software versions:
- OS: RHEL 9 (Red Hat Enterprise Linux 9)
- NVIDIA Driver: 580.105.08
- NVIDIA Fabric Manager: 580.105.08
- CUDA Toolkit: cuda_13.0
Here is some output to verify the installation of the above:
- nvidia-smi
- nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:58:59_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0
- sudo systemctl status nvidia-fabricmanager
However, I encountered an initialization error when running ./deviceQuery:
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 3
-> initialization error
Result = FAIL
Additionally, to start nvidia-fabricmanager I have to manually load the ib_umad module first; otherwise the service fails to start. Is this normal? Is there a way to make ib_umad load automatically on reboot?
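For reference, here is a minimal sketch of that manual workaround. It assumes the module is loaded with modprobe and the service is started with systemctl start; those exact commands are an assumption, and DRY_RUN, run and workaround are hypothetical names used only for this sketch. With the default DRY_RUN=1 the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround: load ib_umad, then start Fabric Manager.
# DRY_RUN is a hypothetical switch for this sketch; with the default of 1
# the commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# The two manual steps, in the order described above.
workaround() {
    run sudo modprobe ib_umad
    run sudo systemctl start nvidia-fabricmanager
}

workaround
```

Running it with DRY_RUN=0 would execute the two commands for real. This only reproduces the manual steps after each boot; it does not make the module load persistently.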
Any insight or recommendations would be greatly appreciated.
Thank you!

