Hello, I’ve been having trouble stress testing my RTX 6000s for the past week. I have tried CUDA versions 12.8, 12.9, and 13.0 with no luck; I keep getting the error "No CUDA devices found". I have also tried several stress tests, including PyTorch, gpu-burn, dcgmi diag, and the Phoronix Test Suite, all without success. I tried these on Ubuntu 24.04 as well as on a lightweight Linux distribution. I have included a screenshot below of my error message. I have the NVIDIA open drivers installed, and both nvidia-smi and nvcc work. Any help would be greatly appreciated.
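One check that may help narrow this down: nvidia-smi talks to the driver through NVML and does not need the nvidia_uvm kernel module, while the CUDA runtime does. So "nvidia-smi works but CUDA finds no devices" often points at nvidia_uvm. Below is a minimal sketch, assuming a standard Linux layout (/proc/modules and /dev/nvidia-uvm). The helper name check_uvm is hypothetical and not from any NVIDIA tool. It only reports whether the module is loaded and whether the device node exists.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not an NVIDIA tool): checks the two things
# the CUDA runtime needs that nvidia-smi does not, namely a loaded nvidia_uvm
# module and the /dev/nvidia-uvm device node.

# check_uvm [modules_file] [dev_node]
# Defaults are the real system locations; both can be overridden for testing.
check_uvm() {
  modules_file="${1:-/proc/modules}"
  dev_node="${2:-/dev/nvidia-uvm}"
  status=0

  # /proc/modules lists one module per line, starting with "name size ...".
  if grep -q '^nvidia_uvm ' "$modules_file" 2>/dev/null; then
    echo "nvidia_uvm: loaded"
  else
    echo "nvidia_uvm: NOT loaded"
    status=1
  fi

  # The CUDA runtime opens this device node; nvidia-smi does not need it.
  if [ -e "$dev_node" ]; then
    echo "$dev_node: present"
  else
    echo "$dev_node: missing"
    status=1
  fi
  return "$status"
}

# Run against the real system; print a hint (to stderr) if something is missing.
if ! check_uvm "$@"; then
  echo "hint: CUDA usually needs nvidia_uvm loaded and /dev/nvidia-uvm present" >&2
fi
```

If both checks pass and CUDA still reports no devices, the module may be loaded but misbehaving. The reply below describes one such case.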
Hello,
We had exactly the same problem on a different configuration: CUDA 12.9/13.0/13.1, RTX 6000 BSE, driver 680.105.
OS: Rocky Linux 9.7
nvidia-smi worked, but CUDA did not.
What finally worked:
We disabled HMM (Heterogeneous Memory Management) in the nvidia_uvm kernel module by running the following as root:
cat >/etc/modprobe.d/nvidia-uvm.conf <<'EOF'
options nvidia_uvm uvm_disable_hmm=1
EOF
modprobe -r nvidia_uvm
modprobe nvidia_uvm
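To confirm the option actually took effect after the reload, you can read the parameter back. A minimal sketch, assuming nvidia_uvm exposes uvm_disable_hmm as a readable file under /sys/module/nvidia_uvm/parameters/ (the usual place for module parameters; I have not verified this for every driver version). The function name hmm_disabled is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reports whether uvm_disable_hmm is active.
# Assumption: the parameter is readable at the sysfs path below.

hmm_disabled() {
  param_file="${1:-/sys/module/nvidia_uvm/parameters/uvm_disable_hmm}"
  if [ ! -r "$param_file" ]; then
    echo "unknown: $param_file not readable (is nvidia_uvm loaded?)"
    return 2
  fi
  # Boolean module parameters normally read back as Y or N.
  value=$(cat "$param_file")
  case "$value" in
    Y|y|1) echo "HMM disabled (uvm_disable_hmm=$value)"; return 0 ;;
    *)     echo "HMM not disabled (uvm_disable_hmm=$value)"; return 1 ;;
  esac
}

hmm_disabled "$@"
```

If it reports N, the module was probably not actually reloaded. For example, `modprobe -r` fails while a process still holds the GPU. `modprobe -c | grep uvm_disable_hmm` should also show the options line from the .conf file.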
There you go. I hope this helps you or anyone else who comes across this.
It took us a long time to figure it out…
