NVIDIA-SMI Shows "No Devices Found"

Hello

I have a VM set up with CUDA 12.6. After restarting the VM, the nvidia-smi command returns the error: “No devices were found.” Strangely, which nvcc still prints the expected path, so the CUDA toolkit itself appears to be installed.

To troubleshoot, I reinstalled both the NVIDIA driver and CUDA. The reinstallation completed successfully, but the problem persists: nvidia-smi still shows “No devices were found.” PyTorch also no longer detects CUDA.

I also have a huge kernel log (about 1 GB) filled with this line:

Aug 30 10:35:40 l4-90-gra11 kernel: NVRM: gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: addr: 0x110118, regvalue: 0xbadf5620, error code: Unknown SYS_PRI_ERROR_CODE
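
Since the log is so large, here is a minimal Python sketch of how the repeated lines could be summarized instead of read by hand. It counts how often each (addr, regvalue) pair appears. The regular expression is written only against the single line quoted above, so it is an assumption that the other lines follow the same layout; the function name summarize_bad_reads and the group names are my own labels, not anything from the NVIDIA driver.

```python
import re
from collections import Counter

# Pattern modeled on the one NVRM line quoted in this post.
# Assumption: other "Possible bad register read" lines share this layout.
NVRM_BAD_READ = re.compile(
    r"NVRM: (?P<func>\w+): Possible bad register read: "
    r"addr: (?P<addr>0x[0-9a-fA-F]+), regvalue: (?P<regvalue>0x[0-9a-fA-F]+)"
)


def summarize_bad_reads(lines):
    """Count occurrences of each (addr, regvalue) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = NVRM_BAD_READ.search(line)
        if match:
            counts[(match["addr"], match["regvalue"])] += 1
    return counts


# The line from the log above, used as sample input.
sample = (
    "Aug 30 10:35:40 l4-90-gra11 kernel: NVRM: "
    "gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: "
    "addr: 0x110118, regvalue: 0xbadf5620, "
    "error code: Unknown SYS_PRI_ERROR_CODE"
)

for (addr, regvalue), count in summarize_bad_reads([sample]).items():
    print(addr, regvalue, count)
```

To run it on a real log, an open file object can be passed in place of the list (for example the result of open(path, errors="replace"), where path is wherever the log lives on your system). The file is then read line by line, so the 1 GB log does not need to fit in memory.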

Has anyone else experienced a similar issue? Could this be related to the VM configuration, or is there something I might be overlooking in the CUDA setup? Any advice or suggestions for further troubleshooting would be greatly appreciated!

nvidia-bug-report.log.gz (5.4 MB)

Thanks in advance for your help.