Hello
I have a setup with CUDA 12.6 on a VM. After relaunching my VM, I encountered an issue where the nvidia-smi command returns the error: “No devices were found.” Strangely, which nvcc still provides the correct output, indicating that CUDA is installed.
To troubleshoot, I reinstalled both the NVIDIA driver and CUDA. The reinstallation process completed successfully, but the problem persists: nvidia-smi still shows “No devices were found.” Also, PyTorch no longer recognizes CUDA.
I also have a huge log (1 GB) full of this line:
Aug 30 10:35:40 l4-90-gra11 kernel: NVRM: gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: addr: 0x110118, regvalue: 0xbadf5620, error code: Unknown SYS_PRI_ERROR_CODE
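Since a log of that size is impractical to read by hand, a small script along these lines could summarize how often the error occurs and which register addresses and values are involved. This is only a sketch: the function name summarize_nvrm_errors is hypothetical, and the regular expression assumes every entry follows the exact format of the line above.

```python
import re
from collections import Counter

# Assumption: all entries follow the format of the NVRM line quoted above;
# other driver versions may word the message differently.
NVRM_RE = re.compile(
    r"NVRM: gpuHandleSanityCheckRegReadError_\w+: Possible bad register read: "
    r"addr: (0x[0-9a-fA-F]+), regvalue: (0x[0-9a-fA-F]+)"
)


def summarize_nvrm_errors(lines):
    """Count (addr, regvalue) pairs over an iterable of log lines.

    Accepts any iterable (for example an open file object), so a large
    log can be streamed line by line instead of loaded into memory.
    """
    counts = Counter()
    for line in lines:
        match = NVRM_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing an open file handle for the kernel log (its path depends on the distribution) and then calling most_common() on the result would show whether the same register read fails every time or whether several addresses are affected.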
Has anyone else experienced a similar issue? Could this be related to the VM configuration, or is there something I might be overlooking in the CUDA setup? Any advice or suggestions for further troubleshooting would be greatly appreciated!
nvidia-bug-report.log.gz (5.4 MB)
Thanks in advance for your help.