- The device is finicky. I have tried different driver versions and PCI configurations, but it doesn’t stabilize.
- The typical symptom is that the device “crashes”. After that, nvidia-smi hangs for a long time and its report shows power consumption as “ERR!”. In this state the card is unusable until a reboot.
- The machine is headless (no X11), and nouveau and nvidiafb are blacklisted.
- An old GTX 460 is also plugged in, only to provide video output for installing the OS.
- gpu-burn ran for a while, then showed an error, and nvidia-smi began showing the symptoms described in point 2.
- Stopping gpu-burn worked, but relaunching it failed with “no CUDA-capable device is detected”, and at this point dmesg showed “GPU has fallen off the bus”.
I kind of suspect you’re going to tell me my card is junk, but I was hoping there is some way to stabilize it, e.g. by identifying the faulty core and disabling it. Thanks.
nvidia-bug-report.log.gz (183.8 KB)
