nvidia-bug-report.log.gz (705.9 KB)
This workstation has 4 GPUs. A week ago, one of the GPUs fell off the bus. When running “nvidia-smi”, the error message says “Unable to determine the device handle for GPU 0000:1A:00.0: Unknown Error”. After rebooting, only 3 GPUs show up in nvidia-smi and lspci, and the original bus ID no longer appears in /sys/bus/pci/devices/ either. I have tried updating the driver to version 510 and rebooting, but nothing changed. Could you give some idea of whether this is a software issue or a hardware issue? One of the GPUs in this workstation already had to be replaced once because of a bus ID problem. Thank you so much!
Plain hardware issue: the GPU doesn't appear on the PCI bus anymore. Please check whether disconnecting the system from power for a few minutes makes it reappear. If that doesn't help, check whether it works in another system; if it doesn't, it's broken.
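The manual check described in the question (looking for the bus ID under /sys/bus/pci/devices/) can be scripted so it is easy to repeat after each power drain. This is a minimal sketch, assuming a Linux host with sysfs at its usual mount point. The function names are hypothetical, and filtering on PCI class 0x03 (display controller) is an assumption used to skip the HDMI audio functions that NVIDIA cards also expose under vendor ID 0x10de. The script only reads sysfs; it does not rescan or reset anything.

```python
import os

NVIDIA_VENDOR_ID = "0x10de"             # PCI vendor ID assigned to NVIDIA
DISPLAY_CLASS_PREFIX = "0x03"           # PCI base class 0x03 = display controller
SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"


def _read(path):
    """Return the stripped, lowercased contents of a sysfs attribute, or None."""
    try:
        with open(path) as f:
            return f.read().strip().lower()
    except OSError:
        return None


def normalize_bus_id(bus_id):
    """Convert '00000000:1A:00.0' (nvidia-smi style) or '0000:1A:00.0'
    to the sysfs form '0000:1a:00.0' (4-digit domain, lowercase hex)."""
    domain, rest = bus_id.split(":", 1)
    return f"{int(domain, 16):04x}:{rest.lower()}"


def nvidia_gpu_addresses(root=SYSFS_PCI_DEVICES):
    """Return sorted PCI addresses of NVIDIA display controllers under root."""
    if not os.path.isdir(root):
        return []
    found = []
    for addr in os.listdir(root):
        vendor = _read(os.path.join(root, addr, "vendor"))
        pci_class = _read(os.path.join(root, addr, "class"))
        if vendor == NVIDIA_VENDOR_ID and pci_class and pci_class.startswith(DISPLAY_CLASS_PREFIX):
            found.append(addr)
    return sorted(found)


def is_present(bus_id, root=SYSFS_PCI_DEVICES):
    """True if the given PCI bus ID currently has an entry in sysfs."""
    return os.path.exists(os.path.join(root, normalize_bus_id(bus_id)))


if __name__ == "__main__":
    # Bus ID taken from the nvidia-smi error in this thread.
    target = "0000:1A:00.0"
    print("NVIDIA GPUs in sysfs:", nvidia_gpu_addresses())
    print(f"{target}:", "present" if is_present(target) else "MISSING")
```

If the address is still reported missing after the system has been unplugged for a few minutes, the card is not being enumerated at all, so no driver version can see it.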
Thank you so much! Last time we had a hardware issue, we were advised to swap the slots of two GPUs. Is that equivalent to checking it in another system?
Yes, if the other slot is known-working.
Appreciate it!