79, GPU has fallen off the bus (RTX 2000)

Hi,

I am experiencing the problem often reported here, "79, GPU has fallen off the bus", with the NVIDIA RTX 2000 Ada Generation GPU in my new Lenovo running Ubuntu 24.04.2 LTS.

I started with the 550 driver. The GPU never came up at all, and by default nvidia-smi showed this error:

Unable to determine the device handle for GPU0: 0000:01:00.0: Unknown Error
No devices were found

I then tried nvidia-driver-570-open and nvidia-driver-570. With the former the result was the same. With the latter the NVIDIA GPU did start, but only for 3-5 minutes. Then I was logged out, and after logging back in I saw the same error.

What did not work for me:

  • disconnecting the external monitor (frequency-related, discussed here)
  • disabling auto boost (discussed here): the command reports "Enabling/disabling default auto boosted clocks is not supported for GPU"
  • checking temperature: nvidia-settings shows about 42 °C while the GPU is running, so I don't think that's the issue (unless it spikes rapidly just before I get logged out?)
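To check the "unless it spikes just before I get logged out" theory, one option is to log the temperature continuously to a file, so the last readings before the crash survive. Below is a minimal Python sketch. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu`, `--format=csv,noheader,nounits` and `-l` (loop) options. The function names, the log file name `gpu_temps.csv` and the 10 °C "spike" threshold are my own hypothetical choices, not anything from NVIDIA.

```python
import subprocess
from typing import List, Optional, Tuple

# Fields to query; see `nvidia-smi --help-query-gpu` for the full list.
QUERY = "timestamp,temperature.gpu"


def parse_sample(line: str) -> Optional[Tuple[str, float]]:
    """Parse one CSV line like '2025/03/01 10:00:00.000, 42'.

    Returns None for lines that are not a numeric reading, e.g. '[N/A]'
    or an error message printed once the GPU has dropped off the bus.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 2:
        return None
    try:
        return parts[0], float(parts[1])
    except ValueError:
        return None


def find_spikes(temps: List[float], jump: float = 10.0) -> List[int]:
    """Indices where the temperature rose by at least `jump` degrees C
    compared with the previous sample (hypothetical spike definition)."""
    return [i for i in range(1, len(temps)) if temps[i] - temps[i - 1] >= jump]


def watch(interval_s: int = 1, logfile: str = "gpu_temps.csv") -> None:
    """Append one nvidia-smi reading per interval to `logfile` until
    nvidia-smi exits or the process is interrupted."""
    cmd = [
        "nvidia-smi",
        f"--query-gpu={QUERY}",
        "--format=csv,noheader,nounits",
        "-l",
        str(interval_s),
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(logfile, "a") as out:
        for line in proc.stdout:
            out.write(line)
            out.flush()  # flush every line so readings survive a sudden logout
```

Run `watch()` before the GPU drops out. Afterwards, feed the parsed temperatures from `gpu_temps.csv` to `find_spikes` to see whether anything jumped sharply in the last few samples. If the script is killed at logout, the per-line flush should still keep everything it recorded up to that point.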

I could not attach the nvidia-bug-report.log.gz file for some reason, so I am attaching it as .txt.

Any advice?

Thank you.

nvidia-bug-report.txt (1.7 MB)

Since you have already ruled out thermals, maybe try another PSU and check whether BIOS/UEFI updates are available for your motherboard.

Cheers!

Until yesterday I thought it was a power supply issue. I restarted the PC several times for diagnostics, and after one of those restarts the NVIDIA GPU stayed up without dropping off, even though the PSU was not attached! After a while I attached the PSU and could keep using the GPU for the rest of the day. The GPU only started dying again after I suspended the laptop today and woke it up.
So I am back to the GPU falling off the bus at random within 5 minutes of starting the PC :|
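To pin down exactly how long after boot the GPU drops out, you can pull the Xid lines out of the kernel log. The sketch below parses `dmesg` output, whose timestamps are seconds since boot. It assumes the NVRM line looks like `[  287.123456] NVRM: Xid (PCI:0000:01:00): 79, pid=..., GPU has fallen off the bus.`. The exact wording can differ between driver versions, so treat the regex as an assumption. `find_xids` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumed dmesg line shape (wording may vary by driver version):
# [  287.123456] NVRM: Xid (PCI:0000:01:00): 79, pid=1234, name=Xorg, GPU has fallen off the bus.
XID_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def find_xids(log_text: str) -> List[Tuple[float, str, int]]:
    """Return (seconds_since_boot, pci_address, xid_code) for every Xid line."""
    hits = []
    for line in log_text.splitlines():
        m = XID_RE.search(line)
        if m:
            hits.append((float(m.group(1)), m.group(2), int(m.group(3))))
    return hits
```

For example, `find_xids(subprocess.run(["sudo", "dmesg"], capture_output=True, text=True).stdout)` lists every Xid event in the current boot. An entry like `(287.1, "0000:01:00", 79)` would mean Xid 79 hit about 4.8 minutes after boot. Note that `journalctl -k` uses a different line prefix, so the regex would need adjusting for its output.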

My UEFI is already up to date. I am thinking about upgrading Ubuntu, but I doubt that will help, and it is not something I can easily reverse.

On the software side, I don't think anything other than the kernel and the NVIDIA driver plays a role in this issue. So maybe just try installing newer available versions of those two and see if that helps.

I got a warranty replacement of the GPU (together with the motherboard), and the problem has not recurred since.
