I am experiencing the problem often reported here, Xid 79 ("GPU has fallen off the bus"), on my NVIDIA RTX 2000 Ada Generation in a new Lenovo laptop running Ubuntu 24.04.2 LTS.
I started with the 550 driver, and the GPU would never start at all; nvidia-smi showed this error by default:
Unable to determine the device handle for GPU0: 0000:01:00.0: Unknown Error
No devices were found
I then installed nvidia-driver-570-open and nvidia-driver-570. With the former the result was the same. With the latter, however, the NVIDIA GPU did start and ran for 3-5 minutes, after which I got logged off and saw the same error after logging in again.
What did not work for me:
disconnecting the external monitor (frequency related, discussed here)
disabling autoboost (discussed here); the attempt returned "Enabling/disabling default auto boosted clocks is not supported for GPU"
temperature: nvidia-settings shows about 42 °C while the GPU is running, so I don't think it is the issue (unless it spikes rapidly just before I get logged off?)
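To check the "spikes just before I get logged off" theory, something like the following could log readings once per second until the GPU drops. This is only a rough sketch: it assumes nvidia-smi is on the PATH and accepts the timestamp, temperature.gpu and power.draw query fields, and the function names and log path are made up for illustration.

```python
import subprocess
import time

# Fields requested from nvidia-smi; power.draw may come back as [N/A] on some laptops.
QUERY = "timestamp,temperature.gpu,power.draw"


def parse_sample(line):
    """Split one CSV line from nvidia-smi into (timestamp, temp_c, power_w)."""
    timestamp, temp, power = [field.strip() for field in line.split(",")]

    def to_float(text):
        # Values such as "[N/A]" cannot be parsed; report them as None.
        try:
            return float(text)
        except ValueError:
            return None

    return timestamp, to_float(temp), to_float(power)


def log_gpu_samples(path, interval_s=1.0):
    """Append one reading per interval to `path` until nvidia-smi fails."""
    cmd = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
    with open(path, "a") as log:
        while True:
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                # Once the GPU is gone nvidia-smi errors out; record that and stop.
                log.write(f"nvidia-smi failed: {result.stdout.strip()} {result.stderr.strip()}\n")
                log.flush()
                break
            log.write(result.stdout)
            log.flush()  # keep the file current in case the session is killed
            time.sleep(interval_s)
```

Calling log_gpu_samples("gpu_samples.log") in a terminal right after boot and reading the last few lines after the crash would show whether the temperature or power draw moved before the drop; parse_sample turns a saved line back into numbers.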
I could not attach the log.gz file for some reason, so I am attaching it as txt.
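For anyone going through a log like this, a small script can pull out just the Xid lines. This is a sketch, not a definitive parser: it assumes the kernel messages have the shape "NVRM: Xid (PCI:0000:01:00): 79, ..." and the function names are hypothetical.

```python
import re

# Assumed line shape: "... NVRM: Xid (PCI:<address>): <code>, <details>"
XID_PATTERN = re.compile(r"NVRM: Xid \((PCI:[0-9A-Fa-f:.]+)\): (\d+),\s*(.*)")


def find_xid_events(log_text):
    """Return (line_number, pci_address, xid_code, details) for each Xid line."""
    events = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = XID_PATTERN.search(line)
        if match:
            pci, code, details = match.groups()
            events.append((number, pci, int(code), details.strip()))
    return events


def find_xid_events_in_file(path):
    """Read a plain-text log and return its Xid events."""
    # errors="replace" keeps going if the log contains stray non-UTF-8 bytes.
    with open(path, errors="replace") as handle:
        return find_xid_events(handle.read())
```

Running find_xid_events_in_file on the attached text file (or on saved kernel log output) gives the line numbers to look around, which makes it easier to see what happened in the seconds before the Xid 79.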
Until yesterday I thought it was a power supply issue:
While I was restarting the PC several times for diagnostics, after one of the restarts the NVIDIA GPU stayed up without dropping off, even though the PSU was not attached! After a while I attached the PSU and could continue to use the GPU for the rest of the day. Only after I suspended the laptop today and then woke it up did the GPU start dying again.
So I am back to the GPU falling off the bus at a random point within 5 minutes of starting the PC :|
Software-wise, I don’t think anything other than the kernel and the NVIDIA driver can play a role in this issue, so maybe just try installing newer available versions of these two and see if it helps.
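Before and after upgrading, it may help to note exactly which kernel and driver are in use. A minimal sketch, assuming the NVIDIA kernel module exposes /proc/driver/nvidia/version while it is loaded (the function names are hypothetical):

```python
import platform
from pathlib import Path


def kernel_release():
    """Return the running kernel release string (same value as `uname -r`)."""
    return platform.release()


def nvidia_driver_version_text(path="/proc/driver/nvidia/version"):
    """Return the driver's version text, or None if the file is not present."""
    version_file = Path(path)
    # The file only exists while the NVIDIA kernel module is loaded.
    if not version_file.exists():
        return None
    return version_file.read_text().strip()


def describe_setup():
    """Build a short summary suitable for pasting into a forum post."""
    driver = nvidia_driver_version_text() or "NVIDIA module not loaded"
    return f"kernel: {kernel_release()}\ndriver: {driver}"
```

Printing describe_setup() before and after each change gives a simple record of which kernel and driver combinations were tried and which of them still dropped the GPU.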