NVIDIA 515 - RTX 3060 - GPU has fallen off the bus

My RTX 3060 keeps freezing my desktop PC, roughly every 2 days.

The end of the kernel log is always similar:

[16852.358181] NVRM: GPU at PCI:0000:01:00: GPU-230b77a1-605f-1cf9-d9f9-f749c44bc2f8
[16852.358184] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
[16852.358187] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
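Since these lines recur across reboots, it can help to pull every Xid event out of a saved kernel log and compare timestamps. The following is a minimal sketch, assuming the log lines keep the format shown above; the names `XID_RE` and `find_xid_events` are made up for illustration and are not part of any NVIDIA tool.

```python
import re

# Matches kernel log lines of the form seen in this thread, e.g.
# [16852.358184] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', ...
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid "
    r"\((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+),"
)


def find_xid_events(log_text):
    """Return a list of (seconds_since_boot, pci_address, xid_code) tuples."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("pci"), int(match.group("xid")))
            )
    return events
```

Feeding it the output of `dmesg` (or the kernel log section of the bug report) should show how long after boot each Xid 79 occurred.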

Going off of similar threads to save some time:

  • The GPU is definitely not overheating (46 °C)
  • My power supply is a Corsair CX750M, and there are no unusual or non-standard components that would draw too much power
  • This happens at IDLE state (no games or graphically demanding apps open)
  • PCH DMI ASPM and PCI Express Native Power Management were disabled when it first happened; I also tried enabling them and this did not change the situation.

I have 2 monitors, one connected with DisplayPort, another connected with DVI. When the freeze happens, the monitor connected with DisplayPort turns black instantly, while the one connected with DVI keeps the last frame until I turn off the machine.
After the freeze happens, I can still SSH in; that is how I got the nvidia-bug-report, which is attached.
Also interestingly, right after the freeze happens, my GPU's fans spin up really high and get loud until I turn off the machine.
After a hard reset, everything is back to normal until it happens again (roughly every 2 days, as I said).

nvidia-bug-report.log.gz (243.4 KB)

Please check for a bios update, try reseating the gpu in its slot, check if it works in another system.

This looks similar to my problem, and I'm also using two monitors most of the time, sometimes three:

Additionally, when pressing the reset button, I can see the PC slowly drawing a black box from top to bottom before really resetting the machine.

So far, disabling all ASPM functions in the BIOS seems to help. The PC didn't freeze while idle, neither did it crash during light desktop workload with watching a video, and it didn't freeze while gaming.

But let's not count our chickens before they are hatched. I'll give it a few more days.

Seems solved on my side after a bios update, thank you for suggesting it. For anyone interested in this, in my case, I updated a ROG STRIX Z370-F GAMING from BIOS 2401 to 3004. Looks like it really does take a fresh bios to have stability with the card, I should have known better and sorry for the noise.

Ignore the comment above, unfortunately this is still reproducing with the latest BIOS.

It's stable for me at least... Maybe there's still some PCIe power management going on in your system?

Well, all of the related options are disabled in the BIOS, and I also added pcie_aspm=off to my kernel command line. Unfortunately it still reproduces, even while IDLE.
I think the BIOS upgrade still had some benefit, as the reproduction rate is now down to once every 4-7 days, but it is definitely still happening.
Also interestingly, whenever my GPU drops from the BUS, it starts to excessively vibrate until I shut the PC down. Really not sure what's going on...
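To double-check that a parameter such as pcie_aspm=off actually reached the running kernel (and was not only written to a config file), the live command line can be inspected. This is a rough sketch, assuming a Linux system where /proc/cmdline is readable; `has_kernel_param` and `running_cmdline` are hypothetical helper names.

```python
from pathlib import Path


def has_kernel_param(cmdline, name, value=None):
    """Return True if `name` (optionally with `=value`) is on the command line."""
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key != name:
            continue
        if value is None or (sep == "=" and val == value):
            return True
    return False


def running_cmdline(path="/proc/cmdline"):
    """Read the command line the current kernel was booted with."""
    return Path(path).read_text()
```

For example, `has_kernel_param(running_cmdline(), "pcie_aspm", "off")` should be true after a reboot if the parameter was applied.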

I'm having the same issue on a System76 laptop. RTX 3060 on 515 driver. I was asked to send in the machine for RMA and they replaced the motherboard, but still this keeps happening. I've now downgraded to the 470 driver and so far the problem hasn't manifested itself again. It's only been a day, so we'll see. Couldn't test the 510 driver since I'm running Pop OS and the 510 driver on the repos is not compatible with the latest kernel.

Hopefully this is definitively a driver issue and it gets fixed soon. Kind of annoying having a relatively new machine that won't work with anything but legacy drivers.

Update: still happening with the 470 driver. Desktop locks up, GPU has fallen off the bus message printed to the system log, and X server pegs CPU to 100%.
nvidia-bug-report.log.gz (258.5 KB)

Still reproducing with driver 520.56.06 on the latest BIOS. Also tried reseating the card and a different PCI slot, no difference. It is also unstable in another linux system I tried.

@lcatoni An Xid 79 is never a driver bug. Furthermore, on a notebook, this is almost always defective hardware. Please have it replaced by vendor again.

Not sure what changed, but I haven't been able to reproduce this for a good while now. The hardware is the same, the BIOS is the same; the only thing that might be different is the regular kernel/driver updates that I install. In case it was rooted in the driver and silently fixed... thank you?

I have the same problem exactly as you describe. Running Ubuntu 22.04 on Gigabyte X570 I Aorus Pro with RTX 3060. I believe I have tried all kernel 5.* and 500 series video drivers. First rate hardware but a third rate experience. I sent the card back to the store, but they sent it back to me saying they tested it working properly in Windows.

The fans start revving up after the graphical freeze. Both ssh and SysRq still work. Can you verify the problem hasn't returned for you? What distro/kernel/driver are you running? Do you still use residual custom boot parameters? Can you share a cat /etc/default/grub | grep LINUX_DEFAULT?
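If several people want to compare boot parameters, the grep above can also be done programmatically so the individual parameters are easy to diff. A small sketch, assuming the usual shell-style `GRUB_CMDLINE_LINUX_DEFAULT="..."` line in /etc/default/grub; the function name `grub_default_params` is invented for this example.

```python
import shlex


def grub_default_params(grub_text):
    """Return the parameters from GRUB_CMDLINE_LINUX_DEFAULT as a list.

    Commented-out lines are ignored; if the variable is assigned more
    than once, the last assignment wins, as it would in a shell.
    """
    params = []
    for line in grub_text.splitlines():
        line = line.strip()
        if line.startswith("GRUB_CMDLINE_LINUX_DEFAULT="):
            _, _, raw = line.partition("=")
            parts = shlex.split(raw)  # removes the surrounding quotes
            params = parts[0].split() if parts else []
    return params
```

Calling it on the text of /etc/default/grub gives a list such as `["quiet", "splash", "pcie_aspm=off"]` that can be pasted into a reply.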

Update: I have discovered that this is highly nvidia-driver minor version (0.x.x) dependent. After a minor update, I have these freezes very often. Sometimes within 3 minutes:

[  133.278291] NVRM: Xid (PCI:0000:09:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
[  133.278294] NVRM: GPU 0000:09:00.0: GPU has fallen off the bus.
[  138.367502] nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67e:6 2:0:4048:4040

Then I restore an image with the previous driver version using Timeshift, and I have zero freezes.

For over a year now, it seems that nvidia keeps fixing the issue and then introducing a regression later. Our hardware combinations are probably too uncommon for nvidia to test. It happens on at least 530 and 535, and it happened on 525 in the past.

Last night, an automated update from 535.98-0ubuntu0~gpu22.04.1 to 535.104.05-0ubuntu0.22.04.1 caused a super stable system to become super freezy due to the issue described in this thread. This specific update was informative because it contained only nvidia updates, and no other packages. I have been navigating between driver versions and using Timeshift a lot. At the moment 525.125.06 is safe to use with Linux 5.15.0-79 in my case.
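When juggling versions like these, note that plain string comparison orders them wrongly ("535.98" sorts after "535.104.05" as text). Below is a minimal sketch for comparing the upstream driver version numerically. It assumes the package version begins with the dotted driver version, as in the strings above, and it deliberately ignores the distro suffix; `driver_version_key` is a hypothetical helper.

```python
import re


def driver_version_key(version):
    """Turn '535.104.05-0ubuntu0.22.04.1' into the tuple (535, 104, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no leading version number in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


old = driver_version_key("535.98-0ubuntu0~gpu22.04.1")
new = driver_version_key("535.104.05-0ubuntu0.22.04.1")
print(old < new)  # numeric comparison: 98 < 104
```

Sorting a list of installed or tested versions with `key=driver_version_key` keeps notes about "good" and "bad" releases in the right order.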

Contrary to what @generix said, this is most definitely a driver bug.

I hope someone is still reading (@TomNVIDIA) because I don't consider the RTX 3060 too ancient to support.

I think I've hit a similar issue. My dual-boot Windows install doesn't have any GPU issues, but my Arch install has been halting frequently for the last few months, since a driver/kernel upgrade.

I'm stuck with the GPU falling off the bus again. The pattern is the same. You're doing something insignificant like browsing the internet. It utilizes the GPU between 0 and 2%. Then, suddenly, scrolling becomes very laggy. Dragging windows around is very laggy. The UI runs at 5 FPS. GPU utilization is at 100%. No obvious reason. No game. No media playing.

Now a random pick between two things happens:

  1. After a few dozen seconds, the GPU goes back to 0%. If you have "nvidia settings" open on the PowerMizer page, you can see the Performance Level switch from 4 to 3 to 2 to 1 to 0. Or:
  2. The GPU has fallen off the bus and the computer freezes.

Something is broken. Perhaps it's the firmware. I cannot find firmware upgrades for the PNY GeForce RTX 3060 12GB XLR8 Gaming REVEL EPIC-X RGB Single Fan Edition. It runs VBIOS 94.06.25.00.7E.

The same issue here.
Asus NVIDIA GeForce RTX 4060
Ubuntu 23.04 (had the same issue with the previous LTS)
nvidia-driver-535

Xid (PCI:0000:26:00): 79, pid='', name=, GPU has fallen off the bus.
NVRM: GPU 0000:26:00.0: GPU has fallen off the bus.

The freeze is immediate and happens only on the desktop when idle, never happened under load (gaming).
Cursor stops and I can still use ssh to log in.

NVIDIA please do something...

Maybe it's too early to celebrate, but after following this advice: "The NVIDIA GPU remains 'on the bus' if the NVIDIA Settings PowerMizer mode is set to 'Maximum Performance'", the GPU has not fallen off the bus so far.

Jyka - is your change to the powermizer mode still working? Where exactly did you find this setting, can you provide some instructions? I'd like to try this also.

This used to freeze at least once a day. Three weeks without freezing now. Here is the setting: https://i.imgur.com/vas0kjF.png
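For those who would rather script this than click through the GUI, here is a rough sketch of the command-line equivalent. It rests on an assumption that is not confirmed anywhere in this thread: that the attribute is named `GpuPowerMizerMode` and that value 1 corresponds to "Prefer Maximum Performance". Querying the attribute with `nvidia-settings -q` on your own driver before relying on it seems wise. The helper names are made up, and the setting is generally reported not to persist across X sessions.

```python
import shutil
import subprocess


def powermizer_command(gpu_index=0, mode=1):
    """Build the nvidia-settings argument list.

    ASSUMPTION: mode 1 means "Prefer Maximum Performance" for the
    GpuPowerMizerMode attribute; verify this on your driver version.
    """
    return ["nvidia-settings", "-a", f"[gpu:{gpu_index}]/GpuPowerMizerMode={mode}"]


def apply_powermizer(gpu_index=0, mode=1):
    """Run the command inside a graphical session; return its exit code."""
    if shutil.which("nvidia-settings") is None:
        raise RuntimeError("nvidia-settings was not found on PATH")
    return subprocess.run(powermizer_command(gpu_index, mode), check=False).returncode
```

Running `apply_powermizer()` from a session autostart script would reapply the mode at each login, if the assumption above holds for your setup.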