NVRM: GPU 0000:01:00.0: GPU has fallen off the bus (but probably not a power/psu problem)

I’m still seeing the same games fail with my new 3060 GPU (previously a 1660 Ti). The failure mode has changed, though: instead of just crashing the game and sometimes turning my desktop contents upside down, it now hangs the complete system.

The system keeps running (only the monitor contents freeze), so I was able to ssh into the box and capture a bug report: nvidia-bug-report.log.gz (329,1 KB)

It happens a lot with Batman: Arkham Knight, but it also happens in Elite Dangerous, mostly while on planets. It doesn’t look like the GPU is hitting its power limit when it crashes. On the other hand, I can leave it running a benchmark for hours at its power limit of 170 watts without any crash. The PSU is a Corsair CX550M, which according to my research should be more than enough.
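To back up the power-limit observation with numbers, something like the following could log the power draw once per second until the hang. This is only a sketch: it assumes `nvidia-smi` is on the PATH and supports the `power.draw`, `power.limit` and `temperature.gpu` query fields on this card. The log file name and the function names are made up for illustration.

```python
import os
import subprocess
import time

# Fields requested from nvidia-smi (assumed to be supported on this card).
QUERY = "timestamp,power.draw,power.limit,temperature.gpu"


def parse_sample(line):
    """Split one CSV line into (timestamp, draw_w, limit_w, temp_c).

    Numeric fields that nvidia-smi reports as unavailable become None.
    """
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 4:
        raise ValueError(f"unexpected nvidia-smi line: {line!r}")

    def to_float(text):
        try:
            return float(text)
        except ValueError:
            return None

    return (fields[0], to_float(fields[1]), to_float(fields[2]), to_float(fields[3]))


def log_power(path="gpu-power.log", interval_s=1.0):
    """Append one sample per interval; fsync so the last lines survive a hard lock."""
    cmd = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
    with open(path, "a") as log:
        while True:
            try:
                out = subprocess.run(
                    cmd, capture_output=True, text=True, timeout=10
                ).stdout
            except subprocess.TimeoutExpired:
                # nvidia-smi may stop responding once the GPU is gone.
                out = ""
                log.write("nvidia-smi timed out\n")
            for line in out.splitlines():
                if line.strip():
                    ts, draw, limit, temp = parse_sample(line)
                    log.write(f"{ts} draw={draw} limit={limit} temp={temp}\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)
```

Calling `log_power()` in an ssh session before starting the game should leave the last few samples before the freeze in `gpu-power.log`, which would show whether the draw was anywhere near 170 W at that moment.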

Similar crashes also happened in the same situations with my previous 1660 Ti. They just looked different, and when the system froze I wasn’t able to capture a crash log. So a driver bug seems more likely.

If the system is left idle for a long time with nothing running on the desktop and the monitors turned off, it may also freeze. This is the same behaviour as with my previous GPU, so I’m not sure whether it is related to the driver. I’ll try to investigate, but usually a few minutes after the crash (while I am still able to ssh into the box) the system hard locks and I have to use the reset button. That makes it a bit difficult to catch the crash “in the act”.
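Since the window between the crash and the hard lock is only a few minutes, one option would be to watch the kernel log and collect the bug report automatically as soon as the message from the title shows up. The sketch below assumes a systemd machine where `journalctl -k -f` follows kernel messages, and that `nvidia-bug-report.sh` is installed and the script is run as root. The function names are made up, and I have not verified that the report tool finishes before the lock-up.

```python
import re
import subprocess

# Matches the kernel message quoted in the title of this post.
FALLEN_OFF_BUS = re.compile(
    r"NVRM: GPU ([0-9a-fA-F:.]+): GPU has fallen off the bus"
)


def find_lost_gpu(line):
    """Return the PCI address from a 'fallen off the bus' line, else None."""
    match = FALLEN_OFF_BUS.search(line)
    return match.group(1) if match else None


def watch_kernel_log():
    """Follow new kernel messages and grab a bug report on the first hit."""
    follow = subprocess.Popen(
        ["journalctl", "-k", "-f", "-n", "0", "-o", "cat"],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in follow.stdout:
            pci_address = find_lost_gpu(line)
            if pci_address is None:
                continue
            print(f"GPU {pci_address} fell off the bus, collecting report")
            # Flush pending writes first in case the hard lock comes early.
            subprocess.run(["sync"])
            # Writes nvidia-bug-report.log.gz to the current directory.
            subprocess.run(["nvidia-bug-report.sh"])
            break
    finally:
        follow.terminate()
```

Running `watch_kernel_log()` from an ssh session (or a service) before leaving the machine idle would at least capture the idle freeze as well, provided the same NVRM message is logged in that case.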