Driver lockup on Linux after Xid log message

Hi,

Recently I’ve been getting system/GPU lockups. They mostly happen while playing a video game, but one may have happened outside of gaming; I can’t quite remember.

It usually happens after a message like this: NVRM: Xid (PCI:0000:0a:00): 61, pid=1050, 0cec(3098) 00000000 00000000
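
For anyone who wants to pull these events out of their own kernel log, here is a rough sketch of a parser for that line. The regex is based only on the single message quoted above, so it is an assumption that other Xid messages or driver versions follow the same layout; `parse_xid` is a made-up helper name, not part of any NVIDIA tool.

```python
import re

# Pattern derived from the one log line quoted in this post; other driver
# versions may format Xid messages differently (assumption).
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+)"
    r"(?:, pid=(?P<pid>\d+))?"
)

def parse_xid(line):
    """Return the PCI address, Xid code and pid from a kernel log line, or None."""
    match = XID_RE.search(line)
    if match is None:
        return None
    pid = match.group("pid")
    return {
        "pci": match.group("pci"),
        "xid": int(match.group("xid")),
        # The pid field is treated as optional in case some messages omit it.
        "pid": int(pid) if pid is not None else None,
    }
```

Feeding it each line of `journalctl -k` or `dmesg` output and keeping the non-None results should give a list of when Xid events occurred and which process was involved.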

For more background:
This workstation runs Debian Sid with KDE and is used for many different types of development work. For a couple of months (until just yesterday, when I disabled it) it ran Folding@Home when idle, along with a script that regularly read GPU utilization from nvidia-smi to figure out when it was OK to enable FAH. It also ran a Conky config that displayed the GPU temperature from nvidia-smi. I occasionally did some gaming as well (mostly No Man’s Sky and Factorio, mostly Factorio lately).
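
To give an idea of what the polling looked like, here is a minimal sketch of that kind of script. It is not the exact script I ran: the function names, the 10% idle threshold and the 10 second timeout are all placeholders chosen for illustration, and the part that actually pauses or resumes FAH is left out.

```python
import subprocess

# Ask nvidia-smi for utilization (%) and temperature (C) as bare CSV,
# one line per GPU, e.g. "3, 45".
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]

def parse_gpu_stats(output):
    """Turn lines like '3, 45' into (utilization, temperature) tuples."""
    stats = []
    for line in output.strip().splitlines():
        util, temp = (field.strip() for field in line.split(","))
        stats.append((int(util), int(temp)))
    return stats

def read_gpu_stats(timeout_s=10):
    """Run nvidia-smi once; return None if it fails, hangs or prints N/A."""
    try:
        result = subprocess.run(
            QUERY, capture_output=True, text=True,
            timeout=timeout_s, check=True,
        )
        return parse_gpu_stats(result.stdout)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError,
            FileNotFoundError, ValueError):
        return None

def gpu_is_idle(stats, threshold=10):
    """Idle means every GPU is below the (hypothetical) threshold."""
    return bool(stats) and all(util < threshold for util, _ in stats)
```

The script would call `read_gpu_stats()` every so often and only enable FAH when `gpu_is_idle(...)` is true. The timeout is there so a slow nvidia-smi does not stall the loop, though it may not help if the process is stuck in uninterruptible sleep inside the driver.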

Here’s the log. I tried uploading it compressed, as it was created, but the upload form refused it, so I gunzipped it.
nvidia-bug-report.log (986.1 KB)

Once the lockup happens, anything that tries to access the GPU hangs for a long time, even nvidia-smi itself. nvidia-bug-report.sh took several minutes to finish. I am able to switch to a virtual terminal /once/ after a lockup. It takes a while for that to happen, but it does work and I’m able to run things. If I attempt to switch back to X, the lockup becomes pretty much permanent/unrecoverable. At least, I haven’t had the patience to see if it’ll let me switch back again.

Please see this:
https://forums.developer.nvidia.com/t/random-xid-61-and-xorg-lock-up/79731/191