Hard GPU hangs with 440.44 drivers (GTX 1060 6GB)

To fix I2C issues, I added an option to the nvidia kernel module (RMUseSwI2c=1) and also upgraded to the latest drivers, so either of these changes, or the combination of the two, could have caused the issue.

Anyway, here’s how it happens:

I browse the web using Firefox, and sometimes Firefox’s “GPU Process” and Xorg both start consuming 100% of the CPU. I can kill “GPU Process” with kill -9, but the Xorg process cannot be killed. At this point the computer is unusable because nothing is rendered on the screen. Pressing Ctrl + Alt + F1 to switch to a virtual console does nothing either.

The PC is still running, so I can use SysRq to reboot.

My logs contain this:

/var/log/Xorg.0.log.old:

[  6024.224] (WW) NVIDIA: Wait for channel idle timed out.
[  6032.236] (EE) NVIDIA(GPU-0): WAIT (0, 8, 0x8000, 0x00002c1c, 0x00002c1c)

dmesg:

Dec 31 11:28:47 zen kernel: NVRM: Xid (PCI:0000:07:00): 61, pid=2954, 0a97(2b84) 00000000 00000000
Dec 31 11:29:23 zen kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Dec 31 11:29:28 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0:966
Dec 31 11:29:30 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:0:0:971
Dec 31 11:29:32 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0:966
Dec 31 11:29:34 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:0:0:971
Dec 31 11:29:36 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0:966
Dec 31 11:29:38 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:0:0:971
Dec 31 11:29:40 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0:966
Dec 31 11:29:42 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:0:0:971
Dec 31 11:29:44 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0:966
Dec 31 11:29:46 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:0:0:971
Dec 31 11:29:48 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0:452
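For anyone comparing their own logs against the ones above, here is a minimal Python sketch that pulls the Xid number out of the NVRM lines and counts the nvidia-modeset “Idling display engine timed out” errors per GPU. It assumes the log line formats shown in this post; the regexes and the summarize helper are my own, not anything provided by NVIDIA, and other driver versions may word these messages differently.

```python
import re
from collections import Counter

# Assumed formats, inferred only from the log lines in this post:
#   "NVRM: Xid (PCI:0000:07:00): 61, pid=2954, ..."
#   "nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: ..."
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9A-Fa-f:.]+)\): (\d+)")
IDLE_RE = re.compile(
    r"nvidia-modeset: ERROR: GPU:(\d+): Idling display engine timed out"
)


def summarize(lines):
    """Collect (PCI address, Xid number) pairs and count display-engine
    idle timeouts per GPU index from an iterable of kernel log lines."""
    xids = []
    timeouts = Counter()
    for line in lines:
        m = XID_RE.search(line)
        if m:
            xids.append((m.group(1), int(m.group(2))))
            continue
        m = IDLE_RE.search(line)
        if m:
            timeouts[int(m.group(1))] += 1
    return xids, timeouts


# A few lines copied from the dmesg output above.
SAMPLE = [
    "Dec 31 11:28:47 zen kernel: NVRM: Xid (PCI:0000:07:00): 61, pid=2954, 0a97(2b84) 00000000 00000000",
    "Dec 31 11:29:23 zen kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.",
    "Dec 31 11:29:28 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0:966",
    "Dec 31 11:29:30 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:0:0:971",
    "Dec 31 11:29:48 zen kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0:452",
]

if __name__ == "__main__":
    xids, timeouts = summarize(SAMPLE)
    print("Xid events:", xids)  # [('PCI:0000:07:00', 61)]
    print("Idle timeouts per GPU:", dict(timeouts))  # {0: 3}
```

The WARNING line is deliberately ignored; only the ERROR timeouts are counted. The same function can be fed any iterable of lines, for example an open file containing saved journalctl -k output.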

I will now try running without this nvidia kernel module option.

After running without the nvidia RMUseSwI2c=1 option for the past six days, I can confirm that it is indeed the culprit behind my GPU hangs.

Unfortunately, this means I cannot leave it enabled.