Nvidia driver crashes when external HDMI 2.1 Monitor is turned off

I just got an LG 48CX and am using HDMI 2.1 to connect to it at 4K@120Hz. Everything works except for this one issue: whenever I turn off the monitor and then turn it back on, the NVIDIA driver crashes and all signal is lost. I have to reboot my machine to fix it.

Unfortunately nvidia-bug-report.sh is not working for me, so I will try to provide as much info as I can manually. If you need the bug report, I’ll spend some time trying to get it working later, but I need to get work done right now :).

kernel logs:

Nov 01 11:37:29 thor kernel: nvidia-modeset: ERROR: GPU:0: HDMI FRL link training failed.
Nov 01 11:37:32 thor kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Nov 01 11:37:56 thor kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67d:0:0:1098
Nov 01 11:38:08 thor kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1114
Nov 01 11:38:28 thor /nix/store/d0gfx95svzaknz635sqkxhpzfb305pxg-gdm-3.34.1/libexec/gdm-x-session[2042]: (WW) NVIDIA: Wait for channel idle timed out.
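Since recovering from this needs a reboot, the relevant lines end up in the previous boot’s journal. Below is a minimal sketch for pulling only the nvidia-modeset ERROR/WARNING lines out of it. It assumes journalctl’s default short output format shown above and a journal that persists across reboots. The function and type names are made up for illustration.

```python
import re
import subprocess
from typing import Iterable, List, NamedTuple

# Matches journalctl's default "short" format for nvidia-modeset kernel lines, e.g.
# "Nov 01 11:37:29 thor kernel: nvidia-modeset: ERROR: GPU:0: HDMI FRL link training failed."
MODESET_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"nvidia-modeset: (?P<level>ERROR|WARNING): (?P<gpu>GPU:\d+): (?P<msg>.*)$"
)


class ModesetEvent(NamedTuple):
    stamp: str
    level: str
    gpu: str
    msg: str


def modeset_events(lines: Iterable[str]) -> List[ModesetEvent]:
    """Keep only nvidia-modeset ERROR/WARNING lines; everything else is skipped."""
    events = []
    for line in lines:
        m = MODESET_RE.match(line.strip())
        if m:
            events.append(ModesetEvent(m["stamp"], m["level"], m["gpu"], m["msg"]))
    return events


def previous_boot_kernel_log() -> List[str]:
    """Kernel messages from the previous boot (the one that crashed).

    Assumes the caller may read the journal (root or the systemd-journal group).
    """
    out = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()
```

Usage: `for ev in modeset_events(previous_boot_kernel_log()): print(ev.stamp, ev.level, ev.msg)`. The gdm-x-session line is not a kernel message, so it is intentionally not matched.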

nvidia-smi:

Sun Nov  1 11:57:18 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28       Driver Version: 455.28       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    Off  | 00000000:09:00.0  On |                  N/A |
|  0%   49C    P8    35W / 340W |   1216MiB / 10014MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

uname -a

Linux thor 5.4.72 #1-NixOS SMP Sat Oct 17 08:11:24 UTC 2020 x86_64 GNU/Linux

Any suggestions for workarounds would be appreciated. I’ve tried a few things without any luck, including switching to a TTY, using nomodeset as a kernel parameter, and unplugging and replugging the HDMI cable. Nothing helps!

I’m using GNOME 3, which I believe uses Wayland.

More experimentation shows the steps needed for this to happen:

  1. Screen blanks after 5 minutes (DPMS off)
  2. TV turns off on its own after another 5 minutes
  3. I wake up the screen (DPMS on) by pressing a key on the keyboard; this is where the NVIDIA driver crashes with the messages above

A workaround is to make sure the TV is turned on first, before waking up the display output. It seems to crash because the driver can’t negotiate the HDMI 2.1 connection when the TV has turned off after blanking, which matches the “HDMI FRL link training failed” error in the kernel log.
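Another possible way around the trigger in steps 1-3 is to never let the output go to DPMS off in the first place. This is a hypothetical helper, not something confirmed in this thread. It assumes an X11 session (`xset s off`, `xset -dpms`) and also sets GNOME’s own idle timeout, on the assumption that GNOME may otherwise blank the screen on its own schedule. It defaults to a dry run that only returns the commands.

```python
import shutil
import subprocess
from typing import List

# Hypothetical workaround helper (not from the thread): keep the output from
# ever entering DPMS off, so "blank -> TV powers off -> wake" never happens.
# Assumes an X11 session.
BLANKING_OFF_COMMANDS: List[List[str]] = [
    ["xset", "s", "off"],   # disable X screensaver blanking
    ["xset", "-dpms"],      # disable DPMS power management
    # GNOME's idle timeout in seconds; 0 means "never". Assumption: GNOME
    # manages blanking separately from xset, so this is set too.
    ["gsettings", "set", "org.gnome.desktop.session", "idle-delay", "0"],
]


def disable_blanking(dry_run: bool = True) -> List[str]:
    """Return the commands as strings; only execute them when dry_run is False."""
    if not dry_run:
        for cmd in BLANKING_OFF_COMMANDS:
            if shutil.which(cmd[0]) is None:
                raise RuntimeError(f"{cmd[0]} not found on PATH")
            subprocess.run(cmd, check=True)
    return [" ".join(cmd) for cmd in BLANKING_OFF_COMMANDS]
```

Run `disable_blanking(dry_run=False)` from inside the logged-in session. `xset s on`, `xset +dpms` and `gsettings reset org.gnome.desktop.session idle-delay` undo it. The trade-off is that the panel never blanks, which is worth weighing on an OLED. The TV’s own auto power-off settings could still turn it off; it is untested whether waking it that way avoids the crash.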

Bump: any ideas or workarounds for this? It’s incredibly aggravating! On Windows everything works correctly, so it seems like a bug in the NVIDIA Linux driver.

Thanks for reporting this, and sorry for the slow reply. I filed internal bug 3198439 to look into this.

It would probably be helpful to get a bug report log. What was going wrong with nvidia-bug-report.sh? If it’s hanging, you can try running it with the --safe-mode parameter to reduce the amount of data it tries to log.

@aplattner, thanks for the reply! I was losing hope. :)

nvidia-bug-report.sh unfortunately doesn’t work on NixOS, which I’m currently using. I think it makes some assumptions about paths that don’t exist on Nix, so I’m going to have to dig into it and see if I can patch it to work.

Since I first posted this, I made some more discoveries about the issue.

  1. I am actually using X11, NOT Wayland, as GNOME disables Wayland for NVIDIA by default.
  2. The crash occurs when I wake up my computer from DPMS off to DPMS on while the TV is OFF. (Note: I am not suspending the computer; it simply goes to DPMS off after a set idle period.)
  3. As a workaround, if I make sure to turn on the TV first, then wake up the computer, it all works. It only crashes if the TV is off.

I have also seen the GpuWatchdog segfault:

Nov 30 17:35:29 thor kernel: Code: 89 de e8 cd 2f 63 ff 80 7d cf 00 79 09 48 8b 7d b8 e8 0e 1f 5f fe 41 8b 84 24 e0 00 00 00 89 45 b8 48 8d 7d b8 e8 9a 2e cf fb <c7> 04 25 00 00 00 00 37 13>
Nov 30 17:35:29 thor kernel: GpuWatchdog[19369]: segfault at 0 ip 00007f6d36655fa6 sp 00007f6d2e159030 error 6 in libcef.so[7f6d32348000+75cd000]

A good thing to note here is that this happens with any HDMI 2.1 display on 30-series cards. I have a 3090 with the same issue, but mine won’t POST from a cold boot and I have to manually restart it to get it to POST. It also blue-screens and crashes the PC mid-game, but only when I’m using an HDMI 2.1 display. I can use any other display and it games just fine with no other issues. All drivers are up to date. I have tested intensive ray-tracing benchmarks simulating 8K, and they run just fine over DisplayPort. Over HDMI 2.1 it crashes no matter what else I try. A lot of other people are complaining about this, and there are a bunch of threads online dedicated to this issue. So far I haven’t found anyone from NVIDIA stepping up to help solve it. It’s not just LG CX TVs but Samsung as well, and any other display with HDMI 2.1 (those OLED displays don’t have DisplayPort inputs).

I’m pretty sure this topic is the same issue as this one I just created, but mine has nvidia-bug-reports, logs, etc., and makes it clear that the issue is only about turning HDMI 2.1 LG OLED TVs on, not about turning them off and then on again.

Same thing happening here with a 2080 Ti and an LG C1 TV at a normal 4K 60Hz resolution.

Forgot to mention: my issue (probably the same one), described with full bug reports and stack traces here, is happening on my 3070.

For me it happens on a 3080 Ti with an LG C1. I’ve added more info to whilom’s thread in Turning on LG OLED55BX crashes nvidia 460.67 driver - #3 by Martin.Jansa