Black screen when resuming systemctl-suspend, using nvidia-driver-470.57.02 with kernel 5.8.0-63-generic on GTX 970, xubuntu 20.04 LTS

I was having the same or a similar issue. I followed these instructions to install using the .run file, and now my system is behaving normally.

Hi there, I’m currently running Kubuntu 18.04 LTS

I’m able to reproduce the problem.
EDIT: A fresh attempt to reproduce it failed. I'm not sure whether the information below can actually trigger the problem or whether something changed in the meantime.

EDIT2/3: I was able to reproduce it again, but it requires the PC to be suspended for a longer period of time (I think it was about 1 hour).

Hardware: GeForce GTX 970
Software: Ubuntu 18.04.6 LTS, Kernel 5.10.0-rc6, driver version: 470.63.01

Worked fine before upgrade from 455.

I can provide a stack trace on request. If necessary, I can also perform additional steps to obtain further information (for example, using kgdb).

I would prefer not to share the logs as public downloads, since they might include login tokens for running programs.

I will try to upgrade the Linux kernel and see if the problem persists.

EDIT 4: I noticed an interesting thing. Version 465.27 is still present in the kernel log, so maybe this is just a kernel/driver mismatch.
nvidia-crash.txt (16.9 KB)
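
A quick way to check for a mismatch like the one suspected in EDIT 4 is to compare the version reported by the loaded kernel module with the version of the module installed on disk. The sketch below assumes the driver exposes /proc/driver/nvidia/version and that modinfo can find a module named nvidia for the running kernel. extract_nvrm_version is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Hypothetical helper: read driver version text on stdin and print the first
# dotted version number found on the "NVRM version" line (e.g. 470.63.01).
extract_nvrm_version() {
    grep 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Only runs on a machine where the NVIDIA kernel module is loaded.
if [ -r /proc/driver/nvidia/version ]; then
    loaded=$(extract_nvrm_version < /proc/driver/nvidia/version)
    ondisk=$(modinfo -F version nvidia 2>/dev/null)
    echo "loaded module:  ${loaded:-unknown}"
    echo "module on disk: ${ondisk:-unknown}"
    [ "$loaded" = "$ondisk" ] || echo "versions differ: a reboot or reinstall may be needed"
fi
```

If the two versions differ, the old module is probably still loaded from before the upgrade, which could explain an old version number showing up in the kernel log.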

Yep, same issue here: after suspending from a TTY session (no DE), the system resumes without the screen turning on. The NVIDIA drivers seem to work fine: CUDA works, and nvidia-smi shows correct info and stats. Monitors can be queried with ddcutil interrogate:

Device Identifier Cross Reference Report

   /dev/i2c busno:     1
      EDID: ...2020008D  Mfg: NEC  Model: 90GX2          SN: 61111506GB
                         Product number: 26258, binary SN: 16843009
      XrandR output:      (null)
      DRM connector:      (null)
      UDEV name:          NVIDIA i2c adapter 0 at 1:00.0
      UDEV syspath:       /devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-1/i2c-dev/i2c-1
      UDEV busno:         1
      sysfs drm path:     (null)
      sysfs drm I2C:      (null)
      sysfs drm busno:    Unknown
      ambiguous EDID:     false

   /dev/i2c busno:     5
      EDID: ...202000E8  Mfg: NEC  Model: LCD1990SXi     SN: 6Z112805YB
                         Product number: 26284, binary SN: 16843009
      XrandR output:      (null)
      DRM connector:      (null)
      UDEV name:          NVIDIA i2c adapter 8 at 1:00.0
      UDEV syspath:       /devices/pci0000:00/0000:00:01.0/0000:01:00.0/i2c-5/i2c-dev/i2c-5
      UDEV busno:         5
      sysfs drm path:     (null)
      sysfs drm I2C:      (null)
      sysfs drm busno:    Unknown
      ambiguous EDID:     false

but nothing is pushed to the displays; the LED stays red.

inxi -Fx
System:    Host: grafZero Kernel: 5.14.9-arch2-1 x86_64 bits: 64 compiler: gcc v: 11.1.0 Console: tty pts/6 Distro: Arch Linux
Machine:   Type: Desktop System: ASUS product: All Series v: N/A serial: <superuser required>
           Mobo: ASUSTeK model: Z97-PRO GAMER v: Rev X.0x serial: <superuser required> BIOS: American Megatrends v: 0402
           date: 11/04/2014
CPU:       Info: Quad Core model: Intel Core i7-4790K bits: 64 type: MT MCP arch: Haswell rev: 3 cache: L2: 8 MiB
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 63987
           Speed: 4198 MHz min/max: 800/4400 MHz Core speeds (MHz): 1: 4198 2: 4198 3: 4198 4: 4198 5: 4198 6: 4198 7: 4198
           8: 4198
Graphics:  Device-1: NVIDIA GM204 [GeForce GTX 970] vendor: Gigabyte driver: nvidia v: 470.74 bus-ID: 01:00.0
           Display: server: X.org 1.20.13 driver: loaded: nvidia tty: 171x20
           Message: Advanced graphics data unavailable in console. Try -G --display

Same problem for Arch Linux.

$ pacman -Q | grep nvidia
lib32-nvidia-utils 470.74-1
nvidia 470.74-10
nvidia-settings 470.74-1
nvidia-utils 470.74-1

$ inxi -Fx
System: Host: arch Kernel: 5.14.14-arch1-1 x86_64 bits: 64 compiler: gcc v: 11.1.0 Desktop: KDE Plasma 5.23.1
Distro: Arch Linux
Machine: Type: Desktop System: Micro-Star product: MS-7C37 v: 1.0 serial:
Mobo: Micro-Star model: MPG X570 GAMING EDGE WIFI (MS-7C37) v: 1.0 serial:
UEFI: American Megatrends LLC. v: 1.D2 date: 12/30/2020
CPU: Info: 8-Core model: AMD Ryzen 7 3700X bits: 64 type: MT MCP arch: Zen 2 rev: 0 cache: L2: 4 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 115244
Speed: 3596 MHz min/max: 2200/3600 MHz boost: enabled Core speeds (MHz): 1: 3596 2: 2056 3: 2057 4: 2059 5: 2198
6: 2199 7: 2198 8: 1997 9: 2566 10: 2133 11: 2176 12: 2199 13: 2198 14: 2199 15: 3599 16: 2057
Graphics: Device-1: NVIDIA TU104 [GeForce RTX 2080 SUPER] vendor: Micro-Star MSI driver: nvidia v: 470.74 bus-ID: 2d:00.0
Display: x11 server: X.org 1.20.13 driver: loaded: nvidia resolution: <missing: xdpyinfo>
OpenGL: renderer: NVIDIA GeForce RTX 2080 SUPER/PCIe/SSE2 v: 4.6.0 NVIDIA 470.74 direct render: Yes

In fact, if I remember correctly, it existed on my PC from the very beginning (several years ago when I built my computer and installed Arch). I don’t remember when I last used suspend or hibernation.

UPD: I tried to install 460 drivers (https://archive.archlinux.org/packages/n/nvidia/nvidia-460.67-9 , etc), but the login manager didn’t start at all. Had to go back to 470.

Same thing here with Pop!_OS 20.04 (which is based on Ubuntu 20.04) and GTX-970. It’s clear from the near identical stack traces that lots of users face the same problem and the extent of the issue is being disguised by the number of different threads opened about it.
For the people at NVIDIA who can’t repro the issue, you simply didn’t wait long enough in the suspend state during testing. Many times, I was able to convince myself that my latest attempt at a fix was working, only to be disappointed later on.
Also, please note that the recommended power management config is being applied by default in many cases. For example, the systemd units are enabled by default, and NVreg_PreserveVideoMemoryAllocations=1 is set in /usr/lib/modprobe.d/nvidia-graphics-drivers.conf, so reiterating this stuff in other config files is a waste of time.
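
To see what is already applied on a given machine before adding more config, something like the following can help. It is only a sketch: it assumes the option lives under /etc/modprobe.d or /usr/lib/modprobe.d, that the loaded driver exposes /proc/driver/nvidia/params, and that the units are named nvidia-suspend.service and nvidia-resume.service. list_pvma_settings is a hypothetical helper name.

```shell
# Hypothetical helper: print every config line under the given directories
# that mentions NVreg_PreserveVideoMemoryAllocations (prints nothing if none).
list_pvma_settings() {
    grep -rs 'NVreg_PreserveVideoMemoryAllocations' "$@" || true
}

list_pvma_settings /etc/modprobe.d /usr/lib/modprobe.d

# Value the loaded driver is actually using (only present while the module is loaded).
if [ -r /proc/driver/nvidia/params ]; then
    grep 'PreserveVideoMemoryAllocations' /proc/driver/nvidia/params || true
fi

# Whether the sleep-related units are enabled.
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-enabled nvidia-suspend.service nvidia-resume.service || true
fi
```

If the option is already set to 1 and both units report enabled, adding the same settings in another config file should not change anything.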

Same issue here on Arch Linux after having to downgrade from 495 to 470 because my GPU is now considered legacy.

Fortunately the workaround posted by humblebee in that other thread seems to fix the problem for me.

TLDR: disable the nvidia-suspend and nvidia-resume systemd services.
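
For anyone who wants to try this, here is a minimal sketch of the workaround. The helper name set_nvidia_sleep_units and the SYSTEMCTL variable are my own (hypothetical), and the unit names are assumed to be nvidia-suspend.service and nvidia-resume.service. Setting SYSTEMCTL to "echo systemctl" gives a dry run that only prints the commands.

```shell
# Command used to talk to systemd; override with "echo systemctl" for a dry run.
SYSTEMCTL=${SYSTEMCTL:-sudo systemctl}

# Hypothetical helper: apply one systemctl action ("disable" or "enable")
# to both units named in the workaround.
set_nvidia_sleep_units() {
    action=$1
    for unit in nvidia-suspend.service nvidia-resume.service; do
        # $SYSTEMCTL is intentionally unquoted so "sudo systemctl" splits into two words.
        $SYSTEMCTL "$action" "$unit"
    done
}
```

Running set_nvidia_sleep_units disable applies the workaround, and set_nvidia_sleep_units enable reverts it if resume gets worse.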


So far I have not been able to reproduce the issue on the configuration below. I kept the system suspended overnight, and the display came up successfully after resume.

ASUSTeK COMPUTER INC P9X79 + Intel(R) Core™ i7-3820 CPU @ 3.60GHz + Ubuntu 20.04.1 LTS + 5.11.0-27-generic + Driver 470.57.02 + NVIDIA GeForce GTX 980 + LG Electronics LG ULTRAGEAR + BenQ EL2870U

I will spend a few more cycles trying to reproduce it.

Can the reason for your successful testing be related to nvidia-persistenced? I’ve just discovered that my fresh install of 20.04 with driver 470 contains a systemd unit file for nvidia-persistenced with no installation information (meaning that it cannot be enabled) and a command line that explicitly disables persistence mode via --no-persistence-mode. I wonder how many people are under the impression that they successfully enabled persistence mode (following advice online) when in fact it was disabled at the next reboot.
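
To check whether persistence mode is really on, rather than assuming it from the unit being present, a rough sketch like this may help. It assumes the unit is called nvidia-persistenced.service and that nvidia-smi is on the PATH. unit_disables_persistence is a hypothetical helper name.

```shell
# Hypothetical helper: read unit-file text on stdin and succeed if an
# ExecStart line passes --no-persistence-mode.
unit_disables_persistence() {
    grep -E '^ExecStart=' | grep -q -- '--no-persistence-mode'
}

# Inspect the unit file that systemd would actually use.
if command -v systemctl >/dev/null 2>&1 &&
   systemctl cat nvidia-persistenced.service 2>/dev/null | unit_disables_persistence; then
    echo "unit file starts nvidia-persistenced with --no-persistence-mode"
fi

# Ask the driver for the live setting (Enabled or Disabled, one line per GPU).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=persistence_mode --format=csv,noheader || true
fi
```

Checking again after a reboot shows whether the setting survived or was reset by the unit's command line.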

Unfortunately, even with nvidia-persistenced correctly configured (i.e. running, started on boot, and persistence mode enabled), I still cannot resume from suspend. Each time there is the familiar stack trace mentioning nv_procfs_write_suspend.

Dec  1 07:35:39 imhotep kernel: [478040.532558] Call Trace:
Dec  1 07:35:39 imhotep kernel: [478040.532561]  nv_set_system_power_state+0x224/0x3c0 [nvidia]
Dec  1 07:35:39 imhotep kernel: [478040.532700]  nv_procfs_write_suspend+0xe7/0x140 [nvidia]
Dec  1 07:35:39 imhotep kernel: [478040.532851]  proc_reg_write+0x66/0x90
Dec  1 07:35:39 imhotep kernel: [478040.532854]  vfs_write+0xb9/0x250
Dec  1 07:35:39 imhotep kernel: [478040.532857]  ksys_write+0x67/0xe0
Dec  1 07:35:39 imhotep kernel: [478040.532859]  __x64_sys_write+0x1a/0x20
Dec  1 07:35:39 imhotep kernel: [478040.532861]  do_syscall_64+0x61/0xb0
Dec  1 07:35:39 imhotep kernel: [478040.532865]  ? exit_to_user_mode_prepare+0x3d/0x1c0
Dec  1 07:35:39 imhotep kernel: [478040.532869]  ? syscall_exit_to_user_mode+0x27/0x50
Dec  1 07:35:39 imhotep kernel: [478040.532870]  ? __x64_sys_newfstat+0x16/0x20
Dec  1 07:35:39 imhotep kernel: [478040.532872]  ? do_syscall_64+0x6e/0xb0
Dec  1 07:35:39 imhotep kernel: [478040.532874]  ? exc_page_fault+0x8f/0x170
Dec  1 07:35:39 imhotep kernel: [478040.532876]  ? asm_exc_page_fault+0x8/0x30
Dec  1 07:35:39 imhotep kernel: [478040.532878]  entry_SYSCALL_64_after_hwframe+0x44/0xae
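
For others who want to check whether they are hitting the same trace, a small filter like the one below can pull it out of a kernel log. This is a sketch only: show_suspend_traces is a hypothetical helper name, and the context sizes (2 lines before, 12 after) are simply chosen to match the length of the trace above.

```shell
# Hypothetical helper: read kernel log text on stdin and print each occurrence
# of nv_procfs_write_suspend with 2 lines of leading and 12 lines of trailing
# context. Prints nothing (and still succeeds) when there is no match.
show_suspend_traces() {
    grep -B 2 -A 12 'nv_procfs_write_suspend' || true
}
```

Typical use would be journalctl -k -b --no-pager | show_suspend_traces for the current boot, or show_suspend_traces < /var/log/syslog on systems that keep a syslog file.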

Is it possible that the issue affects GTX 970 but not GTX 980 for some reason?

It appears to be fixed after apt-get install --purge nvidia-driver-495, which installed version 495.44. An earlier version of 495 didn’t seem to fix it.

@womagrid
Do you mean to say that driver 495.44 fixed the issue on your setup?
If not, can you please attach an NVIDIA bug report once again from the repro state?

Yes, that’s what I meant. It seems a little difficult to believe because the previous 495 didn’t fix it and also I didn’t see anything relevant in the changelog, but still it appears to be true.

One thing I noticed is that the systemd services (nvidia-suspend.service etc) are now disabled, but they are still included in the distribution. It occurs to me that this might confuse users into believing that they are still required and should be enabled.

Hi All,
Since womagrid is no longer seeing the issue with driver 495.44, can others please verify this and confirm their test results?

If it helps, I can confirm this issue on ASUSTeK COMPUTER INC Sabertooth X79 + Intel(R) Core™ i7-3930K CPU @ 3.20GHz + KDE neon 5.23.4 (based on latest Ubuntu LTS) + 5.11.0-41-generic kernel + Driver 470.86 + NVIDIA GeForce GTX 980 + Asus ProArt PA329 + DisplayPort

Probably related to xorg/xserver GitLab issue #1271, "systemd_logind_vtenter is not called by xserver 21.1.2".

@womagrid
Can you also please confirm that you do not see the issue even when the systemd services are enabled?

Also not resuming on Ubuntu 21.10

$ sudo lspci -v | less
NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 36%   48C    P0    45W / 148W |    682MiB /  4040MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
$ sudo systemctl status nvidia-suspend
○ nvidia-suspend.service - NVIDIA system suspend actions
     Loaded: loaded (/lib/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
     Active: inactive (dead)
$ sudo systemctl status nvidia-resume
○ nvidia-resume.service - NVIDIA system resume actions
     Loaded: loaded (/lib/systemd/system/nvidia-resume.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

Here is the syslog:
syslog-nvidia-black-screen-suspend.txt (72.4 KB)

I didn’t really want to make the change in case it broke my setup again, but it seems that your 495.46 driver update has re-enabled the systemd units anyway. Thankfully, resume is still working.

Thanks womagrid for the positive feedback.

@user105657
Can you please check with driver 495.46 and confirm whether it fixes the issue on your setup?
If the issue persists, please share an NVIDIA bug report from the repro state and also confirm how you are doing suspend/resume on your system.