[Regression 460 series] Black screen on boot: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

I’ve just added more memory to my system; sadly, it didn’t help. Even before that, I had more than double my GPU memory free in system RAM at the time of suspend.
I’m at a loss as to the cause of this issue.

This is probably obvious, but the issue disappears when Prime Intel (Power Saving Mode) is selected (normally I run the ‘Nvidia on-demand mode’). In the syslog below you can find several reboots:

  1. Prime Intel boot (line 2): here suspend works perfectly fine. It apparently still calls nvidia-suspend.service (line 3990) and nvidia-resume.service (line 4154), even though the NVIDIA driver isn’t loaded. I then created the first bug report, after which I switched to Prime Nvidia (Performance mode); see line 9216.
  2. Prime Nvidia boot (line 10087) shows the same behaviour as reported before: suspend doesn’t work properly. As expected, the cause is not the ‘on-demand’ setting itself but simply the NVIDIA driver being loaded.

Bug report, Prime Intel: nvidia-bug-report-prime-Intel.log.gz (169.0 KB)
Bug report, Prime Nvidia: nvidia-bug-report-prime-nvidia.log.gz (400.5 KB)
Journal: journal.txt (2.1 MB)

I’ve now also reproduced this with 470.42.01; a new bug report is attached below.

In terms of behaviour, there was one small change: my screen now turns black immediately when the module loads, without the backlight flickering I saw with the 460/465 versions.

There’s now also more in the kernel log (of course also contained in the bug report):

Jul 14 19:24:00 localhost kernel: [   33.819880] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
Jul 14 19:24:00 localhost kernel: [   33.820328] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
...
Jul 14 19:24:04 localhost kernel: [   38.126480] BUG: kernel NULL pointer dereference, address: 0000000000000070
Jul 14 19:24:04 localhost kernel: [   38.126483] #PF: supervisor read access in kernel mode
Jul 14 19:24:04 localhost kernel: [   38.126484] #PF: error_code(0x0000) - not-present page
Jul 14 19:24:04 localhost kernel: [   38.126485] PGD 0 P4D 0 
Jul 14 19:24:04 localhost kernel: [   38.126488] Oops: 0000 [#1] SMP PTI
Jul 14 19:24:04 localhost kernel: [   38.126490] CPU: 3 PID: 11479 Comm: X Tainted: P           O      5.9.11-gentoo #1
Jul 14 19:24:04 localhost kernel: [   38.126491] Hardware name: Alienware Alienware 17/04WT2G, BIOS A17 07/22/2019
Jul 14 19:24:04 localhost kernel: [   38.126503] RIP: 0010:_nv002520kms+0x18/0x70 [nvidia_modeset]
Jul 14 19:24:04 localhost kernel: [   38.126504] Code: 24 1f 01 eb b2 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 54 55 49 89 fc 53 89 d5 41 b8 04 00 00 00 ba 02 01 02 00 48 83 ec 10 <8b> 46 70 8b 3d df 69 0c 00 48 8d 4c 24 0c 89 ee 89 44 24 0c e8 cf
Jul 14 19:24:04 localhost kernel: [   38.126506] RSP: 0018:ffffb032015fbd08 EFLAGS: 00010282
Jul 14 19:24:04 localhost kernel: [   38.126507] RAX: 0000000000000000 RBX: ffff8cf9ee73a008 RCX: 0000000000000082
Jul 14 19:24:04 localhost kernel: [   38.126508] RDX: 0000000000020102 RSI: 0000000000000000 RDI: ffff8cf9ee73a008
Jul 14 19:24:04 localhost kernel: [   38.126509] RBP: 0000000000010009 R08: 0000000000000004 R09: 0000000000000000
Jul 14 19:24:04 localhost kernel: [   38.126510] R10: ffffb032015fbc78 R11: ffff8cf9b378b000 R12: ffff8cf9ee73a008
Jul 14 19:24:04 localhost kernel: [   38.126511] R13: ffff8cf9ee73a0a0 R14: 0000000000000fff R15: 0000000000010008
Jul 14 19:24:04 localhost kernel: [   38.126512] FS:  00007f36e09f58c0(0000) GS:ffff8cf9fecc0000(0000) knlGS:0000000000000000
Jul 14 19:24:04 localhost kernel: [   38.126513] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 14 19:24:04 localhost kernel: [   38.126514] CR2: 0000000000000070 CR3: 000000081063c001 CR4: 00000000001706e0
Jul 14 19:24:04 localhost kernel: [   38.126515] Call Trace:
Jul 14 19:24:04 localhost kernel: [   38.126524]  ? _nv002519kms+0xb1/0x150 [nvidia_modeset]
Jul 14 19:24:04 localhost kernel: [   38.126532]  ? _nv002298kms+0x489/0x670 [nvidia_modeset]
Jul 14 19:24:04 localhost kernel: [   38.126534]  ? __kmalloc+0x165/0x18c
Jul 14 19:24:04 localhost kernel: [   38.126536]  ? __check_heap_object+0x52/0xff
Jul 14 19:24:04 localhost kernel: [   38.126538]  ? __check_object_size+0x103/0x192
Jul 14 19:24:04 localhost kernel: [   38.126543]  ? nv_kthread_q_stop+0x2246/0x2c76 [nvidia_modeset]
Jul 14 19:24:04 localhost kernel: [   38.126548]  ? nv_kthread_q_stop+0x227a/0x2c76 [nvidia_modeset]
Jul 14 19:24:04 localhost kernel: [   38.126553]  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
Jul 14 19:24:04 localhost kernel: [   38.126558]  ? nvkms_ioctl_common+0x41/0x10a [nvidia_modeset]
Jul 14 19:24:04 localhost kernel: [   38.126563]  ? nvkms_ioctl_common+0xdb/0x10a [nvidia_modeset]
Jul 14 19:24:04 localhost kernel: [   38.126649]  ? nvidia_frontend_unlocked_ioctl+0x14/0x17 [nvidia]
Jul 14 19:24:04 localhost kernel: [   38.126652]  ? vfs_ioctl+0x19/0x26
Jul 14 19:24:04 localhost kernel: [   38.126653]  ? __do_sys_ioctl+0x63/0x86
Jul 14 19:24:04 localhost kernel: [   38.126656]  ? do_syscall_64+0x5d/0x6a
Jul 14 19:24:04 localhost kernel: [   38.126659]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 14 19:24:04 localhost kernel: [   38.126660] Modules linked in: ccm cmac algif_hash algif_skcipher af_alg bnep zram uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc btusb btrtl btbcm btintel bluetooth intel_rapl_msr iwlmvm mac80211 iwlwifi intel_rapl_common intel_powerclamp coretemp vhba(O) kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi kvm dell_wmi cfg80211 snd_hda_intel dell_smbios snd_intel_dspcfg pcspkr snd_hda_codec dell_wmi_descriptor alx snd_hda_core mdio dell_smo8800 dell_rbtn nvidia_drm(PO) nvidia_modeset(PO) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel nvidia(PO) sdhci_pci aesni_intel glue_helper iosf_mbi crypto_simd cqhci iTCO_wdt intel_pmc_bxt sdhci rtsx_pci_sdmmc mmc_core
Jul 14 19:24:04 localhost kernel: [   38.126673] CR2: 0000000000000070
Jul 14 19:24:04 localhost kernel: [   38.126674] ---[ end trace c7c301411c6c99f7 ]---
Jul 14 19:24:04 localhost kernel: [   38.159246] RIP: 0010:_nv002520kms+0x18/0x70 [nvidia_modeset]
Jul 14 19:24:04 localhost kernel: [   38.159248] Code: 24 1f 01 eb b2 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 54 55 49 89 fc 53 89 d5 41 b8 04 00 00 00 ba 02 01 02 00 48 83 ec 10 <8b> 46 70 8b 3d df 69 0c 00 48 8d 4c 24 0c 89 ee 89 44 24 0c e8 cf
Jul 14 19:24:04 localhost kernel: [   38.159249] RSP: 0018:ffffb032015fbd08 EFLAGS: 00010282
Jul 14 19:24:04 localhost kernel: [   38.159251] RAX: 0000000000000000 RBX: ffff8cf9ee73a008 RCX: 0000000000000082
Jul 14 19:24:04 localhost kernel: [   38.159251] RDX: 0000000000020102 RSI: 0000000000000000 RDI: ffff8cf9ee73a008
Jul 14 19:24:04 localhost kernel: [   38.159252] RBP: 0000000000010009 R08: 0000000000000004 R09: 0000000000000000
Jul 14 19:24:04 localhost kernel: [   38.159253] R10: ffffb032015fbc78 R11: ffff8cf9b378b000 R12: ffff8cf9ee73a008
Jul 14 19:24:04 localhost kernel: [   38.159254] R13: ffff8cf9ee73a0a0 R14: 0000000000000fff R15: 0000000000010008
Jul 14 19:24:04 localhost kernel: [   38.159255] FS:  00007f36e09f58c0(0000) GS:ffff8cf9fecc0000(0000) knlGS:0000000000000000
Jul 14 19:24:04 localhost kernel: [   38.159256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 14 19:24:04 localhost kernel: [   38.159256] CR2: 0000000000000070 CR3: 000000081063c001 CR4: 00000000001706e0
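To tell whether a given boot hit this failure without reading the whole journal, the kernel log can be filtered for the two nvidia-modeset messages above. A minimal sketch, assuming the log text is piped in on stdin; the function name `count_pushbuffer_errors` is hypothetical, and the match strings are copied from the log lines in this post:

```bash
# Hypothetical helper: count the nvidia-modeset push-buffer errors in kernel
# log text read from stdin, e.g.:  journalctl -k -b -1 | count_pushbuffer_errors
count_pushbuffer_errors() {
  # -c prints the number of matching lines; -E enables the alternation.
  # grep -c already prints "0" when nothing matches but exits 1, so
  # "|| true" keeps that case from looking like a failure.
  grep -cE 'nvidia-modeset: ERROR: GPU:[0-9]+: (Display engine push buffer channel allocation failed|Failed to allocate display engine core DMA push buffer)' || true
}
```

For example, `journalctl -k -b -1 | count_pushbuffer_errors` checks the previous boot; a non-zero count means the display engine channel allocation failed during that boot.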

Any ideas are welcome. Currently there is no maintained, working driver for these cards unless I go back to the legacy drivers.

nvidia-bug-report.log.gz (1.2 MB)

I can reproduce this with the new driver as well, in my case 470.57.02.

  journalctl -b -1:
# Some kind of traceback related to the Nvidia device
jul 25 19:44:41 bram-Zbook kernel: WARNING: CPU: 2 PID: 15529 at /var/lib/dkms/nvidia/470.57.02/build/nvidia/nv.c:4175 nv_set_system_power_state+0x2c1/0x3c0 [nvidia]
jul 25 19:44:41 bram-Zbook kernel: Modules linked in: thunderbolt rfcomm ccm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 nvidia_uvm(O) nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_conexant snd_hda_codec_generic ledtrig_audio uvcvideo btusb btrtl videobuf2_vmalloc btbcm videobuf2_memops videobuf2_v4l2 btintel videobuf2_common bluetooth videodev mc ecdh_generic ecc mei_hdcp intel_rapl_msr x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl intel_cstate snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm iwlmvm snd_seq_midi snd_seq_midi_event snd_rawmidi mac80211 nvidia(PO) input_leds snd_seq hp_wmi snd_seq_device
jul 25 19:44:41 bram-Zbook kernel:  serio_raw i915 efi_pstore intel_wmi_thunderbolt sparse_keymap libarc4 wmi_bmof snd_timer iwlwifi mxm_wmi drm_kms_helper ee1004 cfg80211 snd cec soundcore rc_core processor_thermal_device joydev i2c_algo_bit intel_rapl_common fb_sys_fops syscopyarea intel_soc_dts_iosf mei_me sysfillrect sysimgblt mei intel_pch_thermal mac_hid int3403_thermal hp_accel int340x_thermal_zone acpi_pad int3400_thermal tpm_infineon lis3lv02d hp_wireless acpi_thermal_rel sch_fq_codel coretemp parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_alps hid_generic rtsx_pci_sdmmc nvme ahci crc32_pclmul psmouse i2c_i801 intel_lpss_pci e1000e libahci rtsx_pci i2c_smbus i2c_hid nvme_core intel_lpss idma64 xhci_pci virt_dma xhci_pci_renesas hid video pinctrl_sunrisepoint wmi pinctrl_intel
jul 25 19:44:41 bram-Zbook kernel: CPU: 2 PID: 15529 Comm: nvidia-sleep.sh Tainted: P        W  O      5.8.0-63-generic #71~20.04.1-Ubuntu
jul 25 19:44:41 bram-Zbook kernel: Hardware name: HP HP ZBook Studio G3/80D4, BIOS N82 Ver. 01.52 10/28/2020
jul 25 19:44:41 bram-Zbook kernel: RIP: 0010:nv_set_system_power_state+0x2c1/0x3c0 [nvidia]
jul 25 19:44:41 bram-Zbook kernel: Code: 00 4d 85 e4 0f 84 4a ff ff ff 41 83 fd 02 74 e9 49 8b bc 24 88 02 00 00 be 02 00 00 00 e8 57 d0 ff ff 85 c0 74 d3 0f 0b eb cf <0f> 0b e9 64 ff ff ff 48 c7 c7 50 ea a1 c2 e8 0c 10 a9 dd e8 47 1b
jul 25 19:44:41 bram-Zbook kernel: RSP: 0018:ffffacebc4b5fe20 EFLAGS: 00010206
jul 25 19:44:41 bram-Zbook kernel: RAX: 0000000000000003 RBX: 0000000000000002 RCX: 0000000080020001
jul 25 19:44:41 bram-Zbook kernel: RDX: 0000000080020002 RSI: 0000000000000001 RDI: ffff9e41e4992bc0
jul 25 19:44:41 bram-Zbook kernel: RBP: ffffacebc4b5fe50 R08: 0000000000000000 R09: ffffffffc0a6aa01
jul 25 19:44:41 bram-Zbook kernel: R10: ffff9e4164c9b000 R11: 0000000000000001 R12: ffff9e41e8ab4000
jul 25 19:44:41 bram-Zbook kernel: R13: 0000000000000000 R14: ffffacebc4b5fef0 R15: 00005594fb4b6540
jul 25 19:44:41 bram-Zbook kernel: FS:  00007f08062f3740(0000) GS:ffff9e41ef680000(0000) knlGS:0000000000000000
jul 25 19:44:41 bram-Zbook kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 25 19:44:41 bram-Zbook kernel: CR2: 00007fe04509f290 CR3: 00000004a53c0006 CR4: 00000000003606e0
jul 25 19:44:41 bram-Zbook kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
jul 25 19:44:41 bram-Zbook kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
jul 25 19:44:41 bram-Zbook kernel: Call Trace:
jul 25 19:44:41 bram-Zbook kernel:  nv_procfs_write_suspend+0xe7/0x140 [nvidia]
jul 25 19:44:41 bram-Zbook kernel:  proc_reg_write+0x66/0x90
jul 25 19:44:41 bram-Zbook kernel:  vfs_write+0xc9/0x200
jul 25 19:44:41 bram-Zbook kernel:  ksys_write+0x67/0xe0
jul 25 19:44:41 bram-Zbook kernel:  __x64_sys_write+0x1a/0x20
jul 25 19:44:41 bram-Zbook kernel:  do_syscall_64+0x49/0xc0
jul 25 19:44:41 bram-Zbook kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
jul 25 19:44:41 bram-Zbook kernel: RIP: 0033:0x7f08064071e7
jul 25 19:44:41 bram-Zbook kernel: Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
jul 25 19:44:41 bram-Zbook kernel: RSP: 002b:00007ffe20e8aa78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
jul 25 19:44:41 bram-Zbook kernel: RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f08064071e7
jul 25 19:44:41 bram-Zbook kernel: RDX: 0000000000000007 RSI: 00005594fb4b6540 RDI: 0000000000000001
jul 25 19:44:41 bram-Zbook kernel: RBP: 00005594fb4b6540 R08: 000000000000000a R09: 0000000000000006
jul 25 19:44:41 bram-Zbook kernel: R10: 00005594fb2c0017 R11: 0000000000000246 R12: 0000000000000007
jul 25 19:44:41 bram-Zbook kernel: R13: 00007f08064e26a0 R14: 00007f08064e34a0 R15: 00007f08064e28a0
jul 25 19:44:41 bram-Zbook kernel: ---[ end trace a0e25c3914b46a5a ]---

# End of traceback. The log continues with messages about a PCIe device that I don't expect to be the GPU, as it reports a 2.5 GT/s PCIe x4 link (NVIDIA X Server Settings reports an x16 link).
# Then it continues with messages about the NVIDIA devices.
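One way to see which device actually negotiated the 2.5 GT/s x4 link is to read the link attributes that Linux exposes in sysfs. A minimal sketch; the helper name `pcie_link` is hypothetical, and the device path must be adapted (the GPU's address can be found with `lspci -d 10de:`):

```bash
# Hypothetical helper: print the negotiated PCIe link of one device from the
# sysfs attributes current_link_speed and current_link_width.
# $1: the device's sysfs directory, e.g. /sys/bus/pci/devices/0000:01:00.0
pcie_link() {
  local dev="$1"
  local speed width
  speed=$(cat "$dev/current_link_speed") || return 1
  width=$(cat "$dev/current_link_width") || return 1
  printf '%s x%s\n' "$speed" "$width"
}
```

For example, `pcie_link /sys/bus/pci/devices/0000:01:00.0`. A GPU may also drop to a lower link speed while idle to save power, so a reduced speed on its own does not necessarily indicate a fault.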

jul 25 19:44:45 bram-Zbook kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
jul 25 19:44:45 bram-Zbook kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
jul 25 19:44:45 bram-Zbook rtkit-daemon[1269]: Supervising 7 threads of 3 processes of 2 users.
jul 25 19:44:45 bram-Zbook rtkit-daemon[1269]: Successfully made thread 16429 of process 2667 owned by '1000' RT at priority 5.
jul 25 19:44:45 bram-Zbook rtkit-daemon[1269]: Supervising 8 threads of 3 processes of 2 users.
jul 25 19:44:45 bram-Zbook rtkit-daemon[1269]: Supervising 8 threads of 3 processes of 2 users.
jul 25 19:44:45 bram-Zbook rtkit-daemon[1269]: Successfully made thread 16430 of process 2667 owned by '1000' RT at priority 5.
jul 25 19:44:45 bram-Zbook rtkit-daemon[1269]: Supervising 9 threads of 3 processes of 2 users.
jul 25 19:44:45 bram-Zbook kernel: usb 3-1.4.2: new full-speed USB device number 11 using xhci_hcd
jul 25 19:44:45 bram-Zbook kernel: usb 3-1.3.3: New USB device found, idVendor=0c45, idProduct=6341, bcdDevice= 0.00
jul 25 19:44:45 bram-Zbook kernel: usb 3-1.3.3: New USB device strings: Mfr=2, Product=1, SerialNumber=0
jul 25 19:44:45 bram-Zbook kernel: usb 3-1.3.3: Product: USB 2.0 Camera
jul 25 19:44:45 bram-Zbook kernel: usb 3-1.3.3: Manufacturer: Sonix Technology Co., Ltd.
jul 25 19:44:47 bram-Zbook NetworkManager[1042]: <info>  [1627235087.3616] device (eth0): interface index 7 renamed iface from 'eth0' to 'enp63s0'
jul 25 19:44:47 bram-Zbook systemd-udevd[16025]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
jul 25 19:44:49 bram-Zbook kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
jul 25 19:44:49 bram-Zbook kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
jul 25 19:44:49 bram-Zbook /usr/lib/gdm3/gdm-x-session[1261]: (II) config/udev: Adding input device HP HP Dock Audio (/dev/input/event14)
jul 25 19:44:49 bram-Zbook /usr/lib/gdm3/gdm-x-session[1261]: (**) HP HP Dock Audio: Applying InputClass "libinput keyboard catchall"
jul 25 19:44:49 bram-Zbook /usr/lib/gdm3/gdm-x-session[1261]: (II) Using input driver 'libinput' for 'HP HP Dock Audio'
jul 25 19:44:49 bram-Zbook systemd[1]: systemd-suspend.service: Succeeded.
jul 25 19:44:49 bram-Zbook systemd[1]: Finished Suspend.
jul 25 19:44:49 bram-Zbook systemd[1]: Stopped target Sleep.
jul 25 19:44:49 bram-Zbook systemd[1]: Reached target Suspend.
jul 25 19:44:49 bram-Zbook systemd[1]: Starting NVIDIA system resume actions...
jul 25 19:44:49 bram-Zbook systemd[1]: Stopped target Suspend.
jul 25 19:44:49 bram-Zbook suspend[16434]: nvidia-resume.service
jul 25 19:44:49 bram-Zbook logger[16434]: <13>Jul 25 19:44:49 suspend: nvidia-resume.service
jul 25 19:44:49 bram-Zbook systemd[1]: nvidia-resume.service: Succeeded.
jul 25 19:44:49 bram-Zbook systemd[1]: Finished NVIDIA system resume actions.

The uncropped log is attached (journalctl -b -1.log.gz), as is the NVIDIA bug report (nvidia-bug-report.log.gz).

So I distro-hopped from Elementary OS (an Ubuntu variant) to Fedora, and the issue disappeared for me. When I retried Elementary OS, the issue came back.

To finally fix this on my machine, I uninstalled the NVIDIA driver .deb packages and reinstalled the driver using the NVIDIA-*.run installer instead, and it worked. Now I’m running Elementary OS with the NVIDIA driver without suspend/resume crashing.

It feels like whoever packages the NVIDIA drivers for Ubuntu is doing something that doesn’t play nicely with my laptop.

This is an interesting route to pursue; I too run drivers installed via a PPA on an Ubuntu variant. I’m not entirely clear on where to submit a bug report. Originally I expected to report it to the maintainers of ppa:graphics-drivers/ppa, but only 3 bugs have ever been filed there. Besides, I also get the issue with the drivers packaged by the Ubuntu team itself, i.e. with nvidia-graphics-drivers-470 maintained by the Ubuntu Developers (see also 460 and 465). Those packages have far more bug reports, and the team seems to respond.

@chris.bainbridge already filed a bug report about it, but I don’t see any mention of it on the 465 and 470 pages. Should I file one?

We are tracking this issue internally as bug 3358939.
We are currently trying to reproduce the issue locally.
We will keep everyone updated.


Hi, I have the same issue on an HP ZBook G5. I updated to nvidia-driver-470 yesterday and got this error. Any help, please?

Same problem on my Asus TUF FX705G running Ubuntu 20.04 with the v470 NVIDIA drivers.

I tried switching to other driver versions in “Software and Updates” → “Additional drivers”, which didn’t help (I tried every version available there). I also tried installing the latest driver from nvidia.com, which didn’t help either.

The only solution that worked for me was to install the nvidia-driver-450-server driver, not from the “Software and Updates” GUI but from the terminal, like this:

sudo apt purge 'nvidia-*'
sudo apt install nvidia-driver-450-server
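Before reinstalling, it can help to confirm that the purge really removed everything. A minimal sketch that filters `dpkg -l` output for NVIDIA packages that are still installed (`ii`) or removed with leftover configuration files (`rc`); the function name `leftover_nvidia_pkgs` is hypothetical:

```bash
# Hypothetical helper: list NVIDIA packages dpkg still knows about.
# Reads `dpkg -l` output on stdin, e.g.:  dpkg -l | leftover_nvidia_pkgs
# Column 1 is the package state ("ii" = installed, "rc" = removed but
# config files remain); column 2 is the package name.
leftover_nvidia_pkgs() {
  awk '($1 == "ii" || $1 == "rc") && $2 ~ /nvidia/ { print $1, $2 }'
}
```

Empty output means no NVIDIA packages (including `libnvidia-*` libraries) are left before the new driver is installed.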

Any updates on this?

This happens to me too, with drivers 460 and 470 on Ubuntu 21.04, kernel 5.11.0-34-generic #36-Ubuntu SMP.

Switching the driver to 460 and 450 via “Software and Updates” didn’t solve it.

I needed to change it from the command line as @alex21975 did, but I also removed unused NVIDIA packages before reinstalling:

sudo apt purge 'nvidia-*'
sudo apt autoremove
sudo apt install nvidia-driver-450-server

reboot

Is there any update on this?

I have this issue on my Razer Blade 15 Advanced (NVIDIA GTX 1060 Max-Q) running Ubuntu 20.04 LTS with driver 470.57.02.

Can we get an update?

I’m also affected, on an Acer Aspire 7 with an NVIDIA GeForce GTX 1050, but running Ubuntu 21.10.
I was able to solve it the way @alex21975 and @hdaniel mentioned, but with nvidia-driver-460-server:

sudo apt purge 'nvidia-*'
sudo apt autoremove
sudo apt install nvidia-driver-460-server

reboot

Now the computer no longer gets stuck after suspend,
but if I run
sudo service nvidia-suspend status
it shows
“Unit nvidia-suspend.service could not be found.”

After applying the
NVIDIA Suspend fix
it still shows
"
nvidia-suspend.service - NVIDIA system suspend actions
Loaded: loaded (/etc/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
Active: inactive (dead)
"
and the logs show:

kernel: snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x7f0800. -5
kernel: snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)

I suspect this thread contains two different issues.
All users who, when running

sudo lspci -xxx -d 10de:*

see the audio device’s PCI config space as all 0xFF: please check whether this helps (if it hasn’t already been mentioned in this thread).
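The all-0xFF symptom can also be checked without reading the hex dump by hand. Reads from a device that cannot be reached, for example one stuck in D3cold as in the `can't change power state from D3cold to D0` message above, return all ones, which is why the dump shows only `ff`. A minimal sketch, assuming the sysfs config file of the NVIDIA audio function (often 01:00.1, but check `lspci -d 10de:` on your machine); the function name `config_all_ff` is hypothetical:

```bash
# Hypothetical helper: succeed (exit 0) if a PCI config-space file contains
# only 0xff bytes.
# $1: a config file such as /sys/bus/pci/devices/0000:01:00.1/config
# Without root, sysfs only exposes the first 64 bytes, which is enough here.
config_all_ff() {
  local f="$1"
  [ -s "$f" ] || return 1   # missing or empty file: cannot tell
  # od prints one hex byte per field (-v disables "*" run compression);
  # tr splits the fields onto separate lines. Any byte other than "ff"
  # means the device answered with real data.
  ! od -An -v -tx1 "$f" | tr -s ' ' '\n' | grep -v '^$' | grep -qv '^ff$'
}
```

For example, `config_all_ff /sys/bus/pci/devices/0000:01:00.1/config && echo "audio function unreachable"`.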


The method above for removing the NVIDIA audio device works for me.

I tried the 495 driver; unfortunately, the issue still persists.

I found a kind of workaround. When the screen wakes from sleep but stays black, press Ctrl+Alt+F2 to switch to a text console (it appears after a few seconds), then Ctrl+Alt+F1 or F7 (depending on the system) to switch back to the graphical session. The screen then works normally again (until the next time it goes to sleep).
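When the screen is black and another way in is available (for example an SSH session), the same VT round trip can be triggered with `chvt` from the kbd package, assuming a VT switch via `chvt` has the same effect as the key combination. A minimal sketch; `vt_bounce`, `DRY_RUN` and `VT_BOUNCE_DELAY` are hypothetical names, and `chvt` normally needs root:

```bash
# Hypothetical helper automating the Ctrl+Alt+F2 / Ctrl+Alt+F1 (or F7)
# workaround with chvt. Set DRY_RUN=1 to print the commands instead of
# running them; VT_BOUNCE_DELAY overrides the pause (default 3 seconds).
# $1: VT of the graphical session (1 or 7 depending on the system)
vt_bounce() {
  local gui_vt="${1:-1}"
  local run=()
  [ -n "${DRY_RUN:-}" ] && run=(echo)
  "${run[@]}" chvt 2 || return 1
  sleep "${VT_BOUNCE_DELAY:-3}"   # give the text console time to appear
  "${run[@]}" chvt "$gui_vt"
}
```

For example, in a root shell on a system whose graphical session runs on VT7, source the function and run `vt_bounce 7`.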

The 495 driver seems to work, although this might be because I tinkered a lot while trying to fix previous driver versions. Still, I think the update is worth trying. I include some of my settings below, as they might be useful if the 495 driver doesn’t work for you.

Confirming the installed driver:

user@device:~$ nvidia-smi
Tue Nov  9 09:58:24 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M1000M       Off  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P8    N/A /  N/A |    259MiB /  4043MiB |     22%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1320      G   /usr/lib/xorg/Xorg                158MiB |
|    0   N/A  N/A      2710      G   /usr/lib/xorg/Xorg                 97MiB |
+-----------------------------------------------------------------------------+

Some of my settings (I changed some of these while trying to change how memory is handled during hibernation):

user@device:~$ cat /proc/driver/nvidia/params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
RegisterForACPIEvents: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 1
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableGpuFirmware: 18
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: "/tmp-nvidia"
ExcludedGpus: ""
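`PreserveVideoMemoryAllocations` and `TemporaryFilePath` correspond to the `NVreg_PreserveVideoMemoryAllocations` and `NVreg_TemporaryFilePath` module options that NVIDIA’s driver README describes for saving video memory across suspend and hibernation. To check a single value without scanning the whole list, here is a minimal sketch that reads the `Name: value` lines of this file; the function name `nv_param` is hypothetical:

```bash
# Hypothetical helper: print the value of one field from the text of
# /proc/driver/nvidia/params (one "Name: value" pair per line) on stdin.
# Returns non-zero if the name is not present.
nv_param() {
  awk -v key="$1" -F': ' '$1 == key { print $2; found = 1 } END { exit !found }'
}
```

For example, `nv_param PreserveVideoMemoryAllocations < /proc/driver/nvidia/params` should print `1` on the system above.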

I’ve also looked at the hibernate, suspend and resume services, which seem to be inactive but loaded.

user@device:~$ sudo service nvidia-suspend status
● nvidia-suspend.service - NVIDIA system suspend actions
     Loaded: loaded (/etc/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

nov 09 09:52:10 device systemd[1]: Starting NVIDIA system suspend actions...
nov 09 09:52:10 device suspend[6680]: nvidia-suspend.service
nov 09 09:52:10 device logger[6680]: <13>Nov  9 09:52:10 suspend: nvidia-suspend.service
nov 09 09:52:11 device systemd[1]: nvidia-suspend.service: Succeeded.
nov 09 09:52:11 device systemd[1]: Finished NVIDIA system suspend actions.
nov 09 09:53:06 device systemd[1]: Starting NVIDIA system suspend actions...
nov 09 09:53:06 device suspend[7975]: nvidia-suspend.service
nov 09 09:53:06 device logger[7975]: <13>Nov  9 09:53:06 suspend: nvidia-suspend.service
nov 09 09:53:07 device systemd[1]: nvidia-suspend.service: Succeeded.
nov 09 09:53:07 device systemd[1]: Finished NVIDIA system suspend actions.

user@device:~$ sudo service nvidia-hibernate status
● nvidia-hibernate.service - NVIDIA system hibernate actions
     Loaded: loaded (/etc/systemd/system/nvidia-hibernate.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

user@device:~$ sudo service nvidia-resume status
● nvidia-resume.service - NVIDIA system resume actions
     Loaded: loaded (/etc/systemd/system/nvidia-resume.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

nov 09 09:52:39 device systemd[1]: Starting NVIDIA system resume actions...
nov 09 09:52:39 device suspend[7377]: nvidia-resume.service
nov 09 09:52:39 device logger[7377]: <13>Nov  9 09:52:39 suspend: nvidia-resume.service
nov 09 09:52:39 device systemd[1]: nvidia-resume.service: Succeeded.
nov 09 09:52:39 device systemd[1]: Finished NVIDIA system resume actions.
nov 09 09:54:07 device systemd[1]: Starting NVIDIA system resume actions...
nov 09 09:54:07 device suspend[8614]: nvidia-resume.service
nov 09 09:54:07 device logger[8614]: <13>Nov  9 09:54:07 suspend: nvidia-resume.service
nov 09 09:54:07 device systemd[1]: nvidia-resume.service: Succeeded.
nov 09 09:54:07 device systemd[1]: Finished NVIDIA system resume actions.
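`inactive (dead)` is expected here: these are oneshot units that only run around a suspend, hibernate or resume transition, as the log entries above show. What matters is whether they are enabled. A minimal sketch that checks all three; `check_nvidia_sleep_units` and the `SYSTEMCTL` override are hypothetical names, while `systemctl is-enabled` is the standard systemd command:

```bash
# Hypothetical helper: report whether NVIDIA's sleep units are enabled.
# Prints "unit: state" per unit and returns non-zero if any is not enabled.
# SYSTEMCTL can be overridden (e.g. for testing); it defaults to systemctl.
check_nvidia_sleep_units() {
  local unit state rc=0
  for unit in nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service; do
    state=$("${SYSTEMCTL:-systemctl}" is-enabled "$unit" 2>/dev/null)
    printf '%s: %s\n' "$unit" "${state:-unknown}"
    [ "$state" = "enabled" ] || rc=1
  done
  return "$rc"
}
```

A unit reported as `unknown` is most likely not installed at all, which matches the “could not be found” message reported earlier in this thread.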

@generix Weirdly enough, it not only fixed the reboot problem but also the audio device PCI problem: the second device previously showed all ffs. (I only have a single graphics card, but from the start the M1000M has also been recognized as a 940MX.)

user@device:~$  sudo lspci -xxx -d 10de:*
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
00: de 10 b1 13 07 04 10 00 a2 00 00 03 00 00 80 00
10: 00 00 00 e4 0c 00 00 a0 00 00 00 00 0c 00 00 b0
20: 00 00 00 00 01 30 00 00 00 00 00 00 3c 10 d4 80
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 01 00 00
40: 3c 10 d4 80 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 81 00 38 0a e0 fe
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 21 00 00 03 3d 46 00 43 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 00 00 00 00 0e 00 00 00 03 00 1f 00 00 00 00 00
b0: 00 00 00 00 09 00 14 01 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

01:00.1 Audio device: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] (rev a1)
00: de 10 bc 0f 06 00 10 00 a1 00 03 04 00 00 80 00
10: 00 00 00 e5 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 02 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 02 00 e1 8d 2c 01
80: 30 29 00 00 03 3d 45 00 03 01 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 13 00 04 00
a0: 00 00 00 00 0e 00 00 00 00 00 01 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
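The first row of each dump encodes the vendor and device IDs. PCI config space is little-endian, so bytes 0 and 1 form the vendor ID and bytes 2 and 3 the device ID, low byte first: `de 10 b1 13` is `10de:13b1` (the Quadro M1000M) and `de 10 bc 0f` is `10de:0fbc` (its HDMI audio function). lspci takes device names from the pci.ids database, so the `[GeForce 940MX]` tag on 01:00.1 is likely just the label that database attaches to device 0fbc, not a second GPU. A minimal sketch of the decoding; the function name `decode_pci_ids` is hypothetical:

```bash
# Hypothetical helper: decode vendor:device from the "00:" row of an
# `lspci -xxx` dump. Bytes 0-1 are the vendor ID and bytes 2-3 the device
# ID, each stored low byte first.
# $1: the "00:" row as printed by lspci -xxx
decode_pci_ids() {
  local -a b
  read -r -a b <<< "${1#00: }"
  printf '%s%s:%s%s\n' "${b[1]}" "${b[0]}" "${b[3]}" "${b[2]}"
}
```

An all-0xFF row decodes to `ffff:ffff`, which is what the audio device’s earlier, broken state looked like.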

Here is my nvidia-bug-report.log.gz (456.0 KB).