NVIDIA 470.63.01 driver randomly hangs with no video output when resuming from suspend using the /proc interface on GeForce GTX 960

Hello, this is about a long-standing issue with the /proc interface that I have experienced since at least driver 460.32.03.

When suspending the system with nvidia-suspend.service enabled, the system fails to resume properly in about 40% of all cases. Instead, it wakes up leaving my monitors (2x DisplayPort) without a video signal. Reconnecting the monitors does not help in this situation, and trying to switch TTYs with Ctrl+Alt+F2 similarly does nothing.

But unlike some similar issues, the rest of the system seems to resume normally. The USB controllers are powered, the keyboard reacts when e.g. pressing capslock, and the NetworkManager and sshd services start as normal.

Investigating the issue over ssh shows that the nvidia-sleep.sh process is unresponsive at 100% CPU and cannot be killed by SIGTERM, SIGKILL or even SIGSEGV. Running nvidia-bug-report.sh in this state hangs even with the recommended flags, producing the attached file. Trying to call systemctl poweroff over ssh kills the ssh connection, but does not actually shut down the computer. I need to either use SysRq+REISUB or perform a hard shutdown to regain control.
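For anyone who wants to confirm the same state over ssh, here is a minimal Python sketch that reads the scheduler state of a process from /proc/<pid>/stat. The function names parse_stat and find_processes are made up for illustration. A task spinning inside a syscall would be expected to show state R, and one blocked in the kernel state D; in both cases signals are only acted on once the task returns to user space, which is consistent with SIGKILL having no effect here.

```python
from pathlib import Path


def parse_stat(stat_line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def find_processes(name: str, proc_root: str = "/proc"):
    """Yield (pid, comm, state) for every process whose comm equals name."""
    for stat_file in Path(proc_root).glob("[0-9]*/stat"):
        try:
            info = parse_stat(stat_file.read_text())
        except (OSError, ValueError):
            continue  # process exited in the meantime or is unreadable
        if info[1] == name:
            yield info
```

Calling list(find_processes("nvidia-sleep.sh")) on the affected machine should list the stuck process together with its state letter.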

Just to make it clear, this only happens when nvidia-suspend.service is enabled and the system is therefore suspended with nvidia-sleep.sh suspend. Whether or not it is resumed with nvidia-sleep.sh resume does not make a difference, and neither does the activation status of NVreg_PreserveVideoMemoryAllocations.

Here are some things I have tried that did not fix the issue for me:

  • Setting NVreg_EnableMSI=0.
  • Setting acpi_osi=Windows 2015.
  • Using legacy persistence mode.
  • Using nvidia-persistenced.service based persistence mode.
  • Having nvidia-sleep.sh be called later during the resume process.
  • Having chvt be called later during the resume process.
  • Removing the “NVIDIA Corporation GM206 High Definition Audio Controller” [10de:0fba] PCI device using udev.
  • Enabling/Disabling NVreg_PreserveVideoMemoryAllocations.
  • Updating my BIOS.
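
When toggling module options such as NVreg_EnableMSI or NVreg_PreserveVideoMemoryAllocations, it helps to verify that the loaded driver actually picked them up. The sketch below assumes the driver exposes its current settings in /proc/driver/nvidia/params as "Key: value" lines, with the keys written without the NVreg_ prefix; both the path and the format are assumptions to check on your own system, and the function names are hypothetical.

```python
def parse_params(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines into a dictionary, skipping other lines."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def read_params(path: str = "/proc/driver/nvidia/params") -> dict[str, str]:
    """Read and parse the driver parameter file (path is an assumption)."""
    with open(path) as f:
        return parse_params(f.read())
```

For example, read_params().get("EnableMSI") would be expected to return "0" after setting NVreg_EnableMSI=0 and reloading the module, if the file follows the assumed format.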

These messages are written to the journal during the failing resume process:

kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 2 PID: 23018 at /build/linux514-nvidia/src/NVIDIA-Linux-x86_64-470.63.01-no-compat32/kernel/nvidia/nv.c:3967 nv_restore_user_channels+0xc9/0xe0 [nvidia]
kernel: Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio bnep uvcvideo btusb btrtl btbcm videobuf2_vmalloc btintel videobuf2_memops videob>
kernel:  sysimgblt fb_sys_fops nvidia(POE) soundcore mei intel_pch_thermal wmi video mac_hid acpi_pad drm ledtrig_timer sg crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid sr_mod>
kernel: CPU: 2 PID: 23018 Comm: nvidia-sleep.sh Tainted: P           OE     5.14.2-1-MANJARO #1
kernel: Hardware name: MSI MS-7A12/Z170A GAMING PRO CARBON (MS-7A12), BIOS 1.90 01/25/2018
kernel: RIP: 0010:nv_restore_user_channels+0xc9/0xe0 [nvidia]
kernel: Code: 89 9c d6 be 01 00 00 00 4c 89 e7 e8 41 a1 00 00 4c 89 ff e8 19 89 9c d6 ba 02 00 00 00 4c 89 e6 48 89 ef e8 59 79 9c 00 eb 94 <0f> 0b eb c6 41 bd 51 00 00 00 eb 9f 66 66 2e 0f 1f 84 00 00 00 00
kernel: RSP: 0018:ffffacbc469abe20 EFLAGS: 00010206
kernel: RAX: 0000000000000003 RBX: 0000000000000002 RCX: ffffacbc469abdb8
kernel: RDX: 0000000000000087 RSI: 0000000000000246 RDI: ffff995f83321028
kernel: RBP: ffff9963d03db000 R08: 0000000000000000 R09: ffff9964e6dacf30
kernel: R10: 0000000000000000 R11: 0000000000000003 R12: ffff995f88474000
kernel: R13: 0000000000000003 R14: ffff995f88474520 R15: ffff995f88474000
kernel: FS:  00007f8c5a014b80(0000) GS:ffff9964e6d00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f34b95f9000 CR3: 00000005f73a0003 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  nv_set_system_power_state+0x222/0x3c0 [nvidia]
kernel:  nv_procfs_write_suspend+0x100/0x150 [nvidia]
kernel:  proc_reg_write+0x55/0xa0
kernel:  vfs_write+0xbc/0x270
kernel:  ksys_write+0x67/0xe0
kernel:  do_syscall_64+0x3b/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f8c5a175907
kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:00007ffead963408 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f8c5a175907
kernel: RDX: 0000000000000007 RSI: 000055b5dae39160 RDI: 0000000000000001
kernel: RBP: 000055b5dae39160 R08: 000000000000000a R09: 00007f8c5a246a60
kernel: R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000007
kernel: R13: 00007f8c5a247520 R14: 0000000000000007 R15: 00007f8c5a247700
kernel: ---[ end trace f449d36c8afbba7c ]---
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 2 PID: 23018 at /build/linux514-nvidia/src/NVIDIA-Linux-x86_64-470.63.01-no-compat32/kernel/nvidia/nv.c:4162 nv_set_system_power_state+0x2c0/0x3c0 [nvidia]
kernel: Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio bnep uvcvideo btusb btrtl btbcm videobuf2_vmalloc btintel videobuf2_memops videob>
kernel:  sysimgblt fb_sys_fops nvidia(POE) soundcore mei intel_pch_thermal wmi video mac_hid acpi_pad drm ledtrig_timer sg crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid sr_mod>
kernel: CPU: 2 PID: 23018 Comm: nvidia-sleep.sh Tainted: P        W  OE     5.14.2-1-MANJARO #1
kernel: Hardware name: MSI MS-7A12/Z170A GAMING PRO CARBON (MS-7A12), BIOS 1.90 01/25/2018
kernel: RIP: 0010:nv_set_system_power_state+0x2c0/0x3c0 [nvidia]
kernel: Code: ed 0f 84 4c ff ff ff 41 83 fc 02 74 ea 48 8b 85 88 02 00 00 be 02 00 00 00 48 8b 78 78 e8 b8 d1 ff ff 85 c0 74 d1 0f 0b eb cd <0f> 0b e9 63 ff ff ff 48 c7 c7 d0 fa 4e c2 e8 5d 58 9c d6 e8 78 1c
kernel: RSP: 0018:ffffacbc469abe50 EFLAGS: 00010206
kernel: RAX: 0000000000000003 RBX: 0000000000000002 RCX: ffff995f83321560
kernel: RDX: 0000000003e37e02 RSI: ffffffffc052e954 RDI: 000033575900a6d0
kernel: RBP: ffff995f88474000 R08: 0000000000000000 R09: ffff9964e6dacf30
kernel: R10: ffff9963d03db000 R11: 0000000000000003 R12: 0000000000000000
kernel: R13: 000055b5dae39160 R14: ffffacbc469abf08 R15: 0000000000000007
kernel: FS:  00007f8c5a014b80(0000) GS:ffff9964e6d00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f34b95f9000 CR3: 00000005f73a0003 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  nv_procfs_write_suspend+0x100/0x150 [nvidia]
kernel:  proc_reg_write+0x55/0xa0
kernel:  vfs_write+0xbc/0x270
kernel:  ksys_write+0x67/0xe0
kernel:  do_syscall_64+0x3b/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f8c5a175907
kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:00007ffead963408 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f8c5a175907
kernel: RDX: 0000000000000007 RSI: 000055b5dae39160 RDI: 0000000000000001
kernel: RBP: 000055b5dae39160 R08: 000000000000000a R09: 00007f8c5a246a60
kernel: R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000007
kernel: R13: 00007f8c5a247520 R14: 0000000000000007 R15: 00007f8c5a247700
kernel: ---[ end trace f449d36c8afbba7d ]---
kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:428
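
To compare traces like the ones above between machines, the WARNING sites can be pulled out of a saved journal dump. This is a rough Python sketch, not a complete kernel log parser: the regular expression only targets the "WARNING: CPU: ... PID: ... at file:line function+offset/size" shape seen in this log, and warning_sites is a made-up helper name.

```python
import re
from pathlib import PurePosixPath

# Matches lines such as:
#   WARNING: CPU: 2 PID: 23018 at /path/to/nv.c:3967 nv_restore_user_channels+0xc9/0xe0 [nvidia]
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: (?P<pid>\d+) at (?P<path>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def warning_sites(log_text: str) -> list[tuple[str, int, str]]:
    """Return (source file name, line number, function) for each WARNING."""
    sites = []
    for m in WARNING_RE.finditer(log_text):
        # Keep only the file name so different build paths compare equal.
        sites.append((PurePosixPath(m["path"]).name, int(m["line"]), m["func"]))
    return sites
```

Run on the log above, this should report nv.c line 3967 in nv_restore_user_channels and nv.c line 4162 in nv_set_system_power_state. Reducing the path to its file name keeps reports from different distributions comparable even though their build directories differ.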

These messages are repeatedly written later after the resume process, indicating the stuck nvidia-sleep.sh process is blocking the nvidia_modeset driver:

kernel: INFO: task nvidia-modeset/:355 blocked for more than 122 seconds.
kernel:       Tainted: P        W  OE     5.14.2-1-MANJARO #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:nvidia-modeset/ state:D stack:    0 pid:  355 ppid:     2 flags:0x00004000
kernel: Call Trace:
kernel:  __schedule+0x316/0x940
kernel:  schedule+0x59/0xc0
kernel:  rwsem_down_read_slowpath+0x384/0x3e0
kernel:  nvkms_kthread_q_callback+0x71/0x100 [nvidia_modeset]
kernel:  _main_loop+0x9e/0x150 [nvidia_modeset]
kernel:  ? nvkms_sema_up+0x10/0x10 [nvidia_modeset]
kernel:  kthread+0x132/0x160
kernel:  ? set_kthread_struct+0x40/0x40
kernel:  ret_from_fork+0x22/0x30

OS: Manjaro 21.1.3 Pahvo
CPU: Intel Core i5-6600K CPU @ 3.50GHz
GPU: GeForce GTX 960
Driver: NVIDIA 470.63.01 (linux514-nvidia-470.63.01-4-x86_64 from the Manjaro repositories)
Mainboard: MSI MS-7A12/Z170A GAMING PRO CARBON (MS-7A12)
Kernel: 5.14.2-1-MANJARO

nvidia-bug-report.log.gz (1.1 KB)

Hi insert-penguin, any luck resolving this? I’ve got a similar issue, also a GTX 960, the warning I get is:

kernel: [45886.201132] ------------[ cut here ]------------
kernel: [45886.201134] WARNING: CPU: 7 PID: 76616 at /var/lib/dkms/nvidia/470.63.01/build/nvidia/nv.c:3967 nv_restore_user_channels+0xce/0xe0 [nvidia]
kernel: [45886.201284] Modules linked in: rfcomm nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter ccm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables libcrc32c nfnetlink cmac bridge algif_hash stp llc algif_skcipher overlay af_alg bnep nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) intel_rapl_msr intel_rapl_common nvidia(POE) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel iwlmvm snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence nls_iso8859_1 snd_hda_codec edac_mce_amd mac80211 snd_hda_core snd_hwdep soundwire_bus kvm_amd snd_soc_core libarc4 kvm snd_compress ac97_bus snd_pcm_dmaengine btusb snd_seq_midi btrtl snd_seq_midi_event btbcm crct10dif_pclmul btintel ghash_clmulni_intel snd_pcm snd_rawmidi bluetooth aesni_intel snd_seq crypto_simd drm_kms_helper cryptd glue_helper
kernel: [45886.201312]  snd_seq_device ecdh_generic rapl joydev input_leds eeepc_wmi wmi_bmof mxm_wmi efi_pstore iwlwifi ccp ecc cec snd_timer k10temp snd rc_core cfg80211 fb_sys_fops syscopyarea soundcore sysfillrect sysimgblt mac_hid sch_fq_codel msr parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid mfd_aaeon asus_wmi sparse_keymap video igb nvme ahci i2c_algo_bit xhci_pci crc32_pclmul i2c_piix4 libahci nvme_core xhci_pci_renesas dca wmi
kernel: [45886.201333] CPU: 7 PID: 76616 Comm: nvidia-sleep.sh Tainted: P        W  OE     5.11.0-37-generic #41-Ubuntu
kernel: [45886.201335] Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 3604 05/08/2021
kernel: [45886.201336] RIP: 0010:nv_restore_user_channels+0xce/0xe0 [nvidia]
kernel: [45886.201446] Code: 05 a3 cf be 01 00 00 00 4c 89 ef e8 7c a5 00 00 48 89 df e8 64 04 a3 cf ba 02 00 00 00 4c 89 ee 4c 89 e7 e8 34 83 9c 00 eb 93 <0f> 0b eb c6 41 be 51 00 00 00 eb 9e 66 0f 1f 44 00 00 0f 1f 44 00
kernel: [45886.201447] RSP: 0018:ffffab994252bde8 EFLAGS: 00010206
kernel: [45886.201448] RAX: 0000000000000003 RBX: ffff9dde46c04000 RCX: ffffab994252bd80
kernel: [45886.201448] RDX: 0000000000000087 RSI: 0000000000000246 RDI: 0000000000000246
kernel: [45886.201449] RBP: ffffab994252be10 R08: 0000000000000000 R09: ffff9de54ea2c3f0
kernel: [45886.201449] R10: 0000000000000000 R11: 00000000000001ae R12: ffff9de2a05a8000
kernel: [45886.201451] R13: ffff9dde46c04000 R14: 0000000000000003 R15: ffff9dde46c04520
kernel: [45886.201451] FS:  00007f0b00989740(0000) GS:ffff9de54ebc0000(0000) knlGS:0000000000000000
kernel: [45886.201452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [45886.201453] CR2: 00001bb800485068 CR3: 000000011f396000 CR4: 0000000000750ee0
kernel: [45886.201454] PKRU: 55555554
kernel: [45886.201454] Call Trace:
kernel: [45886.201456]  nv_set_system_power_state+0x228/0x3d0 [nvidia]
kernel: [45886.201566]  nv_procfs_write_suspend+0xea/0x140 [nvidia]
kernel: [45886.201676]  proc_reg_write+0x5a/0x90
kernel: [45886.201680]  ? _cond_resched+0x1a/0x50
kernel: [45886.201682]  vfs_write+0xc6/0x270
kernel: [45886.201684]  ksys_write+0x67/0xe0
kernel: [45886.201686]  __x64_sys_write+0x1a/0x20
kernel: [45886.201687]  do_syscall_64+0x38/0x90
kernel: [45886.201688]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: [45886.201690] RIP: 0033:0x7f0b00a93c27
kernel: [45886.201691] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: [45886.201691] RSP: 002b:00007fff21085848 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: [45886.201693] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f0b00a93c27
kernel: [45886.201693] RDX: 0000000000000007 RSI: 000055b314034fe0 RDI: 0000000000000001
kernel: [45886.201694] RBP: 000055b314034fe0 R08: 000000000000000a R09: 000055b314034fe0
kernel: [45886.201694] R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000007
kernel: [45886.201695] R13: 00007f0b00b6d6c0 R14: 00007f0b00b6e4a0 R15: 00007f0b00b6d8a0
kernel: [45886.201696] ---[ end trace 58ffcae3d517b433 ]---
kernel: [45886.201702] ------------[ cut here ]------------

This also happens on Ubuntu 20.04.3 with a Kepler GPU.
Only occasionally does the machine resume without problems.

I’m seeing the same thing, on a desktop with a GTX980.
I think I can trigger it fairly reliably by having the primary monitor powered up but switched to a different input when I resume.

I get the following traces:

[   46.140989] WARNING: CPU: 0 PID: 3193 at /var/lib/dkms/nvidia/470.74/build/nvidia/nv.c:3967 nv_restore_user_channels+0xce/0xe0 [nvidia]
[   46.141182] Modules linked in: snd_seq_dummy snd_hrtimer xt_mark xt_comment xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink bridge stp llc nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) nls_iso8859_1 nvidia(POE) intel_rapl_msr mei_hdcp intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_realtek snd_hda_codec_generic crypto_simd cryptd ledtrig_audio rapl intel_cstate snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec wmi_bmof eeepc_wmi efi_pstore snd_hda_core input_leds snd_hwdep at24 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi drm_kms_helper snd_seq cec mei_me snd_seq_device rc_core snd_timer mei fb_sys_fops syscopyarea sysfillrect sysimgblt snd soundcore mac_hid sch_fq_codel cuse nct6775 hwmon_vid msr parport_pc ppdev lp parport drm sunrpc
[   46.141220]  ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic uas usbhid hid usb_storage raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c mfd_aaeon asus_wmi sparse_keymap ahci r8169 xhci_pci i2c_i801 crc32_pclmul lpc_ich xhci_pci_renesas libahci i2c_smbus realtek wmi video
[   46.141236] CPU: 0 PID: 3193 Comm: nvidia-sleep.sh Tainted: P           OE     5.13.0-20-lowlatency #20-Ubuntu
[   46.141238] Hardware name: ASUS All Series/H87M-PRO, BIOS 2102 10/28/2014
[   46.141239] RIP: 0010:nv_restore_user_channels+0xce/0xe0 [nvidia]
[   46.141368] Code: 41 85 e2 be 01 00 00 00 4c 89 ef e8 2c a4 00 00 48 89 df e8 94 40 85 e2 ba 02 00 00 00 4c 89 ee 4c 89 e7 e8 44 8b 9c 00 eb 93 <0f> 0b eb c6 41 be 51 00 00 00 eb 9e 66 0f 1f 44 00 00 0f 1f 44 00
[   46.141369] RSP: 0018:ffffbb76c33bbd30 EFLAGS: 00010206
[   46.141371] RAX: 0000000000000003 RBX: ffff9c9f85fa1000 RCX: ffffbb76c33bbcc8
[   46.141372] RDX: 0000000000000087 RSI: 0000000000000246 RDI: ffff9c9f85f0c648
[   46.141373] RBP: ffffbb76c33bbd58 R08: 0000000000000000 R09: ffff9ca28dd2d170
[   46.141374] R10: 0000000000000000 R11: 00000000000001f2 R12: ffff9c9f891a8000
[   46.141374] R13: ffff9c9f85fa1000 R14: 0000000000000003 R15: ffff9c9f85fa1528
[   46.141375] FS:  00007f6154437740(0000) GS:ffff9ca28dc00000(0000) knlGS:0000000000000000
[   46.141377] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   46.141378] CR2: 000055cf27e162e0 CR3: 0000000138f02001 CR4: 00000000001706f0
[   46.141379] Call Trace:
[   46.141381]  nv_set_system_power_state+0x227/0x3d0 [nvidia]
[   46.141511]  nv_procfs_write_suspend+0xe9/0x140 [nvidia]
[   46.141643]  proc_reg_write+0x5a/0x90
[   46.141645]  vfs_write+0xc3/0x280
[   46.141648]  ksys_write+0x67/0xe0
[   46.141650]  __x64_sys_write+0x19/0x20
[   46.141652]  do_syscall_64+0x61/0xb0
[   46.141655]  ? handle_mm_fault+0xdf/0x2c0
[   46.141658]  ? do_user_addr_fault+0x1ed/0x670
[   46.141661]  ? exit_to_user_mode_prepare+0x37/0xb0
[   46.141663]  ? irqentry_exit_to_user_mode+0x9/0x20
[   46.141665]  ? irqentry_exit+0x33/0x40
[   46.141667]  ? exc_page_fault+0x8f/0x190
[   46.141669]  ? asm_exc_page_fault+0x8/0x30
[   46.141671]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   46.141674] RIP: 0033:0x7f61545519b7
[   46.141675] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[   46.141676] RSP: 002b:00007ffe45f1efd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   46.141678] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f61545519b7
[   46.141679] RDX: 0000000000000007 RSI: 0000563fd4795d90 RDI: 0000000000000001
[   46.141679] RBP: 0000563fd4795d90 R08: 0000000000000000 R09: 0000000000000000
[   46.141680] R10: 00007f6154652cc0 R11: 0000000000000246 R12: 0000000000000007
[   46.141681] R13: 00007f6154653760 R14: 00007f6154654560 R15: 00007f6154653960
[   46.141683] ---[ end trace 336a6415ef6f0548 ]---

and then this one:

[   46.141693] ------------[ cut here ]------------
[   46.141693] WARNING: CPU: 0 PID: 3193 at /var/lib/dkms/nvidia/470.74/build/nvidia/nv.c:4162 nv_set_system_power_state+0x2c8/0x3d0 [nvidia]
[   46.141824] Modules linked in: snd_seq_dummy snd_hrtimer xt_mark xt_comment xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink bridge stp llc nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) nls_iso8859_1 nvidia(POE) intel_rapl_msr mei_hdcp intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_realtek snd_hda_codec_generic crypto_simd cryptd ledtrig_audio rapl intel_cstate snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec wmi_bmof eeepc_wmi efi_pstore snd_hda_core input_leds snd_hwdep at24 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi drm_kms_helper snd_seq cec mei_me snd_seq_device rc_core snd_timer mei fb_sys_fops syscopyarea sysfillrect sysimgblt snd soundcore mac_hid sch_fq_codel cuse nct6775 hwmon_vid msr parport_pc ppdev lp parport drm sunrpc
[   46.141853]  ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic uas usbhid hid usb_storage raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c mfd_aaeon asus_wmi sparse_keymap ahci r8169 xhci_pci i2c_i801 crc32_pclmul lpc_ich xhci_pci_renesas libahci i2c_smbus realtek wmi video
[   46.141865] CPU: 0 PID: 3193 Comm: nvidia-sleep.sh Tainted: P        W  OE     5.13.0-20-lowlatency #20-Ubuntu
[   46.141866] Hardware name: ASUS All Series/H87M-PRO, BIOS 2102 10/28/2014
[   46.141867] RIP: 0010:nv_set_system_power_state+0x2c8/0x3d0 [nvidia]
[   46.142035] Code: 0f 84 4a ff ff ff 41 83 fd 02 74 e9 49 8b 84 24 90 02 00 00 be 02 00 00 00 48 8b 78 78 e8 b0 cf ff ff 85 c0 74 cf 0f 0b eb cb <0f> 0b e9 60 ff ff ff 48 c7 c7 50 3b 87 c2 e8 c5 0d 85 e2 e8 60 22
[   46.142037] RSP: 0018:ffffbb76c33bbd68 EFLAGS: 00010206
[   46.142038] RAX: 0000000000000003 RBX: 0000000000000002 RCX: ffff9c9f85f0c740
[   46.142039] RDX: 00000000002cc000 RSI: ffffffffc08bb9c8 RDI: 00003ed432018130
[   46.142039] RBP: ffffbb76c33bbd98 R08: 0000000000000000 R09: ffff9ca28dd2d170
[   46.142040] R10: 0000000000000000 R11: 00000000000001f2 R12: ffff9c9f85fa1000
[   46.142041] R13: 0000000000000000 R14: 0000563fd4795d90 R15: ffffbb76c33bbe38
[   46.142051] FS:  00007f6154437740(0000) GS:ffff9ca28dc00000(0000) knlGS:0000000000000000
[   46.142053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   46.142054] CR2: 000055cf27e162e0 CR3: 0000000138f02001 CR4: 00000000001706f0
[   46.142055] Call Trace:
[   46.142056]  nv_procfs_write_suspend+0xe9/0x140 [nvidia]
[   46.142235]  proc_reg_write+0x5a/0x90
[   46.142236]  vfs_write+0xc3/0x280
[   46.142239]  ksys_write+0x67/0xe0
[   46.142241]  __x64_sys_write+0x19/0x20
[   46.142244]  do_syscall_64+0x61/0xb0
[   46.142246]  ? handle_mm_fault+0xdf/0x2c0
[   46.142248]  ? do_user_addr_fault+0x1ed/0x670
[   46.142250]  ? exit_to_user_mode_prepare+0x37/0xb0
[   46.142256]  ? irqentry_exit_to_user_mode+0x9/0x20
[   46.142258]  ? irqentry_exit+0x33/0x40
[   46.142260]  ? exc_page_fault+0x8f/0x190
[   46.142261]  ? asm_exc_page_fault+0x8/0x30
[   46.142264]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   46.142266] RIP: 0033:0x7f61545519b7
[   46.142267] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[   46.142268] RSP: 002b:00007ffe45f1efd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   46.142269] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f61545519b7
[   46.142270] RDX: 0000000000000007 RSI: 0000563fd4795d90 RDI: 0000000000000001
[   46.142271] RBP: 0000563fd4795d90 R08: 0000000000000000 R09: 0000000000000000
[   46.142271] R10: 00007f6154652cc0 R11: 0000000000000246 R12: 0000000000000007
[   46.142272] R13: 00007f6154653760 R14: 00007f6154654560 R15: 00007f6154653960
[   46.142274] ---[ end trace 336a6415ef6f0549 ]---

And then these lines:
[ 49.143844] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[ 51.561164] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:428

I’ve tried removing the audio device with udev, but it didn’t seem to help.

The issue still occurs with the 495.44 driver. I have not been able to find out more about this issue, and I am really hoping somebody from NVIDIA will see this and provide us with a solution.

I can also confirm this issue on Manjaro with GNOME, a GeForce 1660, the 495 drivers, and kernel 5.15. Reproduced on an HDMI 4K 60 Hz output.

Same problem here.
Ubuntu 20.04, driver 470.103.01, Quadro K2100M

I don’t know if it is related, but another strange thing occurred at about the same time: since I plugged in an HDMI monitor, I can no longer get a graphical session on the laptop screen, even when the HDMI monitor is not plugged in (the boot log is still displayed on the laptop screen, though). Basically, the laptop screen is no longer seen at all by the NVIDIA driver.

Good news:
My sysadmin just fixed the issue on my Dell laptop (with a Kepler Quadro K2100M) by switching on the Intel card in the BIOS and deleting "acpi_osi=Linux" in GRUB → GRUB_CMDLINE_LINUX_DEFAULT="" (not sure whether both were required).
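
Since acpi_osi settings come up more than once in this thread, here is a small Python sketch for checking which acpi_osi values the running kernel was actually booted with, based on the contents of /proc/cmdline. The helper name acpi_osi_values is made up, and shlex is only an approximation of the kernel's own quoting rules (it handles double-quoted values such as "Windows 2015", but would raise an error on an unbalanced quote).

```python
import shlex


def acpi_osi_values(cmdline: str) -> list[str]:
    """Return every value passed via acpi_osi= on a kernel command line."""
    values = []
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if key == "acpi_osi" and sep:
            values.append(value)
    return values
```

On a live system, acpi_osi_values(open("/proc/cmdline").read()) returns an empty list once the parameter has been removed from the GRUB configuration, the GRUB config has been regenerated, and the machine has been rebooted.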