Frequent kernel NULL pointer dereference on suspend

Kernel output:

kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: #PF: supervisor write access in kernel mode
kernel: #PF: error_code(0x0002) - not-present page
kernel: PGD 0 P4D 0 
kernel: Oops: 0002 [#1] PREEMPT SMP PTI
kernel: CPU: 5 PID: 14965 Comm: nvidia-sleep.sh Tainted: P           OE      6.5.0-21-generic #21-Ubuntu
kernel: Hardware name: LENOVO 20EN001SUS/20EN001SUS, BIOS N1EETA1W (1.74 ) 11/07/2023
kernel: RIP: 0010:_raw_q_flush+0x97/0xf0 [nvidia_uvm]
kernel: Code: 4c 89 64 24 18 e8 b9 66 c1 c4 48 89 c6 48 8b 04 24 4c 39 e8 75 56 48 8b 53 08 48 89 1c 24 4c 89 f7 48 89 43 08 48 89 54 24 08 <48> 89 02 e8 31 67 c1 c4 48 8d 7b 18 e8 08 20 c1 c4 4c 89 e7 e8 00
kernel: RSP: 0018:ffffb9e607053b00 EFLAGS: 00010046
kernel: RAX: ffffb9e607053b00 RBX: ffffb9e6071552c0 RCX: 0000000000000000
kernel: RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffb9e6071552d0
kernel: RBP: ffffb9e607053b78 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffb9e607053b28
kernel: R13: ffffb9e607053b00 R14: ffffb9e6071552d0 R15: ffff93afa1e03000
kernel: FS:  00007fc1c4061740(0000) GS:ffff93b293d40000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000002cfcbe002 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? show_regs+0x6d/0x80
kernel:  ? __die+0x24/0x80
kernel:  ? page_fault_oops+0x99/0x1b0
kernel:  ? do_user_addr_fault+0x316/0x6b0
kernel:  ? exc_page_fault+0x83/0x1b0
kernel:  ? asm_exc_page_fault+0x27/0x30
kernel:  ? _raw_q_flush+0x97/0xf0 [nvidia_uvm]
kernel:  ? __pfx__q_flush_function+0x10/0x10 [nvidia_uvm]
kernel:  nv_kthread_q_flush+0x19/0x80 [nvidia_uvm]
kernel:  uvm_suspend+0x9e/0x1f0 [nvidia_uvm]
kernel:  uvm_suspend_entry.part.0+0x4e/0xa0 [nvidia_uvm]
kernel:  uvm_suspend_entry+0x27/0x30 [nvidia_uvm]
kernel:  nv_uvm_suspend+0x2e/0x50 [nvidia]
kernel:  nv_set_system_power_state+0x3d3/0x480 [nvidia]
kernel:  nv_procfs_write_suspend+0x106/0x1c0 [nvidia]
kernel:  proc_reg_write+0x69/0xb0
kernel:  vfs_write+0xff/0x440
kernel:  ksys_write+0x73/0x100
kernel:  __x64_sys_write+0x19/0x30
kernel:  do_syscall_64+0x59/0x90
kernel:  ? exit_to_user_mode_prepare+0x30/0xb0
kernel:  ? syscall_exit_to_user_mode+0x37/0x60
kernel:  ? do_syscall_64+0x68/0x90
kernel:  ? exit_to_user_mode_prepare+0x30/0xb0
kernel:  ? irqentry_exit_to_user_mode+0x17/0x20
kernel:  ? irqentry_exit+0x43/0x50
kernel:  ? exc_page_fault+0x94/0x1b0
kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
kernel: RIP: 0033:0x7fc1c3f1b294
kernel: Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d b5 b2 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
kernel: RSP: 002b:00007ffe57b5e418 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fc1c3f1b294
kernel: RDX: 0000000000000008 RSI: 00005564342fcad0 RDI: 0000000000000001
kernel: RBP: 00005564342fcad0 R08: ffffffffffffffc0 R09: 0000000000000410
kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000008
kernel: R13: 00007fc1c3fff7a0 R14: 00007fc1c3ffd120 R15: 0000000000000000
kernel:  </TASK>
kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 nfs lockd grace fscache netfs veth tls xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc ccm rfcomm snd_seq_dummy snd_hrtimer overlay cmac algif_hash algif_skcipher af_alg bnep nvidia_uvm(POE) rmi_smbus rmi_core nvidia_drm(POE) nvidia_modeset(POE) intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp binfmt_misc coretemp kvm_intel nls_iso8859_1 nvidia(POE) kvm irqbypass snd_ctl_led snd_soc_avs snd_hda_codec_realtek snd_soc_hda_codec snd_hda_ext_core snd_hda_codec_generic snd_soc_core snd_hda_codec_hdmi snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec uvcvideo iwlmvm btusb snd_hda_core btrtl snd_hwdep videobuf2_vmalloc thinkpad_acpi uvc mei_hdcp mei_pxp ee1004 snd_pcm btbcm nvram videobuf2_memops
kernel:  mac80211 btintel btmtk videobuf2_v4l2 rapl snd_seq_midi libarc4 videodev snd_seq_midi_event bluetooth snd_rawmidi videobuf2_common intel_cstate mc snd_seq ecdh_generic iwlwifi think_lmi snd_seq_device intel_wmi_thunderbolt ecc firmware_attributes_class snd_timer wmi_bmof i2c_i801 cfg80211 drm_kms_helper snd i2c_smbus mei_me soundcore intel_pch_thermal mei ledtrig_audio ie31200_edac platform_profile joydev input_leds mac_hid serio_raw msr auth_rpcgss parport_pc ppdev lp parport drm efi_pstore sunrpc dmi_sysfs ip_tables x_tables autofs4 dm_crypt hid_generic usbhid hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel aesni_intel crypto_simd rtsx_pci_sdmmc cryptd nvme psmouse e1000e nvme_core rtsx_pci ahci nvme_common xhci_pci libahci xhci_pci_renesas video wmi
kernel: CR2: 0000000000000000
kernel: ---[ end trace 0000000000000000 ]---
kernel: RIP: 0010:_raw_q_flush+0x97/0xf0 [nvidia_uvm]
kernel: Code: 4c 89 64 24 18 e8 b9 66 c1 c4 48 89 c6 48 8b 04 24 4c 39 e8 75 56 48 8b 53 08 48 89 1c 24 4c 89 f7 48 89 43 08 48 89 54 24 08 <48> 89 02 e8 31 67 c1 c4 48 8d 7b 18 e8 08 20 c1 c4 4c 89 e7 e8 00
kernel: RSP: 0018:ffffb9e607053b00 EFLAGS: 00010046
kernel: RAX: ffffb9e607053b00 RBX: ffffb9e6071552c0 RCX: 0000000000000000
kernel: RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffb9e6071552d0
kernel: RBP: ffffb9e607053b78 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffb9e607053b28
kernel: R13: ffffb9e607053b00 R14: ffffb9e6071552d0 R15: ffff93afa1e03000
kernel: FS:  00007fc1c4061740(0000) GS:ffff93b293d40000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000002cfcbe002 CR4: 00000000003706e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: note: nvidia-sleep.sh[14965] exited with irqs disabled
kernel: note: nvidia-sleep.sh[14965] exited with preempt_count 1

Kernel version: 6.5.0-21-generic
Nvidia driver: 550.54.14
Device: Quadro M2000M

Is there any way to fix or work around this?
Thanks!

nvidia-bug-report.log.gz (412.6 KB)

This happens 100% of the time when the card’s memory is nearly full, e.g. when a CUDA app is running.
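
To confirm the correlation with memory usage before suspending, something like the following could be used. It is only a rough sketch: it assumes `nvidia-smi` is on the PATH and reports numeric values for the card, and the 90% threshold and the helper names are my own picks, not anything from the driver.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' MiB pairs (one GPU per line) into (used, total) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used/total memory in MiB for every GPU it lists."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_csv(out)


def nearly_full(used, total, threshold=0.9):
    """True when used/total is at or above the (assumed) threshold."""
    return used / total >= threshold
```

Calling `query_gpu_memory()` right before a suspend attempt and logging the result next to whether the oops appeared should show if the "nearly full" condition really is the trigger.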

I'm having the same issue.
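
To check whether two machines are really hitting the same fault, one option is to compare the faulting symbol and module from the kernel `RIP:` line of each log. A minimal Python sketch, assuming the log looks like the `journalctl`/`dmesg` output pasted above (the function name and regex are illustrative only):

```python
import re

# Matches kernel-mode lines such as:
#   RIP: 0010:_raw_q_flush+0x97/0xf0 [nvidia_uvm]
# The user-space "RIP: 0033:0x..." line has no "+offset/size" part and is skipped.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def oops_signature(log_text):
    """Return (symbol, offset, module) from the first kernel RIP line, or None."""
    match = RIP_RE.search(log_text)
    if match is None:
        return None
    return (match["symbol"], match["offset"], match["module"])
```

For the trace in this thread the symbol is `_raw_q_flush` in `nvidia_uvm`. The offset can differ between driver builds, so the symbol and module are the more useful parts to compare.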