I am randomly getting a kernel oops from the NVIDIA UVM driver (nvidia_uvm). Details are below.
OS: Ubuntu 22.04 LTS
Kernel version: 6.2.0-32
$ uname -a
Linux sreenathv 6.2.0-32-generic #32~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 18 10:40:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Nvidia driver version: 535.86.05
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.86.05 Fri Jul 14 20:46:33 UTC 2023
Kernel Log:
Oct 1 20:15:52 sreenathv kernel: [ 3557.052869] BUG: kernel NULL pointer dereference, address: 0000000000000000
Oct 1 20:15:52 sreenathv kernel: [ 3557.052924] #PF: supervisor write access in kernel mode
Oct 1 20:15:52 sreenathv kernel: [ 3557.052953] #PF: error_code(0x0002) - not-present page
Oct 1 20:15:52 sreenathv kernel: [ 3557.052980] PGD 0 P4D 0
Oct 1 20:15:52 sreenathv kernel: [ 3557.053001] Oops: 0002 [#1] PREEMPT SMP PTI
Oct 1 20:15:52 sreenathv kernel: [ 3557.053028] CPU: 3 PID: 9278 Comm: nvidia-sleep.sh Tainted: P OE 6.2.0-32-generic #32~22.04.1-Ubuntu
Oct 1 20:15:52 sreenathv kernel: [ 3557.053078] Hardware name: ASUSTeK COMPUTER INC. X555LF/X555LF, BIOS X555LF.504 08/04/2015
Oct 1 20:15:52 sreenathv kernel: [ 3557.053116] RIP: 0010:_raw_q_flush+0x95/0xf0 [nvidia_uvm]
Oct 1 20:15:52 sreenathv kernel: [ 3557.053237] Code: 4c 89 64 24 18 e8 eb c9 e0 e2 48 89 c6 48 8b 04 24 4c 39 e8 75 56 48 8b 53 08 48 89 1c 24 4c 89 f7 48 89 43 08 48 89 54 24 08 <48> 89 02 e8 63 ca e0 e2 48 8d 7b 18 e8 ea 73 e0 e2 4c 89 e7 e8 62
Oct 1 20:15:52 sreenathv kernel: [ 3557.053317] RSP: 0018:ffffb9328c7a7bb0 EFLAGS: 00010046
Oct 1 20:15:52 sreenathv kernel: [ 3557.053347] RAX: ffffb9328c7a7bb0 RBX: ffffb9328123b2a8 RCX: 0000000000000000
Oct 1 20:15:52 sreenathv kernel: [ 3557.053382] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffb9328123b2b8
Oct 1 20:15:52 sreenathv kernel: [ 3557.053416] RBP: ffffb9328c7a7c20 R08: 0000000000000000 R09: 0000000000000000
Oct 1 20:15:52 sreenathv kernel: [ 3557.053451] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb9328c7a7bd8
Oct 1 20:15:52 sreenathv kernel: [ 3557.053485] R13: ffffb9328c7a7bb0 R14: ffffb9328123b2b8 R15: ffff92360c1f9638
Oct 1 20:15:52 sreenathv kernel: [ 3557.053519] FS: 00007f9d418b5740(0000) GS:ffff923756d80000(0000) knlGS:0000000000000000
Oct 1 20:15:52 sreenathv kernel: [ 3557.053558] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 1 20:15:52 sreenathv kernel: [ 3557.053588] CR2: 0000000000000000 CR3: 0000000001f10005 CR4: 00000000003706e0
Oct 1 20:15:52 sreenathv kernel: [ 3557.053623] Call Trace:
Oct 1 20:15:52 sreenathv kernel: [ 3557.053640] <TASK>
Oct 1 20:15:52 sreenathv kernel: [ 3557.053667] ? show_regs+0x72/0x90
Oct 1 20:15:52 sreenathv kernel: [ 3557.053708] ? __die+0x25/0x80
Oct 1 20:15:52 sreenathv kernel: [ 3557.053745] ? page_fault_oops+0x79/0x190
Oct 1 20:15:52 sreenathv kernel: [ 3557.053786] ? __schedule+0x2bf/0x5f0
Oct 1 20:15:52 sreenathv kernel: [ 3557.053831] ? do_user_addr_fault+0x30c/0x640
Oct 1 20:15:52 sreenathv kernel: [ 3557.053870] ? exc_page_fault+0x81/0x1b0
Oct 1 20:15:52 sreenathv kernel: [ 3557.053897] ? asm_exc_page_fault+0x27/0x30
Oct 1 20:15:52 sreenathv kernel: [ 3557.053928] ? _raw_q_flush+0x95/0xf0 [nvidia_uvm]
Oct 1 20:15:52 sreenathv kernel: [ 3557.054034] ? __pfx__q_flush_function+0x10/0x10 [nvidia_uvm]
Oct 1 20:15:52 sreenathv kernel: [ 3557.054141] nv_kthread_q_flush+0x1a/0x80 [nvidia_uvm]
Oct 1 20:15:52 sreenathv kernel: [ 3557.054244] uvm_suspend+0xa6/0x1f0 [nvidia_uvm]
Oct 1 20:15:52 sreenathv kernel: [ 3557.054354] uvm_suspend_entry.part.0+0x79/0xa0 [nvidia_uvm]
Oct 1 20:15:52 sreenathv kernel: [ 3557.054470] uvm_suspend_entry+0x27/0x30 [nvidia_uvm]
Oct 1 20:15:52 sreenathv kernel: [ 3557.054577] nv_uvm_suspend+0x33/0x60 [nvidia]
Oct 1 20:15:52 sreenathv kernel: [ 3557.055300] nv_set_system_power_state+0x36f/0x430 [nvidia]
Oct 1 20:15:52 sreenathv kernel: [ 3557.055909] nv_procfs_write_suspend+0xe9/0x190 [nvidia]
Oct 1 20:15:52 sreenathv kernel: [ 3557.056522] proc_reg_write+0x69/0xa0
Oct 1 20:15:52 sreenathv kernel: [ 3557.056549] vfs_write+0xc9/0x3c0
Oct 1 20:15:52 sreenathv kernel: [ 3557.056575] ksys_write+0x67/0xf0
Oct 1 20:15:52 sreenathv kernel: [ 3557.056598] __x64_sys_write+0x19/0x30
Oct 1 20:15:52 sreenathv kernel: [ 3557.056622] do_syscall_64+0x5c/0x90
Oct 1 20:15:52 sreenathv kernel: [ 3557.056649] ? syscall_exit_to_user_mode+0x2a/0x50
Oct 1 20:15:52 sreenathv kernel: [ 3557.056679] ? do_syscall_64+0x69/0x90
Oct 1 20:15:52 sreenathv kernel: [ 3557.056705] ? irqentry_exit_to_user_mode+0x9/0x20
Oct 1 20:15:52 sreenathv kernel: [ 3557.056734] ? irqentry_exit+0x43/0x50
Oct 1 20:15:52 sreenathv kernel: [ 3557.056759] ? exc_page_fault+0x92/0x1b0
Oct 1 20:15:52 sreenathv kernel: [ 3557.056786] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Oct 1 20:15:52 sreenathv kernel: [ 3557.057837] RIP: 0033:0x7f9d41714a37
Oct 1 20:15:52 sreenathv kernel: [ 3557.059508] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
Oct 1 20:15:52 sreenathv kernel: [ 3557.062410] RSP: 002b:00007ffdcd4e84e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Oct 1 20:15:52 sreenathv kernel: [ 3557.063624] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f9d41714a37
Oct 1 20:15:52 sreenathv kernel: [ 3557.064813] RDX: 000000000000000a RSI: 000056004baf99e0 RDI: 0000000000000001
Oct 1 20:15:52 sreenathv kernel: [ 3557.065429] RBP: 000056004baf99e0 R08: 0000000000000000 R09: 000056004baf99e0
Oct 1 20:15:52 sreenathv kernel: [ 3557.066016] R10: 00007f9d41819cf0 R11: 0000000000000246 R12: 000000000000000a
Oct 1 20:15:52 sreenathv kernel: [ 3557.066571] R13: 00007f9d4181a780 R14: 00007f9d41816600 R15: 00007f9d41815a00
Oct 1 20:15:52 sreenathv kernel: [ 3557.067137] </TASK>
Oct 1 20:15:52 sreenathv kernel: [ 3557.067648] Modules linked in: rfcomm vmw_vsock_vmci_transport vsock vmw_vmci vboxnetadp(OE) xt_CHECKSUM vboxnetflt(OE) xt_MASQUERADE xt_conntrack vboxdrv(OE) ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables ccm libcrc32c nfnetlink bridge stp llc cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 uvcvideo snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel x86_pkg_temp_thermal videobuf2_vmalloc snd_intel_dspcfg intel_powerclamp snd_intel_sdw_acpi videobuf2_memops coretemp snd_hda_codec videobuf2_v4l2 snd_hda_core rtsx_usb_ms kvm_intel ath3k btusb videodev kvm btrtl btbcm videobuf2_common snd_hwdep btintel nvidia_uvm(POE) intel_rapl_msr mei_hdcp mei_pxp irqbypass memstick mc snd_pcm btmtk crct10dif_pclmul i915 nvidia_drm(POE) snd_seq_midi snd_seq_midi_event nvidia_modeset(POE) bluetooth polyval_clmulni polyval_generic ghash_clmulni_intel snd_rawmidi ecdh_generic nvidia(POE)
Oct 1 20:15:52 sreenathv kernel: [ 3557.067719] snd_seq sha512_ssse3 ecc drm_buddy snd_seq_device ttm aesni_intel drm_display_helper snd_timer cec rc_core crypto_simd cryptd drm_kms_helper snd ath9k ath9k_common soundcore ath9k_hw ath processor_thermal_device_pci_legacy mac80211 processor_thermal_device processor_thermal_rfim rapl cfg80211 joydev i2c_algo_bit input_leds processor_thermal_mbox intel_cstate processor_thermal_rapl syscopyarea serio_raw mei_me spi_nor asus_nb_wmi sysfillrect mei sysimgblt intel_rapl_common libarc4 mtd mac_hid int340x_thermal_zone intel_soc_dts_iosf intel_pch_thermal asus_wireless int3400_thermal acpi_thermal_rel mxm_wmi acpi_pad sch_fq_codel msr parport_pc ppdev drm lp efi_pstore parport ip_tables x_tables autofs4 hid_generic rtsx_usb_sdmmc usbhid hid rtsx_usb mfd_aaeon asus_wmi ledtrig_audio spi_intel_platform sparse_keymap spi_intel platform_profile crc32_pclmul ahci lpc_ich libahci psmouse i2c_i801 r8169 i2c_smbus xhci_pci realtek xhci_pci_renesas video wmi
Oct 1 20:15:52 sreenathv kernel: [ 3557.075398] CR2: 0000000000000000
Oct 1 20:15:52 sreenathv kernel: [ 3557.076127] ---[ end trace 0000000000000000 ]---
Oct 1 20:15:52 sreenathv kernel: [ 3557.162235] RIP: 0010:_raw_q_flush+0x95/0xf0 [nvidia_uvm]
Oct 1 20:15:52 sreenathv kernel: [ 3557.163046] Code: 4c 89 64 24 18 e8 eb c9 e0 e2 48 89 c6 48 8b 04 24 4c 39 e8 75 56 48 8b 53 08 48 89 1c 24 4c 89 f7 48 89 43 08 48 89 54 24 08 <48> 89 02 e8 63 ca e0 e2 48 8d 7b 18 e8 ea 73 e0 e2 4c 89 e7 e8 62
Oct 1 20:15:52 sreenathv kernel: [ 3557.164651] RSP: 0018:ffffb9328c7a7bb0 EFLAGS: 00010046
Oct 1 20:15:52 sreenathv kernel: [ 3557.165534] RAX: ffffb9328c7a7bb0 RBX: ffffb9328123b2a8 RCX: 0000000000000000
Oct 1 20:15:52 sreenathv kernel: [ 3557.166358] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffb9328123b2b8
Oct 1 20:15:52 sreenathv kernel: [ 3557.167190] RBP: ffffb9328c7a7c20 R08: 0000000000000000 R09: 0000000000000000
Oct 1 20:15:52 sreenathv kernel: [ 3557.168056] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb9328c7a7bd8
Oct 1 20:15:52 sreenathv kernel: [ 3557.168881] R13: ffffb9328c7a7bb0 R14: ffffb9328123b2b8 R15: ffff92360c1f9638
Oct 1 20:15:52 sreenathv kernel: [ 3557.169710] FS: 00007f9d418b5740(0000) GS:ffff923756d80000(0000) knlGS:0000000000000000
Oct 1 20:15:52 sreenathv kernel: [ 3557.170548] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 1 20:15:52 sreenathv kernel: [ 3557.171394] CR2: 0000000000000000 CR3: 0000000001f10005 CR4: 00000000003706e0
Oct 1 20:15:52 sreenathv kernel: [ 3557.172240] note: nvidia-sleep.sh[9278] exited with irqs disabled
Oct 1 20:15:52 sreenathv kernel: [ 3557.173163] note: nvidia-sleep.sh[9278] exited with preempt_count 1
Oct 1 20:15:52 sreenathv systemd[1]: nvidia-hibernate.service: Main process exited, code=killed, status=9/KILL
Oct 1 20:15:53 sreenathv /usr/libexec/gdm-x-session[2146]: (II) libinput: ETPS/2 Elantech Touchpad: SetProperty on 357 called but device is disabled.
Oct 1 20:15:54 sreenathv /usr/libexec/gdm-x-session[2146]: This driver cannot change properties on a disabled device
Oct 1 20:15:54 sreenathv systemd[1]: nvidia-hibernate.service: Failed with result 'signal'.
As expected, the system becomes unresponsive after this. Please suggest a solution.
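In case it helps with triage, here is a rough Python sketch that pulls the call-trace frames out of a log like the one above, so the driver frames (`uvm_suspend`, `nv_kthread_q_flush`, and so on) are easier to see. It is not part of the NVIDIA driver or any official tool: the names `call_trace_frames`, `FRAME_RE`, and `SAMPLE_LOG` are hypothetical, and the regular expression assumes the syslog layout shown in this post. Frames prefixed with `?` are the ones the kernel marks as unreliable.

```python
import re

# Matches a frame such as "uvm_suspend+0xa6/0x1f0 [nvidia_uvm]" at the end of
# a log line. A leading "? " marks a frame the kernel considers unreliable.
# This pattern is an assumption based on the log format in this post.
FRAME_RE = re.compile(
    r"\]\s+(?P<unreliable>\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def call_trace_frames(log_lines):
    """Return (symbol, module, reliable) tuples found between <TASK> and </TASK>."""
    frames = []
    in_trace = False
    for line in log_lines:
        if "</TASK>" in line:
            break
        if "<TASK>" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match:
            reliable = match.group("unreliable") is None
            frames.append((match.group("symbol"), match.group("module"), reliable))
    return frames


# A few lines copied from the kernel log above, used only as sample input.
SAMPLE_LOG = [
    "Oct 1 20:15:52 sreenathv kernel: [ 3557.053640] <TASK>",
    "Oct 1 20:15:52 sreenathv kernel: [ 3557.053667] ? show_regs+0x72/0x90",
    "Oct 1 20:15:52 sreenathv kernel: [ 3557.053928] ? _raw_q_flush+0x95/0xf0 [nvidia_uvm]",
    "Oct 1 20:15:52 sreenathv kernel: [ 3557.054141] nv_kthread_q_flush+0x1a/0x80 [nvidia_uvm]",
    "Oct 1 20:15:52 sreenathv kernel: [ 3557.054244] uvm_suspend+0xa6/0x1f0 [nvidia_uvm]",
    "Oct 1 20:15:52 sreenathv kernel: [ 3557.054354] uvm_suspend_entry.part.0+0x79/0xa0 [nvidia_uvm]",
    "Oct 1 20:15:52 sreenathv kernel: [ 3557.054577] nv_uvm_suspend+0x33/0x60 [nvidia]",
    "Oct 1 20:15:52 sreenathv kernel: [ 3557.056522] proc_reg_write+0x69/0xa0",
    "Oct 1 20:15:52 sreenathv kernel: [ 3557.067137] </TASK>",
]

if __name__ == "__main__":
    # Print only the reliable frames that belong to a loadable module.
    for symbol, module, reliable in call_trace_frames(SAMPLE_LOG):
        if reliable and module:
            print(f"{symbol} [{module}]")
```

To run it against a full log, the lines of `/var/log/syslog` or the output of `dmesg` can be passed in place of `SAMPLE_LOG`, assuming they follow the same layout.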