Bug report: 460.56 - Kernel NULL pointer dereference on suspend

Distro: Arch Linux
Kernel: 5.11.2-arch1-1
Driver version: 460.56
GPU: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)

After setting the kernel module parameters below and enabling the suspend script as described in the documentation (Chapter 21. Configuring Power Management Support), the system reliably crashes on suspend (systemctl suspend).
Parameters:

nvidia.NVreg_PreserveVideoMemoryAllocations=1 
nvidia.NVreg_TemporaryFilePath=/tmp-nvidia

Relevant dmesg output:

[   36.721335] WARNING: CPU: 3 PID: 2351 at /build/nvidia/src/nvidia/460.56/build/nvidia/nv.c:4079 nv_set_system_power_state+0x338/0x3c0 [nvidia]
[   36.721516] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter cfg80211 8021q garp mrp nls_iso8859_1 vfat fat snd_sof_pci snd_sof_intel_hda_common snd_hda_codec_realtek snd_soc_hdac_hda snd_hda_codec_generic snd_sof_intel_hda snd_sof_intel_byt snd_sof_intel_ipc snd_sof snd_sof_xtensa_dsp ledtrig_audio intel_rapl_msr snd_soc_skl iTCO_wdt eeepc_wmi intel_pmc_bxt ee1004 asus_wmi iTCO_vendor_support mei_hdcp sparse_keymap snd_soc_sst_ipc rfkill wmi_bmof intel_wmi_thunderbolt mxm_wmi snd_soc_sst_dsp intel_rapl_common snd_usb_audio snd_hda_ext_core uas snd_soc_acpi_intel_match snd_usbmidi_lib usb_storage snd_hda_codec_hdmi snd_soc_acpi snd_rawmidi
[   36.721545]  snd_hda_intel snd_seq_device mousedev joydev mc snd_intel_dspcfg x86_pkg_temp_thermal intel_powerclamp soundwire_intel coretemp soundwire_generic_allocation kvm_intel soundwire_cadence snd_hda_codec snd_hda_core snd_hwdep kvm soundwire_bus irqbypass crct10dif_pclmul snd_soc_core crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper snd_compress rapl ac97_bus intel_cstate snd_pcm_dmaengine intel_uncore bridge snd_pcm i2c_i801 stp pcspkr e1000e i915 i2c_smbus snd_timer llc snd wireguard mei_me i2c_algo_bit soundcore intel_gtt mei curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha wmi libblake2s_generic video mac_hid acpi_tad acpi_pad nfsd auth_rpcgss nfs_acl lockd grace virtio_balloon virtio_scsi virtio_blk sunrpc virtio_net net_failover failover sg crypto_user fuse nfs_ssc bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid
[   36.721579]  raid10 dm_mod raid0 md_mod crc32c_intel xhci_pci xhci_pci_renesas i2c_dev nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart
[   36.721588] CPU: 3 PID: 2351 Comm: nvidia-sleep.sh Tainted: P           OE     5.11.2-arch1-1 #1
[   36.721590] Hardware name: System manufacturer System Product Name/PRIME Z390-A, BIOS 1401 11/26/2019
[   36.721590] RIP: 0010:nv_set_system_power_state+0x338/0x3c0 [nvidia]
[   36.721697] Code: 05 00 00 48 85 ed 0f 84 82 00 00 00 48 8b 85 60 02 00 00 ba 01 00 00 00 89 de 48 8b 78 78 e8 5f d4 ff ff 41 89 c4 85 c0 74 d2 <0f> 0b 48 c7 c7 b0 1d 2a c2 41 bd 01 00 00 00 e8 24 36 b0 ea e9 4a
[   36.721699] RSP: 0018:ffffacf2c2617e58 EFLAGS: 00010206
[   36.721700] RAX: 000000000000ffff RBX: 0000000000000001 RCX: 000000000058ca03
[   36.721701] RDX: ffff92994ed4a500 RSI: 0000000000000282 RDI: 0000000000000282
[   36.721702] RBP: ffff92994ed4a000 R08: 0000000000000003 R09: 0000000000000000
[   36.721703] R10: 0000000000000019 R11: 0000000000000000 R12: 000000000000ffff
[   36.721703] R13: 0000000000000000 R14: 0000000000000000 R15: ffff92994ed4a4f8
[   36.721704] FS:  00007fb43b217b80(0000) GS:ffff92a09dcc0000(0000) knlGS:0000000000000000
[   36.721705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.721706] CR2: 00007fb43b44c5b0 CR3: 0000000214518005 CR4: 00000000003706e0
[   36.721707] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   36.721708] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   36.721709] Call Trace:
[   36.721711]  nv_procfs_write_suspend+0x101/0x150 [nvidia]
[   36.721818]  proc_reg_write+0x51/0x90
[   36.721821]  vfs_write+0xc2/0x2a0
[   36.721823]  ksys_write+0x67/0xe0
[   36.721824]  do_syscall_64+0x33/0x40
[   36.721827]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   36.721829] RIP: 0033:0x7fb43b37a0f7
[   36.721830] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[   36.721831] RSP: 002b:00007ffe609d3cd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   36.721833] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fb43b37a0f7
[   36.721833] RDX: 0000000000000008 RSI: 000055c6fef53810 RDI: 0000000000000001
[   36.721834] RBP: 000055c6fef53810 R08: 000000000000000a R09: 00007fb43b44ba60
[   36.721835] R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000008
[   36.721835] R13: 00007fb43b44c520 R14: 0000000000000008 R15: 00007fb43b44c700
[   36.721837] ---[ end trace 4e34069d53a86820 ]---

and

[   45.337390] BUG: kernel NULL pointer dereference, address: 0000000000000050
[   45.337396] #PF: supervisor read access in kernel mode
[   45.337398] #PF: error_code(0x0000) - not-present page
[   45.337399] PGD 0 P4D 0 
[   45.337401] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   45.337403] CPU: 3 PID: 193 Comm: nvidia-modeset/ Tainted: P        W  OE     5.11.2-arch1-1 #1
[   45.337406] Hardware name: System manufacturer System Product Name/PRIME Z390-A, BIOS 1401 11/26/2019
[   45.337408] RIP: 0010:_nv000112kms+0xd/0x30 [nvidia_modeset]
[   45.337423] Code: 16 00 74 06 83 7e 0c 02 77 03 31 c0 c3 c7 46 0c 01 00 00 00 b8 01 00 00 00 c3 0f 1f 00 0f b7 46 04 0f b7 56 08 39 d0 0f 47 c2 <3b> 47 18 77 06 83 7e 10 02 77 08 31 c0 c3 0f 1f 44 00 00 c7 46 10
[   45.337426] RSP: 0018:ffffacf2c0e9f4b8 EFLAGS: 00010246
[   45.337428] RAX: 0000000000000a00 RBX: 0000000000000038 RCX: 00000000000003ff
[   45.337429] RDX: 0000000000000a00 RSI: ffffacf2c0e9f7b0 RDI: 0000000000000038
[   45.337431] RBP: ffffacf2c0e9f5d8 R08: 0000000000000000 R09: ffffffffc22a5260
[   45.337432] R10: 0000000000000000 R11: 00000000ffffffff R12: ffffacf2c0e9f558
[   45.337434] R13: ffffacf2c0e9f558 R14: ffffacf2c0e9f540 R15: ffffffffc26ebda0
[   45.337436] FS:  0000000000000000(0000) GS:ffff92a09dcc0000(0000) knlGS:0000000000000000
[   45.337437] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.337439] CR2: 0000000000000050 CR3: 0000000110fb4003 CR4: 00000000003706e0
[   45.337441] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.337442] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   45.337444] Call Trace:
[   45.337446]  ? _nv002768kms+0x96/0xd0 [nvidia_modeset]
[   45.337458]  ? _nv002328kms+0xb5/0x110 [nvidia_modeset]
[   45.337470]  ? _nv000742kms+0x168/0x370 [nvidia_modeset]
[   45.337486]  ? _nv002771kms+0x280/0x600 [nvidia_modeset]
[   45.337500]  ? _nv002771kms+0x252/0x600 [nvidia_modeset]
[   45.337514]  ? _nv000742kms+0x40/0x40 [nvidia_modeset]
[   45.337523]  ? _nv000744kms+0x2a/0x40 [nvidia_modeset]
[   45.337532]  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
[   45.337542]  ? nvkms_ioctl_from_kapi+0x40/0x80 [nvidia_modeset]
[   45.337552]  ? _nv000401kms+0x75/0x200 [nvidia_modeset]
[   45.337573]  ? nv_drm_connector_get_modes+0xd4/0x150 [nvidia_drm]
[   45.337577]  ? drm_helper_probe_single_connector_modes+0x1ca/0x810 [drm_kms_helper]
[   45.337587]  ? nv_drm_output_poll_changed+0x85/0xd0 [nvidia_drm]
[   45.337590]  ? drm_kms_helper_hotplug_event+0x26/0x30 [drm_kms_helper]
[   45.337597]  ? nv_drm_event_callback+0x4d/0x90 [nvidia_drm]
[   45.337600]  ? nvKmsKapiHandleEventQueueChange+0xc7/0x100 [nvidia_modeset]
[   45.337619]  ? _main_loop+0x83/0x130 [nvidia_modeset]
[   45.337628]  ? nvkms_sema_up+0x10/0x10 [nvidia_modeset]
[   45.337637]  ? kthread+0x133/0x150
[   45.337640]  ? __kthread_bind_mask+0x60/0x60
[   45.337643]  ? ret_from_fork+0x22/0x30
[   45.337646] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter cfg80211 8021q garp mrp nls_iso8859_1 vfat fat snd_sof_pci snd_sof_intel_hda_common snd_hda_codec_realtek snd_soc_hdac_hda snd_hda_codec_generic snd_sof_intel_hda snd_sof_intel_byt snd_sof_intel_ipc snd_sof snd_sof_xtensa_dsp ledtrig_audio intel_rapl_msr snd_soc_skl iTCO_wdt eeepc_wmi intel_pmc_bxt ee1004 asus_wmi iTCO_vendor_support mei_hdcp sparse_keymap snd_soc_sst_ipc rfkill wmi_bmof intel_wmi_thunderbolt mxm_wmi snd_soc_sst_dsp intel_rapl_common snd_usb_audio snd_hda_ext_core uas snd_soc_acpi_intel_match snd_usbmidi_lib usb_storage snd_hda_codec_hdmi snd_soc_acpi snd_rawmidi
[   45.337677]  snd_hda_intel snd_seq_device mousedev joydev mc snd_intel_dspcfg x86_pkg_temp_thermal intel_powerclamp soundwire_intel coretemp soundwire_generic_allocation kvm_intel soundwire_cadence snd_hda_codec snd_hda_core snd_hwdep kvm soundwire_bus irqbypass crct10dif_pclmul snd_soc_core crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper snd_compress rapl ac97_bus intel_cstate snd_pcm_dmaengine intel_uncore bridge snd_pcm i2c_i801 stp pcspkr e1000e i915 i2c_smbus snd_timer llc snd wireguard mei_me i2c_algo_bit soundcore intel_gtt mei curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha wmi libblake2s_generic video mac_hid acpi_tad acpi_pad nfsd auth_rpcgss nfs_acl lockd grace virtio_balloon virtio_scsi virtio_blk sunrpc virtio_net net_failover failover sg crypto_user fuse nfs_ssc bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid
[   45.337720]  raid10 dm_mod raid0 md_mod crc32c_intel xhci_pci xhci_pci_renesas i2c_dev nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart
[   45.337738] CR2: 0000000000000050
[   45.337740] ---[ end trace 4e34069d53a86821 ]---
[   45.433453] RIP: 0010:_nv000112kms+0xd/0x30 [nvidia_modeset]
[   45.433466] Code: 16 00 74 06 83 7e 0c 02 77 03 31 c0 c3 c7 46 0c 01 00 00 00 b8 01 00 00 00 c3 0f 1f 00 0f b7 46 04 0f b7 56 08 39 d0 0f 47 c2 <3b> 47 18 77 06 83 7e 10 02 77 08 31 c0 c3 0f 1f 44 00 00 c7 46 10
[   45.433469] RSP: 0018:ffffacf2c0e9f4b8 EFLAGS: 00010246
[   45.433471] RAX: 0000000000000a00 RBX: 0000000000000038 RCX: 00000000000003ff
[   45.433472] RDX: 0000000000000a00 RSI: ffffacf2c0e9f7b0 RDI: 0000000000000038
[   45.433474] RBP: ffffacf2c0e9f5d8 R08: 0000000000000000 R09: ffffffffc22a5260
[   45.433475] R10: 0000000000000000 R11: 00000000ffffffff R12: ffffacf2c0e9f558
[   45.433477] R13: ffffacf2c0e9f558 R14: ffffacf2c0e9f540 R15: ffffffffc26ebda0
[   45.433478] FS:  0000000000000000(0000) GS:ffff92a09dcc0000(0000) knlGS:0000000000000000
[   45.433480] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.433482] CR2: 0000000000000050 CR3: 0000000110fb4003 CR4: 00000000003706e0
[   45.433484] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.433485] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
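
As a sanity check on the second trace (my own reading of the oops, not anything confirmed by NVIDIA): the faulting bytes marked in the Code line are `3b 47 18`, which decode as a CMP of a register against memory at [rdi + 0x18]. With RDI = 0x38 in the register dump, that address is 0x50, which matches CR2. So the function appears to have been handed a near-NULL pointer. The helper name below is made up for illustration, and it only handles the simple "register + 8-bit displacement" form needed here.

```python
# Minimal sketch: decode the ModRM byte of the faulting instruction and
# recompute the faulting address from the register dump in the oops.
# decode_disp8_operand is a hypothetical helper, not part of any tool.

# Register names for rm values 0..7 when no REX prefix is present.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_disp8_operand(insn: bytes):
    """Split a 3-byte 'opcode, ModRM, disp8' instruction into its fields.

    Only the mod == 01 form without a SIB byte is handled, i.e. a memory
    operand of the shape [reg + disp8].
    """
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    mod = modrm >> 6
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] without a SIB byte is handled")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return opcode, reg, rm, disp


# Bytes starting at the <3b> marker in the "Code:" line of the oops.
insn = bytes.fromhex("3b4718")
opcode, reg, rm, disp = decode_disp8_operand(insn)

# Values copied from the register dump at 45.3374xx.
registers = {"rdi": 0x38}
cr2 = 0x50

base_name = REG64[rm]
fault_address = registers[base_name] + disp

print(f"opcode=0x{opcode:02x} operand=[{base_name} + 0x{disp:x}]")
print(f"computed address=0x{fault_address:x}, CR2=0x{cr2:x}")
```

Opcode 0x3b is CMP r32, r/m32, so the instruction reads 4 bytes at offset 0x18 from whatever RDI points to, which is consistent with the "supervisor read access" line in the oops.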

The (incomplete) output of nvidia-bug-report.sh and the complete output of nvidia-bug-report.sh --safe-mode --extra-system-data are attached.

nvidia-bug-report.log.incomplete.gz (102.0 KB)
nvidia-bug-report.log.gz (100.2 KB)

Please let me know if there is any additional information I can provide. I can reproduce this issue reliably by loading the nvidia driver with the two parameters above.

Does the path /tmp-nvidia exist?

Edit: from the log:
/usr/bin/nvidia-sleep.sh: line 20: echo: write error: Input/output error
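
For anyone hitting the same write error, a quick way to rule this out before suspending is to check that the directory given to NVreg_TemporaryFilePath actually exists. This is only a rough sketch: the function name is made up, and the path is the one from this report, so substitute your own. It does not check what the driver itself requires of the directory (free space, filesystem type), only that it is there.

```python
# Minimal sketch: check that the directory passed to
# nvidia.NVreg_TemporaryFilePath exists and is a directory.
# check_temp_path is a hypothetical helper written for this thread.
import os


def check_temp_path(path):
    """Return a list of problems found with `path` (empty if none)."""
    problems = []
    if not os.path.exists(path):
        problems.append(f"{path!r} does not exist")
    elif not os.path.isdir(path):
        problems.append(f"{path!r} exists but is not a directory")
    return problems


if __name__ == "__main__":
    # Path taken from the module parameters in the original post.
    configured = "/tmp-nvidia"
    issues = check_temp_path(configured)
    if issues:
        for issue in issues:
            print("problem:", issue)
    else:
        print(f"{configured} looks usable as a directory")
```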

Ah, I misinterpreted the documentation. Thank you for the clarification. I have verified that this is indeed the culprit.