Bug report: kernel NULL pointer dereference with NVIDIA 465.24.02-2 on Linux 5.11.15 when starting X11

When starting X11, a NULL pointer dereference occurs in the nvidia kernel module and freezes the whole machine.

The only workaround I have found is downgrading to driver 460.67 on kernel 5.11.13.

I cannot run nvidia-bug-report.sh on the affected setup, because the system becomes unresponsive and requires a hard reset. In case it helps, I have attached the nvidia-bug-report.sh output from the downgraded driver and kernel: nvidia-bug-report.log.gz (310.0 KB)

The kernel log follows.

Apr 20 20:49:39 scout kernel: audit: type=1105 audit(1618976979.893:84): pid=474 uid=0 auid=1000 ses=3 msg='op=PAM:session_open grantors=pam_loginuid,pam>
Apr 20 20:49:39 scout kernel: audit: type=1110 audit(1618976979.893:85): pid=474 uid=0 auid=1000 ses=3 msg='op=PAM:setcred grantors=pam_securetty,pam_she>
Apr 20 20:49:40 scout kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d8000-0x000d>
Apr 20 20:49:40 scout kernel: caller _nv000712rm+0x1af/0x200 [nvidia] mapping multiple BARs
Apr 20 20:49:44 scout kernel: BUG: kernel NULL pointer dereference, address: 0000000000000170
Apr 20 20:49:44 scout kernel: #PF: supervisor read access in kernel mode
Apr 20 20:49:44 scout kernel: #PF: error_code(0x0000) - not-present page
Apr 20 20:49:44 scout kernel: PGD 0 P4D 0 
Apr 20 20:49:44 scout kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Apr 20 20:49:44 scout kernel: CPU: 3 PID: 570 Comm: Xorg Tainted: P           OE     5.11.15-arch1-2 #1
Apr 20 20:49:44 scout kernel: Hardware name: System manufacturer System Product Name/MAXIMUS V GENE, BIOS 0701 03/29/2012
Apr 20 20:49:44 scout kernel: RIP: 0010:_nv015534rm+0x1b6/0x330 [nvidia]
Apr 20 20:49:44 scout kernel: Code: 8b 87 68 05 00 00 ba 01 00 00 00 be 02 00 00 00 e8 cf 40 eb cb 41 83 c5 01 41 83 fd 1f 0f 84 0b 01 00 00 48 8b 45 10 >
Apr 20 20:49:44 scout kernel: RSP: 0018:ffffb5da80c779a0 EFLAGS: 00010293
Apr 20 20:49:44 scout kernel: RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000007
Apr 20 20:49:44 scout kernel: RDX: 0000000000000004 RSI: 0000000000000008 RDI: 0000000000000000
Apr 20 20:49:44 scout kernel: RBP: ffff9f99125fadd0 R08: 0000000000000001 R09: ffff9f99125facb8
Apr 20 20:49:44 scout kernel: R10: ffff9f990dff8008 R11: 0000000010100000 R12: 0000000000005400
Apr 20 20:49:44 scout kernel: R13: 0000000000000008 R14: ffff9f99124ec010 R15: 0000000000000800
Apr 20 20:49:44 scout kernel: FS:  00007f55dd4ad940(0000) GS:ffff9f9c0ecc0000(0000) knlGS:0000000000000000
Apr 20 20:49:44 scout kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 20:49:44 scout kernel: CR2: 0000000000000170 CR3: 0000000112500006 CR4: 00000000001706e0
Apr 20 20:49:44 scout kernel: Call Trace:
Apr 20 20:49:44 scout kernel:  ? _nv015556rm+0x7fd/0x1020 [nvidia]
Apr 20 20:49:44 scout kernel:  ? _nv027154rm+0x22c/0x4f0 [nvidia]
Apr 20 20:49:44 scout kernel:  ? _nv017786rm+0x303/0x5e0 [nvidia]
Apr 20 20:49:44 scout kernel:  ? _nv017788rm+0xe1/0x220 [nvidia]
Apr 20 20:49:44 scout kernel:  ? _nv022828rm+0xed/0x220 [nvidia]
Apr 20 20:49:44 scout kernel:  ? _nv023064rm+0x30/0x60 [nvidia]
Apr 20 20:49:44 scout kernel:  ? _nv000704rm+0x16da/0x22b0 [nvidia]
Apr 20 20:49:44 scout kernel:  ? rm_init_adapter+0xc5/0xe0 [nvidia]
Apr 20 20:49:44 scout kernel:  ? kthread_create_on_node+0x51/0x70
Apr 20 20:49:44 scout kernel:  ? nv_open_device+0x122/0x8a0 [nvidia]
Apr 20 20:49:44 scout kernel:  ? nvidia_open+0x297/0x540 [nvidia]
Apr 20 20:49:44 scout kernel:  ? kobj_lookup+0xf0/0x160
Apr 20 20:49:44 scout kernel:  ? nvidia_frontend_open+0x53/0xa0 [nvidia]
Apr 20 20:49:44 scout kernel:  ? chrdev_open+0xca/0x240
Apr 20 20:49:44 scout kernel:  ? cdev_device_add+0x90/0x90
Apr 20 20:49:44 scout kernel:  ? do_dentry_open+0x14e/0x380
Apr 20 20:49:44 scout kernel:  ? path_openat+0xb67/0x1010
Apr 20 20:49:44 scout kernel:  ? do_filp_open+0x9c/0x140
Apr 20 20:49:44 scout kernel:  ? do_sys_openat2+0xb1/0x160
Apr 20 20:49:44 scout kernel:  ? __x64_sys_openat+0x54/0x90
Apr 20 20:49:44 scout kernel:  ? do_syscall_64+0x33/0x40
Apr 20 20:49:44 scout kernel:  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 20 20:49:44 scout kernel: Modules linked in: nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) intel_rapl_msr intel_rapl_common nct6775 x86_pkg_temp_th>
Apr 20 20:49:44 scout kernel:  sysfillrect joydev sysimgblt ecc mc mousedev lpc_ich snd_timer fb_sys_fops e1000e snd mei_me soundcore mei video wmi mac_h>
Apr 20 20:49:44 scout kernel: CR2: 0000000000000170
Apr 20 20:49:44 scout kernel: ---[ end trace 29eca02abdf6b07a ]---
Apr 20 20:49:44 scout kernel: RIP: 0010:_nv015534rm+0x1b6/0x330 [nvidia]
Apr 20 20:49:44 scout kernel: Code: 8b 87 68 05 00 00 ba 01 00 00 00 be 02 00 00 00 e8 cf 40 eb cb 41 83 c5 01 41 83 fd 1f 0f 84 0b 01 00 00 48 8b 45 10 >
Apr 20 20:49:44 scout kernel: RSP: 0018:ffffb5da80c779a0 EFLAGS: 00010293
Apr 20 20:49:44 scout kernel: RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000007
Apr 20 20:49:44 scout kernel: RDX: 0000000000000004 RSI: 0000000000000008 RDI: 0000000000000000
Apr 20 20:49:44 scout kernel: RBP: ffff9f99125fadd0 R08: 0000000000000001 R09: ffff9f99125facb8
Apr 20 20:49:44 scout kernel: R10: ffff9f990dff8008 R11: 0000000010100000 R12: 0000000000005400
Apr 20 20:49:44 scout kernel: R13: 0000000000000008 R14: ffff9f99124ec010 R15: 0000000000000800
Apr 20 20:49:44 scout kernel: FS:  00007f55dd4ad940(0000) GS:ffff9f9c0ecc0000(0000) knlGS:0000000000000000
Apr 20 20:49:44 scout kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 20:49:44 scout kernel: CR2: 0000000000000170 CR3: 0000000112500006 CR4: 00000000001706e0
Apr 20 20:49:44 scout kernel: general protection fault, probably for non-canonical address 0xe8a7f4b32b56f200: 0000 [#2] PREEMPT SMP PTI
Apr 20 20:49:44 scout kernel: CPU: 3 PID: 570 Comm: Xorg Tainted: P      D    OE     5.11.15-arch1-2 #1
Apr 20 20:49:44 scout kernel: Hardware name: System manufacturer System Product Name/MAXIMUS V GENE, BIOS 0701 03/29/2012
Apr 20 20:49:44 scout kernel: RIP: 0010:_nv009368rm+0x3c/0x340 [nvidia]
Apr 20 20:49:44 scout kernel: Code: 07 0f 1f 44 00 00 31 d2 48 8b 07 48 85 c0 75 1a e9 a1 02 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 48 10 48 85 c9 74 17 >
Apr 20 20:49:44 scout kernel: RSP: 0018:ffffb5da80c77d38 EFLAGS: 00010086
Apr 20 20:49:44 scout kernel: RAX: e8a7f4b32b56f200 RBX: ffffb5da80c77d80 RCX: e8a7f4b32b56f200
Apr 20 20:49:44 scout kernel: RDX: ffffb5da80c77dd0 RSI: 000000000000023a RDI: ffffffffc2c50998
Apr 20 20:49:44 scout kernel: RBP: ffff9f990dfc5ff0 R08: 0000000000000001 R09: 0000000000000000
Apr 20 20:49:44 scout kernel: R10: 0000000000000001 R11: ffffffff8db96200 R12: 0000000000000000
Apr 20 20:49:44 scout kernel: R13: ffffffffc2c51180 R14: ffff9f99121b6000 R15: ffffffffc2c4ddc0
Apr 20 20:49:44 scout kernel: FS:  0000000000000000(0000) GS:ffff9f9c0ecc0000(0000) knlGS:0000000000000000
Apr 20 20:49:44 scout kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 20:49:44 scout kernel: CR2: 0000000000000170 CR3: 0000000168210005 CR4: 00000000001706e0
Apr 20 20:49:44 scout kernel: Call Trace:
Apr 20 20:49:44 scout kernel:  ? _nv039612rm+0xdf/0x1e0 [nvidia]
Apr 20 20:49:44 scout kernel:  ? rm_cleanup_file_private+0x42/0x140 [nvidia]
Apr 20 20:49:44 scout kernel:  ? nv_acpi_uninit+0x30/0xe0 [nvidia]
Apr 20 20:49:44 scout kernel:  ? nvidia_close+0x14b/0x300 [nvidia]
Apr 20 20:49:44 scout kernel:  ? nvidia_frontend_close+0x2b/0x50 [nvidia]
Apr 20 20:49:44 scout kernel:  ? __fput+0x85/0x230
Apr 20 20:49:44 scout kernel:  ? task_work_run+0x5c/0x90
Apr 20 20:49:44 scout kernel:  ? do_exit+0x37f/0xa60
Apr 20 20:49:44 scout kernel:  ? do_sys_openat2+0xb1/0x160
Apr 20 20:49:44 scout kernel:  ? rewind_stack_do_exit+0x17/0x17
Apr 20 20:49:44 scout kernel: Modules linked in: nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) intel_rapl_msr intel_rapl_common nct6775 x86_pkg_temp_th>
Apr 20 20:49:44 scout kernel:  sysfillrect joydev sysimgblt ecc mc mousedev lpc_ich snd_timer fb_sys_fops e1000e snd mei_me soundcore mei video wmi mac_h>
Apr 20 20:49:44 scout kernel: ---[ end trace 29eca02abdf6b07b ]---
Apr 20 20:49:44 scout kernel: RIP: 0010:_nv015534rm+0x1b6/0x330 [nvidia]
Apr 20 20:49:44 scout kernel: Code: 8b 87 68 05 00 00 ba 01 00 00 00 be 02 00 00 00 e8 cf 40 eb cb 41 83 c5 01 41 83 fd 1f 0f 84 0b 01 00 00 48 8b 45 10 >
Apr 20 20:49:44 scout kernel: RSP: 0018:ffffb5da80c779a0 EFLAGS: 00010293
Apr 20 20:49:44 scout kernel: RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000007
Apr 20 20:49:44 scout kernel: RDX: 0000000000000004 RSI: 0000000000000008 RDI: 0000000000000000
Apr 20 20:49:44 scout kernel: RBP: ffff9f99125fadd0 R08: 0000000000000001 R09: ffff9f99125facb8
Apr 20 20:49:44 scout kernel: R10: ffff9f990dff8008 R11: 0000000010100000 R12: 0000000000005400
Apr 20 20:49:44 scout kernel: R13: 0000000000000008 R14: ffff9f99124ec010 R15: 0000000000000800
Apr 20 20:49:44 scout kernel: FS:  0000000000000000(0000) GS:ffff9f9c0ecc0000(0000) knlGS:0000000000000000
Apr 20 20:49:44 scout kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 20:49:44 scout kernel: CR2: 0000000000000170 CR3: 0000000168210005 CR4: 00000000001706e0
Apr 20 20:49:44 scout kernel: note: Xorg[570] exited with preempt_count 1
Apr 20 20:49:44 scout kernel: Fixing recursive fault but reboot is needed!
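For triage, the log contains two different fault signatures. The first Oops is a read at address 0x170 (also shown in CR2) inside nvidia.ko; the kernel labels any kernel-mode fault below PAGE_SIZE as a "NULL pointer dereference", and such an address typically comes from reading a field at offset 0x170 through a NULL structure pointer. The second fault is raised on the nvidia_close() cleanup path while Xorg exits, and is a general protection fault on 0xe8a7f4b32b56f200, which is not a canonical x86-64 address. The minimal sketch below classifies these addresses. It assumes 4-level paging (48-bit virtual addresses) and a 4 KiB page size; the helper names and the structure layout are hypothetical and are not part of any NVIDIA or kernel tool.

```python
# Minimal sketch: classify x86-64 fault addresses seen in a kernel Oops.
# Assumes 4-level paging (48-bit virtual addresses) and 4 KiB pages.
import ctypes

PAGE_SIZE = 0x1000


def is_canonical_48(addr: int) -> bool:
    """Canonical means bits 63..47 are all zeros or all ones."""
    top = addr >> 47  # the 17 most significant bits of a 64-bit address
    return top == 0 or top == 0x1FFFF


def classify(addr: int) -> str:
    """Hypothetical helper mirroring the two fault kinds in this log."""
    if not is_canonical_48(addr):
        return "non-canonical (general protection fault)"
    if addr < PAGE_SIZE:
        # The kernel reports kernel-mode faults below PAGE_SIZE as
        # "BUG: kernel NULL pointer dereference".
        return f"NULL page: likely NULL base + offset {addr:#x}"
    return "canonical address"


# Hypothetical structure: the real layout inside nvidia.ko is unknown.
# It only illustrates that a member at offset 0x170, read through a
# NULL pointer, faults at address 0x170.
class HypotheticalObject(ctypes.Structure):
    _fields_ = [
        ("earlier_fields", ctypes.c_uint8 * 0x170),
        ("field_at_0x170", ctypes.c_uint64),
    ]


if __name__ == "__main__":
    # Addresses copied from the log above (CR2, RAX of the GPF, RBP).
    for a in (0x170, 0xE8A7F4B32B56F200, 0xFFFF9F99125FADD0):
        print(f"{a:#018x}: {classify(a)}")
    print(HypotheticalObject.field_at_0x170.offset == 0x170)
```

Run as a script, it prints one classification per address. A plausible (unconfirmed) reading is that the non-canonical value in the second fault is state left corrupted by the first Oops, which would fit the "D" taint flag and the "Fixing recursive fault" message, so the first Oops is likely the one worth investigating.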

I think this is the same issue as the one reported in "465.24.02 page fault".