Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference

@angrylinuxuser All the crashes occurred with VDPAU enabled. I’ve disabled VDPAU acceleration in Kodi and have observed no crashes so far. I will run this configuration for a longer period and see how it goes.

I have been running Kodi in full-screen mode on a secondary monitor for a week but have not hit the issue.
I also tried playing YouTube videos in the Chrome browser for a couple of days, but could not reproduce it…

I have the same issue as everyone.

Here are my logs:

nov. 23 15:21:44 par-pf1pnv7r kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
nov. 23 15:21:44 par-pf1pnv7r kernel: #PF: supervisor read access in kernel mode
nov. 23 15:21:44 par-pf1pnv7r kernel: #PF: error_code(0x0000) - not-present page
nov. 23 15:21:44 par-pf1pnv7r kernel: PGD 800000037d514067 P4D 800000037d514067 PUD 0
nov. 23 15:21:44 par-pf1pnv7r kernel: Oops: 0000 [#1] PREEMPT SMP PTI
nov. 23 15:21:44 par-pf1pnv7r kernel: CPU: 2 PID: 1084 Comm: irq/169-nvidia Tainted: P           OE     5.9.10-1-MANJARO #1
nov. 23 15:21:44 par-pf1pnv7r kernel: Hardware name: LENOVO 20KHS21N00/20KHS21N00, BIOS N23ET65W (1.40 ) 07/02/2019
nov. 23 15:21:44 par-pf1pnv7r kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
nov. 23 15:21:44 par-pf1pnv7r kernel: RSP: 0018:ffffb46ec0953c00 EFLAGS: 00010202
nov. 23 15:21:44 par-pf1pnv7r kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
nov. 23 15:21:44 par-pf1pnv7r kernel: RDX: ffff967adf2abbc8 RSI: ffffffffffffffff RDI: 0000000000000020
nov. 23 15:21:44 par-pf1pnv7r kernel: RBP: ffff967e1d2ca9d0 R08: ffffffffc3b5f650 R09: ffff967e1d2ca9b0
nov. 23 15:21:44 par-pf1pnv7r kernel: R10: ffffffffc27aa820 R11: ffff967e22864808 R12: 0000000000000020
nov. 23 15:21:44 par-pf1pnv7r kernel: R13: 0000000000000000 R14: ffff967e1d2cab38 R15: ffff967e1d2cac78
nov. 23 15:21:44 par-pf1pnv7r kernel: FS:  0000000000000000(0000) GS:ffff967e52680000(0000) knlGS:0000000000000000
nov. 23 15:21:44 par-pf1pnv7r kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
nov. 23 15:21:44 par-pf1pnv7r kernel: CR2: 0000000000000020 CR3: 000000037a0ee001 CR4: 00000000003706e0
nov. 23 15:21:44 par-pf1pnv7r kernel: Call Trace:
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv029950rm+0x1b/0x90 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv025474rm+0x18/0x60 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv011691rm+0x13d/0x1c0 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv000083rm+0x12f/0x1a0 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv036719rm+0xc3/0x350 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv036718rm+0x5c/0x70 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv011615rm+0x78/0xd0 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv011615rm+0x1a/0xd0 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv024757rm+0x251/0x3e0 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv024706rm+0x25/0x150 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv015453rm+0x9b/0x270 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv026077rm+0x290/0x290 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv027734rm+0x273/0xdc0 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv007566rm+0x155/0x270 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv027742rm+0x8d/0x180 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? _nv000712rm+0xa9/0x200 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? disable_irq_nosync+0x10/0x10
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? irq_thread_fn+0x20/0x60
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? irq_thread+0xf5/0x1a0
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? irq_finalize_oneshot.part.0+0xe0/0xe0
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? irq_thread_check_affinity+0xd0/0xd0
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? kthread+0x142/0x160
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? __kthread_bind_mask+0x60/0x60
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? ret_from_fork+0x22/0x30
nov. 23 15:21:44 par-pf1pnv7r kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq bnep xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilt>
nov. 23 15:21:44 par-pf1pnv7r kernel:  intel_wmi_thunderbolt ac97_bus wmi_bmof snd_pcm_dmaengine kvm_intel snd_hda_intel nls_iso8859_1 snd_intel_dspcfg nls_cp437 iwlmvm kvm vfat irqbypass crct10dif_pclmul fat i2c_algo_bit snd_hda_codec mac80211 crc32_pclmul ghash_clmulni_i>
nov. 23 15:21:44 par-pf1pnv7r kernel:  x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq serio_raw atkbd libps2 xhci_pci xhci_hcd i8042 serio crc32c_intel
nov. 23 15:21:44 par-pf1pnv7r kernel: CR2: 0000000000000020
nov. 23 15:21:44 par-pf1pnv7r kernel: ---[ end trace 7a793e6d7f38e776 ]---
nov. 23 15:21:44 par-pf1pnv7r kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
nov. 23 15:21:44 par-pf1pnv7r kernel: RSP: 0018:ffffb46ec0953c00 EFLAGS: 00010202
nov. 23 15:21:44 par-pf1pnv7r kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
nov. 23 15:21:44 par-pf1pnv7r kernel: RDX: ffff967adf2abbc8 RSI: ffffffffffffffff RDI: 0000000000000020
nov. 23 15:21:44 par-pf1pnv7r kernel: RBP: ffff967e1d2ca9d0 R08: ffffffffc3b5f650 R09: ffff967e1d2ca9b0
nov. 23 15:21:44 par-pf1pnv7r kernel: R10: ffffffffc27aa820 R11: ffff967e22864808 R12: 0000000000000020
nov. 23 15:21:44 par-pf1pnv7r kernel: R13: 0000000000000000 R14: ffff967e1d2cab38 R15: ffff967e1d2cac78
nov. 23 15:21:44 par-pf1pnv7r kernel: FS:  0000000000000000(0000) GS:ffff967e52680000(0000) knlGS:0000000000000000
nov. 23 15:21:44 par-pf1pnv7r kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
nov. 23 15:21:44 par-pf1pnv7r kernel: CR2: 0000000000000020 CR3: 000000037a0ee001 CR4: 00000000003706e0
nov. 23 15:21:44 par-pf1pnv7r kernel: BUG: kernel NULL pointer dereference, address: 0000000000000930
nov. 23 15:21:44 par-pf1pnv7r kernel: #PF: supervisor write access in kernel mode
nov. 23 15:21:44 par-pf1pnv7r kernel: #PF: error_code(0x0002) - not-present page
nov. 23 15:21:44 par-pf1pnv7r kernel: PGD 800000037d514067 P4D 800000037d514067 PUD 0
nov. 23 15:21:44 par-pf1pnv7r kernel: Oops: 0002 [#2] PREEMPT SMP PTI
nov. 23 15:21:44 par-pf1pnv7r kernel: CPU: 2 PID: 1084 Comm: irq/169-nvidia Tainted: P      D    OE     5.9.10-1-MANJARO #1
nov. 23 15:21:44 par-pf1pnv7r kernel: Hardware name: LENOVO 20KHS21N00/20KHS21N00, BIOS N23ET65W (1.40 ) 07/02/2019
nov. 23 15:21:44 par-pf1pnv7r kernel: RIP: 0010:mutex_lock+0x10/0x20
nov. 23 15:21:44 par-pf1pnv7r kernel: Code: 03 31 c0 c3 eb d4 0f 1f 40 00 0f 1f 44 00 00 be 02 00 00 00 e9 61 fa ff ff 90 0f 1f 44 00 00 31 c0 65 48 8b 14 25 c0 7b 01 00 <f0> 48 0f b1 17 75 01 c3 eb d6 66 0f 1f 44 00 00 0f 1f 44 00 00 41
nov. 23 15:21:44 par-pf1pnv7r kernel: RSP: 0018:ffffb46ec0953e30 EFLAGS: 00010246
nov. 23 15:21:44 par-pf1pnv7r kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
nov. 23 15:21:44 par-pf1pnv7r kernel: RDX: ffff967e17bb5d00 RSI: 0000000000000000 RDI: 0000000000000930
nov. 23 15:21:44 par-pf1pnv7r kernel: RBP: 0000000000000930 R08: 000000000000001f R09: 0000000000000000
nov. 23 15:21:44 par-pf1pnv7r kernel: R10: ffff967e2ca7c400 R11: ffffb46ec0953801 R12: ffff967e17bb64cc
nov. 23 15:21:44 par-pf1pnv7r kernel: R13: 0000000000000000 R14: 0000000000000001 R15: ffff967e17bb5d00
nov. 23 15:21:44 par-pf1pnv7r kernel: FS:  0000000000000000(0000) GS:ffff967e52680000(0000) knlGS:0000000000000000
nov. 23 15:21:44 par-pf1pnv7r kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
nov. 23 15:21:44 par-pf1pnv7r kernel: CR2: 0000000000000930 CR3: 000000037a0ee001 CR4: 00000000003706e0
nov. 23 15:21:44 par-pf1pnv7r kernel: Call Trace:
nov. 23 15:21:44 par-pf1pnv7r kernel:  perf_event_exit_task+0x30/0x440
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? put_cpu_partial+0x92/0x140
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? kfree+0x40f/0x440
nov. 23 15:21:44 par-pf1pnv7r kernel:  do_exit+0x37f/0xaa0
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? task_work_run+0x5c/0x90
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? do_exit+0x36f/0xaa0
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? kthread+0x142/0x160
nov. 23 15:21:44 par-pf1pnv7r kernel:  ? rewind_stack_do_exit+0x17/0x17
nov. 23 15:21:44 par-pf1pnv7r kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq bnep xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilt>
nov. 23 15:21:44 par-pf1pnv7r kernel:  intel_wmi_thunderbolt ac97_bus wmi_bmof snd_pcm_dmaengine kvm_intel snd_hda_intel nls_iso8859_1 snd_intel_dspcfg nls_cp437 iwlmvm kvm vfat irqbypass crct10dif_pclmul fat i2c_algo_bit snd_hda_codec mac80211 crc32_pclmul ghash_clmulni_i>
nov. 23 15:21:44 par-pf1pnv7r kernel:  x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq serio_raw atkbd libps2 xhci_pci xhci_hcd i8042 serio crc32c_intel
nov. 23 15:21:44 par-pf1pnv7r kernel: CR2: 0000000000000930
nov. 23 15:21:44 par-pf1pnv7r kernel: ---[ end trace 7a793e6d7f38e777 ]---
nov. 23 15:21:44 par-pf1pnv7r kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]
nov. 23 15:21:44 par-pf1pnv7r kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
nov. 23 15:21:44 par-pf1pnv7r kernel: RSP: 0018:ffffb46ec0953c00 EFLAGS: 00010202
nov. 23 15:21:44 par-pf1pnv7r kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
nov. 23 15:21:44 par-pf1pnv7r kernel: RDX: ffff967adf2abbc8 RSI: ffffffffffffffff RDI: 0000000000000020
nov. 23 15:21:44 par-pf1pnv7r kernel: RBP: ffff967e1d2ca9d0 R08: ffffffffc3b5f650 R09: ffff967e1d2ca9b0
nov. 23 15:21:44 par-pf1pnv7r kernel: R10: ffffffffc27aa820 R11: ffff967e22864808 R12: 0000000000000020
nov. 23 15:21:44 par-pf1pnv7r kernel: R13: 0000000000000000 R14: ffff967e1d2cab38 R15: ffff967e1d2cac78
nov. 23 15:21:44 par-pf1pnv7r kernel: FS:  0000000000000000(0000) GS:ffff967e52680000(0000) knlGS:0000000000000000
nov. 23 15:21:44 par-pf1pnv7r kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
nov. 23 15:21:44 par-pf1pnv7r kernel: CR2: 0000000000000930 CR3: 000000037a0ee001 CR4: 00000000003706e0
nov. 23 15:21:44 par-pf1pnv7r kernel: Fixing recursive fault but reboot is needed!
-- Reboot --

This happens up to 3 times a day, which is very annoying. I am using VDPAU.

❯ uname -a                                      
Linux par-pf1pnv7r 5.9.10-1-MANJARO #1 SMP PREEMPT Sun Nov 22 11:25:19 UTC 2020 x86_64 GNU/Linux

❯ nvidia-smi
Mon Nov 23 19:48:03 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:0A:00.0  On |                  N/A |
|  0%   36C    P8    25W / 215W |   1651MiB /  7982MiB |     11%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Is it possible to get a fix quickly? Thanks!

From what I’ve observed, there doesn’t seem to be any way to reproduce this deterministically. The fastest I’ve had it happen was right at the start of a KDE Plasma 5 session, after opening Chromium with Google as the start page, at most 5 minutes into using the computer. That was followed by days of uptime without a crash.

Hi VaporD,
Can you please share an NVIDIA bug report so that I can try a similar system and display setup?
I am using the same Arch Linux kernel version, driver, and NVIDIA card; the KDE-related information is below. Please also let me know if you have any concrete steps to reproduce the issue.

KDE Plasma Version - 5.20.0
KDE Framework Version - 5.75.0
Qt Version - 5.15.1

Hi amrits,

Unfortunately, I completely messed up my system shortly after the last log I sent on Oct 23 and after reinstalling Arch I have yet to encounter the error again in a month. I don’t know if it is just luck or if reinstalling “fixed” something, but I was getting the error every few days before then. My new system is 5.9.9-arch1-1 with nvidia 455.45.01-1, and the same 1080ti. I was and am still using Gnome rather than KDE. I was never able to reproduce the error - it seemed to happen randomly. Let me know if there is any other information that might help.

Thanks

I’ve run into it again on my Arch Linux system. As in the previous replies, the issue is intermittent, but this time everything froze soon after I clicked the Outlook “join a meeting” button in the Chromium browser. This time it was on a single-monitor setup, but it previously happened on a dual-monitor setup as well.

OS: Arch Linux
kernel: Linux 5.9.10-arch1-1
DE: gnome
nvidia driver: 455.45.01

# uname -a
Linux unglued-pc 5.9.10-arch1-1 #1 SMP PREEMPT Sun, 22 Nov 2020 14:16:59 +0000 x86_64 GNU/Linux
# last few lines from the journal with the kernel panic stack trace
journalctl -b -1 --lines 114

journal_logs.txt (12.2 KB)
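
For anyone who wants to compare crashes across machines, here is a minimal Python sketch (not part of any NVIDIA tooling) that reads the previous boot’s kernel log, as in the `journalctl -b -1` call above, and pulls out the fault address and faulting symbol of each oops. The function names `previous_boot_kernel_log` and `oops_signatures` are made up for illustration, and the regular expressions assume the message format seen in the logs in this thread.

```python
import re
import subprocess

# First line of each oops, e.g.
# "... kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020"
BUG_RE = re.compile(r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)")
# Faulting instruction pointer, e.g.
# "... kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]"
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:(\S+)(?: \[(\w+)\])?")


def previous_boot_kernel_log():
    """Return the kernel messages of the previous boot as one string."""
    # -k: kernel messages only, -b -1: previous boot, --no-pager: no interactive pager
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def oops_signatures(log_text):
    """Pair each BUG line with the first RIP line that follows it.

    Returns a list of (fault_address, symbol+offset, module_or_None) tuples.
    RIP lines that are repeated after an oops has been paired are ignored.
    """
    signatures = []
    pending_address = None
    for line in log_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            pending_address = int(bug.group(1), 16)
            continue
        rip = RIP_RE.search(line)
        if rip and pending_address is not None:
            signatures.append((hex(pending_address), rip.group(1), rip.group(2)))
            pending_address = None
    return signatures
```

Calling `oops_signatures(previous_boot_kernel_log())` on an affected system should list one entry per oops. With driver 455.45.01, a first entry of `_nv027527rm+0x9/0x90` in module `nvidia` at address `0x20` would suggest the same crash as the ones posted here. Reading the journal may require root or membership in the `systemd-journal` group.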

I have had this happen twice. Both times were after long usage (around 4-5 hours of software development with Chrome as the browser).

❯ uname -a
Linux kubi-ms7678 5.9.10-1-MANJARO #1 SMP PREEMPT Sun Nov 22 11:25:19 UTC 2020 x86_64 GNU/Linux
❯ nvidia-smi
Thu Nov 26 16:11:56 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   52C    P8    13W / 120W |    350MiB /  6075MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
Nov 26 16:04:12 kubi-ms7678 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Nov 26 16:04:12 kubi-ms7678 kernel: #PF: supervisor read access in kernel mode
Nov 26 16:04:12 kubi-ms7678 kernel: #PF: error_code(0x0000) - not-present page
Nov 26 16:04:12 kubi-ms7678 kernel: PGD 800000053306b067 P4D 800000053306b067 PUD 0 
Nov 26 16:04:12 kubi-ms7678 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Nov 26 16:04:12 kubi-ms7678 kernel: CPU: 3 PID: 707 Comm: irq/37-nvidia Tainted: P           OE     5.9.10-1-MANJARO #1
Nov 26 16:04:12 kubi-ms7678 kernel: Hardware name: MSI MS-7678/H67MA-E45 (B3) (MS-7678), BIOS V3.2 01/17/2013
Nov 26 16:04:12 kubi-ms7678 kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Nov 26 16:04:12 kubi-ms7678 kernel: RSP: 0018:ffffb99941387be0 EFLAGS: 00010202
Nov 26 16:04:12 kubi-ms7678 kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Nov 26 16:04:12 kubi-ms7678 kernel: RDX: ffff95892f4a1a08 RSI: ffffffffffffffff RDI: 0000000000000020
Nov 26 16:04:12 kubi-ms7678 kernel: RBP: ffff958922fdd940 R08: ffffffffc1d79650 R09: ffff958922fdd920
Nov 26 16:04:12 kubi-ms7678 kernel: R10: ffffffffc09c4820 R11: ffff95893e743808 R12: 0000000000000020
Nov 26 16:04:12 kubi-ms7678 kernel: R13: 0000000000000000 R14: ffff958922fddaa8 R15: ffff958922fddbb0
Nov 26 16:04:12 kubi-ms7678 kernel: FS:  0000000000000000(0000) GS:ffff958946d80000(0000) knlGS:0000000000000000
Nov 26 16:04:12 kubi-ms7678 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 16:04:12 kubi-ms7678 kernel: CR2: 0000000000000020 CR3: 000000053cbc2005 CR4: 00000000000606e0
Nov 26 16:04:12 kubi-ms7678 kernel: Call Trace:
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv029950rm+0x1b/0x90 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv025474rm+0x18/0x60 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv011691rm+0x13d/0x1c0 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv000083rm+0x12f/0x1a0 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv011619rm+0xff/0x180 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv018449rm+0x1af/0x210 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv018389rm+0xd9a/0xe90 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv018390rm+0xde/0x260 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv018356rm+0x72/0xc0 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv018370rm+0x235/0x2d0 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv026076rm+0x10/0x10 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv018403rm+0xac/0xe0 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv027734rm+0x820/0xdc0 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv007566rm+0x155/0x270 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv027742rm+0x8d/0x180 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? _nv000712rm+0xa9/0x200 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? disable_irq_nosync+0x10/0x10
Nov 26 16:04:12 kubi-ms7678 kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel:  ? irq_thread_fn+0x20/0x60
Nov 26 16:04:12 kubi-ms7678 kernel:  ? irq_thread+0xf5/0x1a0
Nov 26 16:04:12 kubi-ms7678 kernel:  ? irq_finalize_oneshot.part.0+0xe0/0xe0
Nov 26 16:04:12 kubi-ms7678 kernel:  ? irq_thread_check_affinity+0xd0/0xd0
Nov 26 16:04:12 kubi-ms7678 kernel:  ? kthread+0x142/0x160
Nov 26 16:04:12 kubi-ms7678 kernel:  ? __kthread_bind_mask+0x60/0x60
Nov 26 16:04:12 kubi-ms7678 kernel:  ? ret_from_fork+0x22/0x30
Nov 26 16:04:12 kubi-ms7678 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defr>
Nov 26 16:04:12 kubi-ms7678 kernel:  nvidia_modeset(POE) drm_kms_helper cec rc_core drm agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(POE) msr sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom hid_generic ata_generic us>
Nov 26 16:04:12 kubi-ms7678 kernel: CR2: 0000000000000020
Nov 26 16:04:12 kubi-ms7678 kernel: ---[ end trace 6a8eb27e3c73e626 ]---
Nov 26 16:04:12 kubi-ms7678 kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Nov 26 16:04:12 kubi-ms7678 kernel: RSP: 0018:ffffb99941387be0 EFLAGS: 00010202
Nov 26 16:04:12 kubi-ms7678 kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Nov 26 16:04:12 kubi-ms7678 kernel: RDX: ffff95892f4a1a08 RSI: ffffffffffffffff RDI: 0000000000000020
Nov 26 16:04:12 kubi-ms7678 kernel: RBP: ffff958922fdd940 R08: ffffffffc1d79650 R09: ffff958922fdd920
Nov 26 16:04:12 kubi-ms7678 kernel: R10: ffffffffc09c4820 R11: ffff95893e743808 R12: 0000000000000020
Nov 26 16:04:12 kubi-ms7678 kernel: R13: 0000000000000000 R14: ffff958922fddaa8 R15: ffff958922fddbb0
Nov 26 16:04:12 kubi-ms7678 kernel: FS:  0000000000000000(0000) GS:ffff958946d80000(0000) knlGS:0000000000000000
Nov 26 16:04:12 kubi-ms7678 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 16:04:12 kubi-ms7678 kernel: CR2: 0000000000000020 CR3: 000000053cbc2005 CR4: 00000000000606e0
Nov 26 16:04:12 kubi-ms7678 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000930
Nov 26 16:04:12 kubi-ms7678 kernel: #PF: supervisor write access in kernel mode
Nov 26 16:04:12 kubi-ms7678 kernel: #PF: error_code(0x0002) - not-present page
Nov 26 16:04:12 kubi-ms7678 kernel: PGD 800000053306b067 P4D 800000053306b067 PUD 0 
Nov 26 16:04:12 kubi-ms7678 kernel: Oops: 0002 [#2] PREEMPT SMP PTI
Nov 26 16:04:12 kubi-ms7678 kernel: CPU: 3 PID: 707 Comm: irq/37-nvidia Tainted: P      D    OE     5.9.10-1-MANJARO #1
Nov 26 16:04:12 kubi-ms7678 kernel: Hardware name: MSI MS-7678/H67MA-E45 (B3) (MS-7678), BIOS V3.2 01/17/2013
Nov 26 16:04:12 kubi-ms7678 kernel: RIP: 0010:mutex_lock+0x10/0x20
Nov 26 16:04:12 kubi-ms7678 kernel: Code: 03 31 c0 c3 eb d4 0f 1f 40 00 66 66 66 66 90 be 02 00 00 00 e9 61 fa ff ff 90 66 66 66 66 90 31 c0 65 48 8b 14 25 c0 7b 01 00 <f0> 48 0f b1 17 75 01 c3 eb d6 66 0f 1f 44 00 00 66 66 66 66 90 41
Nov 26 16:04:12 kubi-ms7678 kernel: RSP: 0018:ffffb99941387e30 EFLAGS: 00010246
Nov 26 16:04:12 kubi-ms7678 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Nov 26 16:04:12 kubi-ms7678 kernel: RDX: ffff958936858000 RSI: 0000000000000000 RDI: 0000000000000930
Nov 26 16:04:12 kubi-ms7678 kernel: RBP: 0000000000000930 R08: 000000000000000f R09: 0000000000000000
Nov 26 16:04:12 kubi-ms7678 kernel: R10: ffff95892f6b9800 R11: ffffb99941387801 R12: ffff9589368587cc
Nov 26 16:04:12 kubi-ms7678 kernel: R13: 0000000000000000 R14: 0000000000000001 R15: ffff958936858000
Nov 26 16:04:12 kubi-ms7678 kernel: FS:  0000000000000000(0000) GS:ffff958946d80000(0000) knlGS:0000000000000000
Nov 26 16:04:12 kubi-ms7678 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 16:04:12 kubi-ms7678 kernel: CR2: 0000000000000930 CR3: 000000053cbc2005 CR4: 00000000000606e0
Nov 26 16:04:12 kubi-ms7678 kernel: Call Trace:
Nov 26 16:04:12 kubi-ms7678 kernel:  perf_event_exit_task+0x30/0x440
Nov 26 16:04:12 kubi-ms7678 kernel:  ? put_cpu_partial+0x92/0x140
Nov 26 16:04:12 kubi-ms7678 kernel:  ? kfree+0x40f/0x440
Nov 26 16:04:12 kubi-ms7678 kernel:  do_exit+0x37f/0xaa0
Nov 26 16:04:12 kubi-ms7678 kernel:  ? task_work_run+0x5c/0x90
Nov 26 16:04:12 kubi-ms7678 kernel:  ? do_exit+0x36f/0xaa0
Nov 26 16:04:12 kubi-ms7678 kernel:  ? kthread+0x142/0x160
Nov 26 16:04:12 kubi-ms7678 kernel:  ? rewind_stack_do_exit+0x17/0x17
Nov 26 16:04:12 kubi-ms7678 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defr>
Nov 26 16:04:12 kubi-ms7678 kernel:  nvidia_modeset(POE) drm_kms_helper cec rc_core drm agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(POE) msr sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom hid_generic ata_generic us>
Nov 26 16:04:12 kubi-ms7678 kernel: CR2: 0000000000000930
Nov 26 16:04:12 kubi-ms7678 kernel: ---[ end trace 6a8eb27e3c73e627 ]---
Nov 26 16:04:12 kubi-ms7678 kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]
Nov 26 16:04:12 kubi-ms7678 kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Nov 26 16:04:12 kubi-ms7678 kernel: RSP: 0018:ffffb99941387be0 EFLAGS: 00010202
Nov 26 16:04:12 kubi-ms7678 kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Nov 26 16:04:12 kubi-ms7678 kernel: RDX: ffff95892f4a1a08 RSI: ffffffffffffffff RDI: 0000000000000020
Nov 26 16:04:12 kubi-ms7678 kernel: RBP: ffff958922fdd940 R08: ffffffffc1d79650 R09: ffff958922fdd920
Nov 26 16:04:12 kubi-ms7678 kernel: R10: ffffffffc09c4820 R11: ffff95893e743808 R12: 0000000000000020
Nov 26 16:04:12 kubi-ms7678 kernel: R13: 0000000000000000 R14: ffff958922fddaa8 R15: ffff958922fddbb0
Nov 26 16:04:12 kubi-ms7678 kernel: FS:  0000000000000000(0000) GS:ffff958946d80000(0000) knlGS:0000000000000000
Nov 26 16:04:12 kubi-ms7678 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 16:04:12 kubi-ms7678 kernel: CR2: 0000000000000930 CR3: 000000053cbc2005 CR4: 00000000000606e0
Nov 26 16:04:12 kubi-ms7678 kernel: Fixing recursive fault but reboot is needed!

I should say that I’ve been running the 455.46.01 and 455.46.02 drivers with the patch from “455.23.04: Page allocation failure in kernel module at random points” (post #55 by aplattner) for a couple of weeks and have not seen this bug occur. I cannot be sure, but it is possible that this bug is related to the one in that other thread and may be fixed indirectly by that patch. For example, where nvkms_alloc would previously return NULL and the caller would crash on dereferencing it, it no longer returns NULL, so the crash does not occur.

If anyone sees this bug relatively frequently, please try the patch from that other thread and report whether it helps.

PS: The absence of crashes does not mean the NULL pointer bug is fixed. A NULL pointer check is still missing somewhere.
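
The register dumps posted in this thread can be read as a hint about what kind of check is missing. In each first oops, the marked bytes in the `Code:` line, `<48> 8b 17`, decode to `mov (%rdi),%rdx`, and RDI holds 0x20, which matches the fault address. The bytes just before it, `48 85 ff 74 57`, decode to `test %rdi,%rdi` followed by `je`, so the crashing function does test its argument against exactly NULL; a value of 0x20 passes that test. That would be consistent with a caller adding a field offset of 0x20 to a NULL base pointer, although the logs alone do not prove it. The sketch below, in Python with made-up helper names (`parse_registers`, `null_plus_offset_candidates`), lists the registers in an oops that hold such small non-zero values.

```python
import re

# x86-64 base page size; the kernel reports faults below this address
# as "NULL pointer dereference"
PAGE_SIZE = 4096

# Matches general-purpose register entries such as "RDI: 0000000000000020".
# RSP/RIP lines carry a segment prefix ("0018:...") and are not matched.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def parse_registers(oops_text):
    """Return {register: value} from the RAX:/RBX:/... lines of one oops."""
    registers = {}
    for name, value in REG_RE.findall(oops_text):
        # Keep the first dump only if the same register appears twice
        registers.setdefault(name, int(value, 16))
    return registers


def null_plus_offset_candidates(registers):
    """Registers with a small non-zero value: they pass a plain NULL test
    but still point into the unmapped first page."""
    return {
        name: hex(value)
        for name, value in registers.items()
        if 0 < value < PAGE_SIZE
    }
```

Run against the register lines of the first oops in the logs here, this reports RAX, RBX, RDI, and R12 as 0x20 and RCX as 0x10. A small value is not necessarily a pointer (RCX could just as well be a count), so the result is only a starting point for whoever has the driver source.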

Arch Linux Version: 5.9.10-arch1-1
nvidia Driver Version: 455.45.01-1
nvidia Card: GeForce RTX 2070 Super

Nov 26 11:29:14 Cheza kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Nov 26 11:29:14 Cheza kernel: #PF: supervisor read access in kernel mode
Nov 26 11:29:14 Cheza kernel: #PF: error_code(0x0000) - not-present page
Nov 26 11:29:14 Cheza kernel: PGD fd8a9f067 P4D fd8a9f067 PUD 0 
Nov 26 11:29:14 Cheza kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Nov 26 11:29:14 Cheza kernel: CPU: 0 PID: 803 Comm: irq/147-nvidia Tainted: P           OE     5.9.10-arch1-1 #1
Nov 26 11:29:14 Cheza kernel: Hardware name: Puget Systems MS-7B50/MPG Z390M GAMING EDGE AC (MS-7B50), BIOS 1.70 12/30/2019
Nov 26 11:29:14 Cheza kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]
Nov 26 11:29:14 Cheza kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Nov 26 11:29:14 Cheza kernel: RSP: 0018:ffff9d7ec21bfb50 EFLAGS: 00010202
Nov 26 11:29:14 Cheza kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Nov 26 11:29:14 Cheza kernel: RDX: ffff8de90247c4c8 RSI: ffffffffffffffff RDI: 0000000000000020
Nov 26 11:29:14 Cheza kernel: RBP: ffff8de95856d8a0 R08: ffffffffc32b5530 R09: ffff8de95856d880
Nov 26 11:29:14 Cheza kernel: R10: ffffffffc1f00820 R11: ffff8de9718d3808 R12: 0000000000000020
Nov 26 11:29:14 Cheza kernel: R13: 0000000000000000 R14: ffff8de95856da08 R15: ffff8de95856db10
Nov 26 11:29:14 Cheza kernel: FS:  0000000000000000(0000) GS:ffff8de97dc00000(0000) knlGS:0000000000000000
Nov 26 11:29:14 Cheza kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 11:29:14 Cheza kernel: CR2: 0000000000000020 CR3: 0000000fbb1b8006 CR4: 00000000003706f0
Nov 26 11:29:14 Cheza kernel: Call Trace:
Nov 26 11:29:14 Cheza kernel:  ? _nv029950rm+0x1b/0x90 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv025474rm+0x18/0x60 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv011691rm+0x13d/0x1c0 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv000083rm+0x12f/0x1a0 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv011619rm+0xff/0x180 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv018449rm+0x1af/0x210 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv018389rm+0xd9a/0xe90 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv018390rm+0xde/0x260 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv018391rm+0x125/0x330 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv018392rm+0x1f7/0x320 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv018356rm+0x72/0xc0 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv018370rm+0x235/0x2d0 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv026077rm+0x290/0x290 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv018372rm+0x39/0x4b0 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv026077rm+0x290/0x290 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv018403rm+0xac/0xe0 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv027734rm+0x820/0xdc0 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv007566rm+0x155/0x270 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv027742rm+0x8d/0x180 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? _nv000712rm+0xa9/0x200 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? disable_irq_nosync+0x10/0x10
Nov 26 11:29:14 Cheza kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
Nov 26 11:29:14 Cheza kernel:  ? irq_thread_fn+0x20/0x60
Nov 26 11:29:14 Cheza kernel:  ? irq_thread+0xf5/0x1a0
Nov 26 11:29:14 Cheza kernel:  ? irq_finalize_oneshot.part.0+0xe0/0xe0
Nov 26 11:29:14 Cheza kernel:  ? irq_thread_check_affinity+0xd0/0xd0
Nov 26 11:29:14 Cheza kernel:  ? kthread+0x142/0x160
Nov 26 11:29:14 Cheza kernel:  ? __kthread_bind_mask+0x60/0x60
Nov 26 11:29:14 Cheza kernel:  ? ret_from_fork+0x1f/0x30
Nov 26 11:29:14 Cheza kernel: Modules linked in: ses enclosure scsi_transport_sas dm_crypt cbc encrypted_keys trusted tpm rng_core dm_mod snd_hda_codec_realtek snd_hda_codec_generic fuse rpcsec_gss_krb5 auth_rpcgss md4 cmac nfsv4 nfs nls_utf8 cifs lockd grace sunrpc dns_resolver nfs_ssc fscache libdes hid_logitech_hidpp hid_logitech_dj mo>
Nov 26 11:29:14 Cheza kernel:  nls_cp437 snd_soc_skl aesni_intel crypto_simd snd_soc_sst_ipc cryptd snd_soc_sst_dsp glue_helper snd_hda_ext_core rapl snd_soc_acpi_intel_match intel_cstate vfat snd_soc_acpi fat intel_uncore libarc4 i915 pcspkr snd_soc_core iwlwifi ofpart snd_hda_codec_hdmi cmdlinepart snd_compress intel_spi_pci ac97_bus in>
Nov 26 11:29:14 Cheza kernel: CR2: 0000000000000020
Nov 26 11:29:14 Cheza kernel: ---[ end trace baf3748d17a3ee4e ]---
Nov 26 11:29:14 Cheza kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]
Nov 26 11:29:14 Cheza kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Nov 26 11:29:14 Cheza kernel: RSP: 0018:ffff9d7ec21bfb50 EFLAGS: 00010202
Nov 26 11:29:14 Cheza kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Nov 26 11:29:14 Cheza kernel: RDX: ffff8de90247c4c8 RSI: ffffffffffffffff RDI: 0000000000000020
Nov 26 11:29:14 Cheza kernel: RBP: ffff8de95856d8a0 R08: ffffffffc32b5530 R09: ffff8de95856d880
Nov 26 11:29:14 Cheza kernel: R10: ffffffffc1f00820 R11: ffff8de9718d3808 R12: 0000000000000020
Nov 26 11:29:14 Cheza kernel: R13: 0000000000000000 R14: ffff8de95856da08 R15: ffff8de95856db10
Nov 26 11:29:14 Cheza kernel: FS:  0000000000000000(0000) GS:ffff8de97dc00000(0000) knlGS:0000000000000000
Nov 26 11:29:14 Cheza kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 11:29:14 Cheza kernel: CR2: 0000000000000020 CR3: 0000000fbb1b8006 CR4: 00000000003706f0
Nov 26 11:29:14 Cheza kernel: BUG: kernel NULL pointer dereference, address: 0000000000000930
Nov 26 11:29:14 Cheza kernel: #PF: supervisor write access in kernel mode
Nov 26 11:29:14 Cheza kernel: #PF: error_code(0x0002) - not-present page
Nov 26 11:29:14 Cheza kernel: PGD fd8a9f067 P4D fd8a9f067 PUD 0 
Nov 26 11:29:14 Cheza kernel: Oops: 0002 [#2] PREEMPT SMP NOPTI
Nov 26 11:29:14 Cheza kernel: CPU: 0 PID: 803 Comm: irq/147-nvidia Tainted: P      D    OE     5.9.10-arch1-1 #1
Nov 26 11:29:14 Cheza kernel: Hardware name: Puget Systems MS-7B50/MPG Z390M GAMING EDGE AC (MS-7B50), BIOS 1.70 12/30/2019
Nov 26 11:29:14 Cheza kernel: RIP: 0010:mutex_lock+0x10/0x20
Nov 26 11:29:14 Cheza kernel: Code: 03 31 c0 c3 eb d4 0f 1f 40 00 0f 1f 44 00 00 be 02 00 00 00 e9 61 fa ff ff 90 0f 1f 44 00 00 31 c0 65 48 8b 14 25 c0 7b 01 00 <f0> 48 0f b1 17 75 01 c3 eb d6 66 0f 1f 44 00 00 0f 1f 44 00 00 41
Nov 26 11:29:14 Cheza kernel: RSP: 0018:ffff9d7ec21bfe30 EFLAGS: 00010246
Nov 26 11:29:14 Cheza kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Nov 26 11:29:14 Cheza kernel: RDX: ffff8de95d1bbe00 RSI: 0000000000000000 RDI: 0000000000000930
Nov 26 11:29:14 Cheza kernel: RBP: 0000000000000930 R08: 0000000000000001 R09: 0000000000000000
Nov 26 11:29:14 Cheza kernel: R10: ffff8de956541c00 R11: ffff9d7ec21bf800 R12: ffff8de95d1bc5cc
Nov 26 11:29:14 Cheza kernel: R13: 0000000000000000 R14: 0000000000000001 R15: ffff8de95d1bbe00
Nov 26 11:29:14 Cheza kernel: FS:  0000000000000000(0000) GS:ffff8de97dc00000(0000) knlGS:0000000000000000
Nov 26 11:29:14 Cheza kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 11:29:14 Cheza kernel: CR2: 0000000000000930 CR3: 0000000fbb1b8006 CR4: 00000000003706f0
Nov 26 11:29:14 Cheza kernel: Call Trace:
Nov 26 11:29:14 Cheza kernel:  perf_event_exit_task+0x30/0x440
Nov 26 11:29:14 Cheza kernel:  ? kfree+0x40f/0x440
Nov 26 11:29:14 Cheza kernel:  do_exit+0x37f/0xaa0
Nov 26 11:29:14 Cheza kernel:  ? task_work_run+0x5c/0x90
Nov 26 11:29:14 Cheza kernel:  ? do_exit+0x36f/0xaa0
Nov 26 11:29:14 Cheza kernel:  ? kthread+0x142/0x160
Nov 26 11:29:14 Cheza kernel:  ? rewind_stack_do_exit+0x17/0x17
Nov 26 11:29:14 Cheza kernel: Modules linked in: ses enclosure scsi_transport_sas dm_crypt cbc encrypted_keys trusted tpm rng_core dm_mod snd_hda_codec_realtek snd_hda_codec_generic fuse rpcsec_gss_krb5 auth_rpcgss md4 cmac nfsv4 nfs nls_utf8 cifs lockd grace sunrpc dns_resolver nfs_ssc fscache libdes hid_logitech_hidpp hid_logitech_dj mo>
Nov 26 11:29:14 Cheza kernel:  nls_cp437 snd_soc_skl aesni_intel crypto_simd snd_soc_sst_ipc cryptd snd_soc_sst_dsp glue_helper snd_hda_ext_core rapl snd_soc_acpi_intel_match intel_cstate vfat snd_soc_acpi fat intel_uncore libarc4 i915 pcspkr snd_soc_core iwlwifi ofpart snd_hda_codec_hdmi cmdlinepart snd_compress intel_spi_pci ac97_bus in>
Nov 26 11:29:14 Cheza kernel: CR2: 0000000000000930
Nov 26 11:29:14 Cheza kernel: ---[ end trace baf3748d17a3ee4f ]---
Nov 26 11:29:14 Cheza kernel: RIP: 0010:_nv027527rm+0x9/0x90 [nvidia]
Nov 26 11:29:14 Cheza kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Nov 26 11:29:14 Cheza kernel: RSP: 0018:ffff9d7ec21bfb50 EFLAGS: 00010202
Nov 26 11:29:14 Cheza kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Nov 26 11:29:14 Cheza kernel: RDX: ffff8de90247c4c8 RSI: ffffffffffffffff RDI: 0000000000000020
Nov 26 11:29:14 Cheza kernel: RBP: ffff8de95856d8a0 R08: ffffffffc32b5530 R09: ffff8de95856d880
Nov 26 11:29:14 Cheza kernel: R10: ffffffffc1f00820 R11: ffff8de9718d3808 R12: 0000000000000020
Nov 26 11:29:14 Cheza kernel: R13: 0000000000000000 R14: ffff8de95856da08 R15: ffff8de95856db10
Nov 26 11:29:14 Cheza kernel: FS:  0000000000000000(0000) GS:ffff8de97dc00000(0000) knlGS:0000000000000000
Nov 26 11:29:14 Cheza kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 11:29:14 Cheza kernel: CR2: 0000000000000930 CR3: 0000000fbb1b8006 CR4: 00000000003706f0
Nov 26 11:29:14 Cheza kernel: Fixing recursive fault but reboot is needed!

I thought this was fixed!?

Was moving cursor up and left inside chromium when this happened.

Just had another crash on 5.9.10-zen1-1-zen and nvidia-dkms on 455.45.01-1.

I don’t mean to be rude, but how is this still an issue? I knew nvidia treated linux users like second-class citizens, but this is ridiculous. Multiple driver versions have shipped with this random hard crash. It is absolutely disastrous, and I’m just a desktop user! I cannot imagine someone running this in a data center with mission-critical data. If you don’t care about the users, at least care about the bottom line. Having a buggy driver for over 2 months is going to seriously hurt your data center customers and other people who rely on it for their work.

My GTX 1080 was my first card from nvidia (which I paid a whole 700 GBP for on release, mind you) and I can say for sure it will be the last. Let me remind everyone here that we are using the developer forum because we don’t even have a section in the normal forums. Nvidia has proven time and time again that they could not give 2 $h1tz about linux users and that we are nothing more than an afterthought. If Nvidia doesn’t want to maintain the drivers, cool. Let the community do it and have open source drivers like AMD, BUT NOOOOOOOO!!! We can’t let these dirty linux scrubs have nice things. The nouveau drivers don’t even have clock support and are stuck at 300 MHz!

In the words of Torvalds:

ps:
I don’t want this to come off as attacking the devs or any one person in particular, as this is a difficult situation for everyone, both in terms of drivers and the current global situation. But I am pissed at Nvidia, the corporation. I’m sure the devs working in this thread are genuinely trying and are not deliberately neglecting this issue. But given how this has been handled, the fact that it is still not fixed after 2+ months, and the already appalling linux support, I am very pissed and will not be going nvidia ever again.

Yes, this is incredibly frustrating. Every nvidia driver update fills me with fear, because you just don’t know what else to expect from it. I’m definitely sure that if Windows users faced such an issue, it would be fixed immediately. Does nvidia consider other users, who have also paid for their hardware, to be unnecessary garbage?

People, just downgrade to 450.66 until this is fixed. It works completely fine.

How about nvidia gives us working drivers for devices we have paid for, instead of releasing unstable drivers for 2 months in a row?

I sincerely apologize that we have not been able to replicate the issue locally so far; we are still trying on multiple systems at our end.
Since there are no concrete and reliable repro steps, this is taking a long time. Below are a few of the configurations on which the issue has been reported:

P8Z68-V PRO
GeForce RTX 2080 Ti
455.22.04

ASRock Z77 Extreme4
455.23.04
GeForce GTX 780
5.8.13-arch1-1

ASUS MAXIMUS VII HERO
455.23.04
GeForce GTX 970
5.8.12-arch1-1-vfio

Please help by providing details of any other configurations where the issue is observed.
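
For anyone who wants to report their setup in the same shape as the list above, here is a minimal Python sketch that gathers the four fields (board, driver, GPU, kernel). It is not an NVIDIA tool. The function names are made up for illustration, and it assumes a Linux system where /sys/class/dmi/id/board_name is readable and where the NVIDIA kernel module exposes /proc/driver/nvidia/version and /proc/driver/nvidia/gpus/*/information. Any field that cannot be read is reported as "unknown".

```python
import glob
import platform


def read_first_line(path, default="unknown"):
    """Return the first line of a file, or `default` if it cannot be read."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.readline().strip() or default
    except OSError:
        return default


def gpu_models():
    """List GPU model names reported by the NVIDIA driver, if it is loaded."""
    models = []
    # Assumption: one "information" file per GPU with a "Model:" line.
    for info in sorted(glob.glob("/proc/driver/nvidia/gpus/*/information")):
        try:
            with open(info, encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    if line.startswith("Model:"):
                        models.append(line.split(":", 1)[1].strip())
        except OSError:
            continue
    return models


def collect_config():
    """Collect the fields used in the configuration list in this thread."""
    return {
        "board": read_first_line("/sys/class/dmi/id/board_name"),
        "driver": read_first_line("/proc/driver/nvidia/version"),
        "gpus": gpu_models() or ["unknown"],
        "kernel": platform.release(),
    }


if __name__ == "__main__":
    for key, value in collect_config().items():
        print(f"{key}: {value}")
```

The output can be pasted into a reply next to the kernel log of the crash.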

I believe this is one of those situations where development must be done without replicating the problem.

It’s pretty clear that the problem depends on timing and/or some unknown rare conditions that nevertheless occur often enough to be noticed. Most likely a race condition or an error-handling path is involved, and such bugs can be found entirely through analysis of the sources. The stack traces show very consistent call patterns, so developers should be able to de-obfuscate the function names and offsets and see the exact places where the near-NULL dereferences happen. This is normally enough of a starting point for analysis.
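
The consistency of the reports can be checked mechanically. Below is a minimal Python sketch, not anything from NVIDIA: the function name `summarize_first_oops` is made up for illustration, and `SAMPLE` is a trimmed excerpt of one of the logs posted in this thread (the `Code:` line is shortened to its last bytes). It extracts the fault address, the faulting symbol and offset, the bytes at the faulting instruction, and the register values from a pasted log, so that reports from different machines can be compared.

```python
import re

BUG_RE = re.compile(r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)")
RIP_RE = re.compile(r"RIP: 0010:(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?")
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b")

# Trimmed excerpt from a log in this thread; the Code: line is shortened.
SAMPLE = """\
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
kernel: RIP: 0010:_nv027510rm+0x9/0x90 [nvidia]
kernel: Code: 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2
kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
kernel: RDX: ffff913dd126b8c8 RSI: ffffffffffffffff RDI: 0000000000000020
kernel: Call Trace:
"""


def summarize_first_oops(log_text):
    """Summarize the first oops in log_text; return None if there is none."""
    summary = None
    for line in log_text.splitlines():
        if summary is None:
            match = BUG_RE.search(line)
            if match:
                summary = {"fault_address": int(match.group(1), 16), "registers": {}}
            continue
        if "Call Trace:" in line:
            break  # the register dump of the first oops ends here
        match = RIP_RE.search(line)
        if match and "symbol" not in summary:
            summary["symbol"] = match.group(1)
            summary["offset"] = int(match.group(2), 16)
            summary["module"] = match.group(4)
        if "Code: " in line and "code_at_fault" not in summary:
            tokens = line.split("Code: ", 1)[1].split()
            # The kernel wraps the first byte of the faulting instruction in <>.
            marked = next((i for i, t in enumerate(tokens) if t.startswith("<")), None)
            if marked is not None:
                summary["code_at_fault"] = [t.strip("<>") for t in tokens[marked:marked + 3]]
        for name, value in REG_RE.findall(line):
            summary["registers"][name] = int(value, 16)
    return summary


if __name__ == "__main__":
    summary = summarize_first_oops(SAMPLE)
    print(summary["symbol"], hex(summary["offset"]), hex(summary["fault_address"]))
    # Compare the fault address with the pointer held in RDI.
    print(summary["registers"]["RDI"] == summary["fault_address"])
```

In the logs posted here, the bytes at the marked position, `48 8b 17`, are a 64-bit load through RDI, and the `48 85 ff 74 57` just before them is a test of RDI against zero followed by a conditional jump. RDI holds 0x20, so the NULL check passes and the load then faults at address 0x20, which is the address every report shows.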

There is no need to delay crash analysis and development until the problem can be reproduced. It may take months or years for it to show up, whereas code analysis will be an unavoidable step of development no matter what.

Then, once a fix developed “blindly”, based entirely on analysis, is released, the large number of users who for whatever reason see this problem often can check whether the problem is gone.

Otherwise we will never get usable fixes for things like that.

Speaking of which, this is what I got:

ASUS M4A89TD PRO USB3 with AMD Phenom™ II X6 1090T CPU
Gentoo on amd64 (x86_64)
NVIDIA driver 455.28
GeForce GTX 1080 Ti
Linux kernel 5.4.72 (custom build from gentoo-sources package, userspace gcc options -march=amdfam10 -mno-3dnow -mno-3dnowa -O2 -pipe).

[70126.875975] NVRM: GPU at PCI:0000:07:00: GPU-72cde1c5-f051-4551-58fb-a9846aeba90c
[70126.875977] NVRM: GPU Board Serial Number: 
[70126.875981] NVRM: Xid (PCI:0000:07:00): 31, pid=9510, Ch 00000080, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_7 faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_READ
[143146.081957] TCP: request_sock_TCP: Possible SYN flooding on port 53. Sending cookies.  Check SNMP counters.
[212643.530715] ext2 filesystem being mounted at /boot supports timestamps until 2038 (0x7fffffff)
[213371.884232] ext2 filesystem being mounted at /boot supports timestamps until 2038 (0x7fffffff)
[214763.180917] NVRM: Xid (PCI:0000:07:00): 31, pid=7884, Ch 00000088, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_7 faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_READ
[235779.224692] ext2 filesystem being mounted at /boot supports timestamps until 2038 (0x7fffffff)
[236250.298880] BUG: kernel NULL pointer dereference, address: 0000000000000020
[236250.298886] #PF: supervisor read access in kernel mode
[236250.298887] #PF: error_code(0x0000) - not-present page
[236250.298888] PGD 7c24d3067 P4D 7c24d3067 PUD 0 
[236250.298892] Oops: 0000 [#1] PREEMPT SMP NOPTI
[236250.298894] CPU: 4 PID: 4768 Comm: irq/58-nvidia Tainted: P           O      5.4.72-gentoo-x86_64 #1
[236250.298895] Hardware name: System manufacturer System Product Name/M4A89TD PRO USB3, BIOS 3029    09/07/2012
[236250.299174] RIP: 0010:_nv027470rm+0x9/0x90 [nvidia]
[236250.299178] Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
[236250.299180] RSP: 0018:ffffafffc219fbe0 EFLAGS: 00010202
[236250.299181] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
[236250.299182] RDX: ffff9d6db7a9b7c8 RSI: ffffffffffffffff RDI: 0000000000000020
[236250.299183] RBP: ffff9d72a12bf090 R08: ffffffffc1e0c2d0 R09: ffff9d72a12bf070
[236250.299184] R10: ffffffffc0ae1c10 R11: ffff9d72d0b88008 R12: 0000000000000020
[236250.299185] R13: 0000000000000000 R14: ffff9d72a12bf1f8 R15: ffff9d72a12bf338
[236250.299186] FS:  0000000000000000(0000) GS:ffff9d72d7b00000(0000) knlGS:0000000000000000
[236250.299187] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236250.299188] CR2: 0000000000000020 CR3: 00000007cd834000 CR4: 00000000000006e0
[236250.299189] Call Trace:
[236250.299394]  ? _nv029878rm+0x1b/0x90 [nvidia]
[236250.299548]  ? _nv025417rm+0x18/0x60 [nvidia]
[236250.299702]  ? _nv011659rm+0x13d/0x1c0 [nvidia]
[236250.299879]  ? _nv000083rm+0x12f/0x1a0 [nvidia]
[236250.300079]  ? _nv029999rm+0xb9/0x330 [nvidia]
[236250.300278]  ? _nv029998rm+0x61/0x80 [nvidia]
[236250.300477]  ? _nv029998rm+0x37/0x80 [nvidia]
[236250.300669]  ? _nv011591rm+0x428/0x460 [nvidia]
[236250.300839]  ? _nv024702rm+0x251/0x3e0 [nvidia]
[236250.301034]  ? _nv024650rm+0x1f/0xf0 [nvidia]
[236250.301228]  ? _nv015407rm+0xcb/0x370 [nvidia]
[236250.301391]  ? _nv026019rm+0x10/0x10 [nvidia]
[236250.301584]  ? _nv027677rm+0x273/0xdc0 [nvidia]
[236250.301776]  ? _nv007561rm+0x155/0x270 [nvidia]
[236250.301969]  ? _nv027685rm+0x8d/0x180 [nvidia]
[236250.302113]  ? _nv000711rm+0xa9/0x200 [nvidia]
[236250.302117]  ? irq_forced_thread_fn+0x70/0x70
[236250.302261]  ? rm_isr_bh+0x1c/0x60 [nvidia]
[236250.302411]  ? nvidia_isr_kthread_bh+0x16/0x4d0 [nvidia]
[236250.302413]  ? irq_thread_fn+0x1b/0x60
[236250.302415]  ? irq_thread+0xd7/0x160
[236250.302416]  ? wake_threads_waitq+0x30/0x30
[236250.302418]  ? irq_thread_dtor+0x80/0x80
[236250.302420]  ? kthread+0x125/0x150
[236250.302422]  ? kthread_create_worker_on_cpu+0x60/0x60
[236250.302424]  ? ret_from_fork+0x22/0x40
[236250.302426] Modules linked in: fuse rfcomm cmac algif_hash algif_skcipher af_alg bnep ipv6 hid_logitech_hidpp btusb btrtl btbcm btintel bluetooth ecdh_generic nvidia_drm(PO) ax88179_178a rfkill ecc ch341 usbnet usbserial hid_logitech_dj uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev hid_plantronics dm_mod nvidia_modeset(PO) joydev snd_hda_codec_hdmi snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device mc wmi_bmof amd64_edac_mod snd_hda_codec_realtek snd_hda_codec_generic kvm_amd ledtrig_audio ccp kvm irqbypass pcspkr nvidia(PO) snd_hda_intel usbhid snd_intel_nhlt snd_hda_codec k10temp snd_hda_core i2c_piix4 snd_hwdep ohci_pci ohci_hcd snd_pcm snd_timer snd firewire_ohci asus_atk0110 soundcore ata_generic r8168(O) firewire_core pata_acpi hwmon wmi acpi_cpufreq button xhci_pci ehci_pci ahci ehci_hcd xhci_hcd libahci pata_jmicron usbcore libata
[236250.302455] CR2: 0000000000000020
[236250.302458] ---[ end trace 18df20f6c6a36c2e ]---
[236250.302615] RIP: 0010:_nv027470rm+0x9/0x90 [nvidia]
[236250.302617] Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
[236250.302618] RSP: 0018:ffffafffc219fbe0 EFLAGS: 00010202
[236250.302619] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
[236250.302620] RDX: ffff9d6db7a9b7c8 RSI: ffffffffffffffff RDI: 0000000000000020
[236250.302621] RBP: ffff9d72a12bf090 R08: ffffffffc1e0c2d0 R09: ffff9d72a12bf070
[236250.302622] R10: ffffffffc0ae1c10 R11: ffff9d72d0b88008 R12: 0000000000000020
[236250.302623] R13: 0000000000000000 R14: ffff9d72a12bf1f8 R15: ffff9d72a12bf338
[236250.302624] FS:  0000000000000000(0000) GS:ffff9d72d7b00000(0000) knlGS:0000000000000000
[236250.302625] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236250.302626] CR2: 0000000000000020 CR3: 00000007cd834000 CR4: 00000000000006e0
[236250.302654] BUG: kernel NULL pointer dereference, address: 0000000000000000
[236250.302655] #PF: supervisor instruction fetch in kernel mode
[236250.302656] #PF: error_code(0x0010) - not-present page
[236250.302657] PGD 7c24d3067 P4D 7c24d3067 PUD 0 
[236250.302659] Oops: 0010 [#2] PREEMPT SMP NOPTI
[236250.302661] CPU: 4 PID: 4768 Comm: irq/58-nvidia Tainted: P      D    O      5.4.72-gentoo-x86_64 #1
[236250.302662] Hardware name: System manufacturer System Product Name/M4A89TD PRO USB3, BIOS 3029    09/07/2012
[236250.302663] RIP: 0010:0x0
[236250.302666] Code: Bad RIP value.
[236250.302667] RSP: 0018:ffffafffc219fe98 EFLAGS: 00010282
[236250.302668] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[236250.302669] RDX: ffffafffc219fec8 RSI: 0000000000000000 RDI: ffffafffc219fec8
[236250.302670] RBP: ffff9d72d3d8cb10 R08: ffff9d72cffe60b0 R09: 0000000000000000
[236250.302671] R10: 0000000000000046 R11: ffffafffc219f93e R12: ffff9d72d3d8c480
[236250.302672] R13: ffffffffb37738b0 R14: 0000000000000000 R15: ffff9d72d3d8cb4c
[236250.302673] FS:  0000000000000000(0000) GS:ffff9d72d7b00000(0000) knlGS:0000000000000000
[236250.302674] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236250.302675] CR2: ffffffffffffffd6 CR3: 00000007cd834000 CR4: 00000000000006e0
[236250.302676] Call Trace:
[236250.302678]  task_work_run+0x8e/0xb0
[236250.302682]  do_exit+0x342/0xab0
[236250.302684]  ? irq_thread_dtor+0x80/0x80
[236250.302690]  ? kthread+0x125/0x150
[236250.302695]  rewind_stack_do_exit+0x17/0x20
[236250.302698] RIP: 0000:0x0
[236250.302702] Code: Bad RIP value.
[236250.302705] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[236250.302709] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[236250.302713] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[236250.302716] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[236250.302719] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[236250.302722] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[236250.302725] Modules linked in: fuse rfcomm cmac algif_hash algif_skcipher af_alg bnep ipv6 hid_logitech_hidpp btusb btrtl btbcm btintel bluetooth ecdh_generic nvidia_drm(PO) ax88179_178a rfkill ecc ch341 usbnet usbserial hid_logitech_dj uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev hid_plantronics dm_mod nvidia_modeset(PO) joydev snd_hda_codec_hdmi snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device mc wmi_bmof amd64_edac_mod snd_hda_codec_realtek snd_hda_codec_generic kvm_amd ledtrig_audio ccp kvm irqbypass pcspkr nvidia(PO) snd_hda_intel usbhid snd_intel_nhlt snd_hda_codec k10temp snd_hda_core i2c_piix4 snd_hwdep ohci_pci ohci_hcd snd_pcm snd_timer snd firewire_ohci asus_atk0110 soundcore ata_generic r8168(O) firewire_core pata_acpi hwmon wmi acpi_cpufreq button xhci_pci ehci_pci ahci ehci_hcd xhci_hcd libahci pata_jmicron usbcore libata
[236250.302782] CR2: 0000000000000000
[236250.302783] ---[ end trace 18df20f6c6a36c2f ]---
[236250.302945] RIP: 0010:_nv027470rm+0x9/0x90 [nvidia]
[236250.302950] Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
[236250.302951] RSP: 0018:ffffafffc219fbe0 EFLAGS: 00010202
[236250.302952] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
[236250.302953] RDX: ffff9d6db7a9b7c8 RSI: ffffffffffffffff RDI: 0000000000000020
[236250.302954] RBP: ffff9d72a12bf090 R08: ffffffffc1e0c2d0 R09: ffff9d72a12bf070
[236250.302955] R10: ffffffffc0ae1c10 R11: ffff9d72d0b88008 R12: 0000000000000020
[236250.302956] R13: 0000000000000000 R14: ffff9d72a12bf1f8 R15: ffff9d72a12bf338
[236250.302957] FS:  0000000000000000(0000) GS:ffff9d72d7b00000(0000) knlGS:0000000000000000
[236250.302958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236250.302959] CR2: ffffffffffffffd6 CR3: 00000007cd834000 CR4: 00000000000006e0
[236250.302960] Fixing recursive fault but reboot is needed!

Been having this issue for a couple months now as well, intermittently, and just found this thread. Happens once in a blue moon, and always when using Google Chrome (I believe always when changing a tab). Like others, I had to hard reset to get the system working again.

R7 3700X w/ Asrock x570 taichi
ArchLinux
NVIDIA driver 455.38
GTX 1080
Linux version 5.9.8-arch1-1 (linux@archlinux) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35.1) #1 SMP PREEMPT Tue, 10 Nov 2020 22:44:11 +0000

Here’s my latest:

kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 3f361a067 P4D 3f361a067 PUD 0
kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
kernel: CPU: 0 PID: 743 Comm: irq/99-nvidia Tainted: P           OE     5.9.8-arch1-1 #1
kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P2.10 09/09/2019
kernel: RIP: 0010:_nv027510rm+0x9/0x90 [nvidia]
kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
kernel: RSP: 0018:ffffb5fa416a3be0 EFLAGS: 00010202
kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
kernel: RDX: ffff913dd126b8c8 RSI: ffffffffffffffff RDI: 0000000000000020
kernel: RBP: ffff9140ab3ca940 R08: ffffffffc3638490 R09: ffff9140ab3ca920
kernel: R10: ffffffffc2306230 R11: ffff9140b5ea8008 R12: 0000000000000020
kernel: R13: 0000000000000000 R14: ffff9140ab3caaa8 R15: ffff9140ab3cabb0
kernel: FS:  0000000000000000(0000) GS:ffff9140cea00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000020 CR3: 000000030a188000 CR4: 0000000000350ef0
kernel: Call Trace:
kernel:  ? _nv029921rm+0x1b/0x90 [nvidia]
kernel:  ? _nv025455rm+0x18/0x60 [nvidia]
kernel:  ? _nv011677rm+0x13d/0x1c0 [nvidia]
kernel:  ? _nv000083rm+0x12f/0x1a0 [nvidia]
kernel:  ? _nv011605rm+0xff/0x180 [nvidia]
kernel:  ? _nv018432rm+0x1af/0x210 [nvidia]
kernel:  ? _nv018372rm+0xd9a/0xe90 [nvidia]
kernel:  ? _nv018373rm+0xde/0x260 [nvidia]
kernel:  ? _nv018339rm+0x72/0xc0 [nvidia]
kernel:  ? _nv018353rm+0x235/0x2d0 [nvidia]
kernel:  ? _nv026057rm+0x10/0x10 [nvidia]
kernel:  ? _nv018386rm+0xac/0xe0 [nvidia]
kernel:  ? _nv027717rm+0x820/0xdc0 [nvidia]
kernel:  ? _nv007563rm+0x155/0x270 [nvidia]
kernel:  ? _nv027725rm+0x8d/0x180 [nvidia]
kernel:  ? _nv000712rm+0xa9/0x200 [nvidia]
kernel:  ? disable_irq_nosync+0x10/0x10
kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
kernel:  ? irq_thread_fn+0x20/0x60
kernel:  ? irq_thread+0xf5/0x1a0
kernel:  ? irq_finalize_oneshot.part.0+0xe0/0xe0
kernel:  ? irq_thread_check_affinity+0xd0/0xd0
kernel:  ? kthread+0x142/0x160
kernel:  ? __kthread_bind_mask+0x60/0x60
kernel:  ? ret_from_fork+0x22/0x30
kernel: Modules linked in: rfcomm fuse uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common videodev snd_usbmidi_lib snd_rawmidi nvidia_drm(POE) snd_seq_device mc nvidia_modeset(POE) cma>
kernel:  pinctrl_amd acpi_cpufreq vboxnetflt(OE) vboxnetadp(OE) drm vboxdrv(OE) crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 crc32c_intel xhci_pci xhci_pci_renesas xhci_hcd
kernel: CR2: 0000000000000020
kernel: ---[ end trace e1fd299d7b82857d ]---
kernel: RIP: 0010:_nv027510rm+0x9/0x90 [nvidia]
kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
kernel: RSP: 0018:ffffb5fa416a3be0 EFLAGS: 00010202
kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
kernel: RDX: ffff913dd126b8c8 RSI: ffffffffffffffff RDI: 0000000000000020
kernel: RBP: ffff9140ab3ca940 R08: ffffffffc3638490 R09: ffff9140ab3ca920
kernel: R10: ffffffffc2306230 R11: ffff9140b5ea8008 R12: 0000000000000020
kernel: R13: 0000000000000000 R14: ffff9140ab3caaa8 R15: ffff9140ab3cabb0
kernel: FS:  0000000000000000(0000) GS:ffff9140cea00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000020 CR3: 000000030a188000 CR4: 0000000000350ef0
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000930
kernel: #PF: supervisor write access in kernel mode
kernel: #PF: error_code(0x0002) - not-present page
kernel: PGD 3f361a067 P4D 3f361a067 PUD 0
kernel: Oops: 0002 [#2] PREEMPT SMP NOPTI
kernel: CPU: 0 PID: 743 Comm: irq/99-nvidia Tainted: P      D    OE     5.9.8-arch1-1 #1
kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P2.10 09/09/2019
kernel: RIP: 0010:mutex_lock+0x10/0x20
kernel: Code: 03 31 c0 c3 eb d4 0f 1f 40 00 0f 1f 44 00 00 be 02 00 00 00 e9 61 fa ff ff 90 0f 1f 44 00 00 31 c0 65 48 8b 14 25 c0 7b 01 00 <f0> 48 0f b1 17 75 01 c3 eb d6 66 0f 1f 44 00 00 0f 1f 44 00 00 41
kernel: RSP: 0018:ffffb5fa416a3e30 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: ffff9140acd99e80 RSI: 0000000000000000 RDI: 0000000000000930
kernel: RBP: 0000000000000930 R08: 0000000000000001 R09: 0000000000000000
kernel: R10: ffff9140c2fbf800 R11: ffffb5fa416a3800 R12: ffff9140acd9a64c
kernel: R13: 0000000000000000 R14: 0000000000000001 R15: ffff9140acd99e80
kernel: FS:  0000000000000000(0000) GS:ffff9140cea00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000930 CR3: 000000030a188000 CR4: 0000000000350ef0
kernel: Call Trace:
kernel:  perf_event_exit_task+0x30/0x440
kernel:  ? kfree+0x40f/0x440
kernel:  do_exit+0x379/0xa90
kernel:  ? task_work_run+0x5c/0x90
kernel:  ? do_exit+0x369/0xa90
kernel:  ? kthread+0x142/0x160
kernel:  ? rewind_stack_do_exit+0x17/0x17
kernel: Modules linked in: rfcomm fuse uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common videodev snd_usbmidi_lib snd_rawmidi nvidia_drm(POE) snd_seq_device mc nvidia_modeset(POE) cma>
kernel:  pinctrl_amd acpi_cpufreq vboxnetflt(OE) vboxnetadp(OE) drm vboxdrv(OE) crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 crc32c_intel xhci_pci xhci_pci_renesas xhci_hcd
kernel: CR2: 0000000000000930
kernel: ---[ end trace e1fd299d7b82857e ]---
kernel: RIP: 0010:_nv027510rm+0x9/0x90 [nvidia]
kernel: Code: 90 ff e8 ea b0 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
kernel: RSP: 0018:ffffb5fa416a3be0 EFLAGS: 00010202
kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
kernel: RDX: ffff913dd126b8c8 RSI: ffffffffffffffff RDI: 0000000000000020
kernel: RBP: ffff9140ab3ca940 R08: ffffffffc3638490 R09: ffff9140ab3ca920
kernel: R10: ffffffffc2306230 R11: ffff9140b5ea8008 R12: 0000000000000020
kernel: R13: 0000000000000000 R14: ffff9140ab3caaa8 R15: ffff9140ab3cabb0
kernel: FS:  0000000000000000(0000) GS:ffff9140cea00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000930 CR3: 000000030a188000 CR4: 0000000000350ef0
kernel: Fixing recursive fault but reboot is needed!

If you believe the open source AMD drivers are bug free, you’re quite wrong. Some of their bugs have been nagging users for months, if not years, without any fixes.

I invite you to look through the FreeDesktop Bugzilla and the kernel Bugzilla.

You’ll be surprised how many old bugs exist with zero feedback from AMD. NVIDIA at least is trying to reproduce the issue.

Lastly, enterprise customers are not affected by this bug. You cannot imagine how much value there is in that market and how diligently NVIDIA attends to their needs. Home/desktop users, on the other hand? A whole different story.

How so? The bug seems to affect some general mechanism, so it’s just as likely to affect enterprise users. Enterprise users may not be as likely to identify and report all instances of crashes, but that does not mean that they are unaffected.