Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference

Hi all, we have not been able to reproduce the issue locally, but after analyzing the logs and reported incidents we have probably found the root cause and are working on a fix. Please allow us some more time to debug it further; we will get back with more updates.

Same issue here, twice so far.

5.10.2-2-MANJARO #1 SMP PREEMPT Tue Dec 22 08:14:42 UTC 2020 x86_64 GNU/Linux
NVidia driver 455.45.01
Asrock X570 Taichi
NVidia 1080TI

It seemed to be triggered when using Chromium, both times whilst browsing Google Drive.

Got hit by this bug again yesterday on the 5.4 LTS kernel.

It's really frustrating that nvidia-bug-report.sh hangs even with --safe-mode, which renders it useless.

Since you guys seem to be working on a fix already, please find out why the script hangs.

Hello.
Same issue here.

The same error also occurred on the 455 series drivers.
It occurs only in Chromium-based browsers.
Slackware current

NVIDIA-Linux-x86_64-460.32.03.run

Jan 13 01:45:47 slack-pc kernel: [21988.136139] BUG: kernel NULL pointer dereference, address: 0000000000000020
Jan 13 01:45:47 slack-pc kernel: [21988.136146] #PF: supervisor read access in kernel mode
Jan 13 01:45:47 slack-pc kernel: [21988.136148] #PF: error_code(0x0000) - not-present page
Jan 13 01:45:47 slack-pc kernel: [21988.136155] Oops: 0000 [#1] PREEMPT SMP PTI
Jan 13 01:45:47 slack-pc kernel: [21988.136158] CPU: 0 PID: 1450 Comm: irq/32-nvidia Tainted: P           O      5.10.7 #1
Jan 13 01:45:47 slack-pc kernel: [21988.136160] Hardware name: POSITIVO POS-EIH61CE/POS-EIH61CE, BIOS 4.6.5 10/18/2012
Jan 13 01:45:47 slack-pc kernel: [21988.136427] RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.136431] Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Jan 13 01:45:47 slack-pc kernel: [21988.136433] RSP: 0018:ffffab2440fa3bf0 EFLAGS: 00010202
Jan 13 01:45:47 slack-pc kernel: [21988.136436] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Jan 13 01:45:47 slack-pc kernel: [21988.136438] RDX: ffff90868cb87c48 RSI: ffffffffffffffff RDI: 0000000000000020
Jan 13 01:45:47 slack-pc kernel: [21988.136440] RBP: ffff9086655d2960 R08: ffffffffc2660b60 R09: ffff9086655d2940
Jan 13 01:45:47 slack-pc kernel: [21988.136441] R10: ffff9086655a4008 R11: ffff9086655a5098 R12: 0000000000000020
Jan 13 01:45:47 slack-pc kernel: [21988.136443] R13: 0000000000000000 R14: ffff9086655d2ac8 R15: ffff9086655d2bd0
Jan 13 01:45:47 slack-pc kernel: [21988.136446] FS:  0000000000000000(0000) GS:ffff90894ec00000(0000) knlGS:0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.136448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 01:45:47 slack-pc kernel: [21988.136449] CR2: 0000000000000020 CR3: 000000012e330002 CR4: 00000000001706f0
Jan 13 01:45:47 slack-pc kernel: [21988.136450] Call Trace:
Jan 13 01:45:47 slack-pc kernel: [21988.136678]  ? _nv030766rm+0x1b/0x90 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.136868]  ? _nv026432rm+0x18/0x60 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.137031]  ? _nv012979rm+0x13d/0x1c0 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.137179]  ? _nv000081rm+0x12f/0x1a0 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.137384]  ? _nv012910rm+0xff/0x180 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.137599]  ? _nv019531rm+0x1af/0x210 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.137789]  ? _nv019482rm+0xdf3/0xef0 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.138002]  ? _nv019483rm+0xf3/0x290 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.138209]  ? _nv019449rm+0x78/0xd0 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.138404]  ? _nv019463rm+0xcf/0x2f0 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.138595]  ? _nv019497rm+0xbe/0xe0 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.138808]  ? _nv028705rm+0x97b/0xdc0 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.139032]  ? _nv028713rm+0x15d/0x400 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.139187]  ? _nv000709rm+0xa9/0x240 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.139192]  ? disable_irq_nosync+0x10/0x10
Jan 13 01:45:47 slack-pc kernel: [21988.139330]  ? rm_isr_bh+0x1c/0x60 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.139420]  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.139422]  ? irq_thread_fn+0x20/0x60
Jan 13 01:45:47 slack-pc kernel: [21988.139423]  ? irq_thread+0xe3/0x190
Jan 13 01:45:47 slack-pc kernel: [21988.139425]  ? irq_finalize_oneshot.part.0+0xd0/0xd0
Jan 13 01:45:47 slack-pc kernel: [21988.139427]  ? irq_thread_check_affinity+0xa0/0xa0
Jan 13 01:45:47 slack-pc kernel: [21988.139429]  ? kthread+0x142/0x160
Jan 13 01:45:47 slack-pc kernel: [21988.139430]  ? __kthread_bind_mask+0x60/0x60
Jan 13 01:45:47 slack-pc kernel: [21988.139432]  ? ret_from_fork+0x22/0x30
Jan 13 01:45:47 slack-pc kernel: [21988.139434] Modules linked in: nvidia_uvm(PO) fuse lz4 zram nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_addrtype xt_tcpudp xt_conntrack ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables efivarfs it87 hwmon_vid nvidia_drm(PO) nvidia_modeset(PO) hid_generic usbhid hid nvidia(PO) snd_hda_codec_via snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation drm_kms_helper at24 intel_rapl_msr snd_soc_core regmap_i2c drm mei_hdcp snd_compress intel_rapl_common snd_pcm_dmaengine x86_pkg_temp_thermal intel_powerclamp agpgart fb_sys_fops coretemp syscopyarea soundwire_cadence gpio_ich snd_hda_codec snd_hda_core sysfillrect snd_hwdep snd_pcm kvm_intel sysimgblt snd_timer kvm snd i2c_i801 soundcore irqbypass evdev mei_me i2c_smbus ac97_bus crct10dif_pclmul crc32_pclmul mei
Jan 13 01:45:47 slack-pc kernel: [21988.139475]  ghash_clmulni_intel serio_raw i2c_core rapl bfq intel_cstate ehci_pci atl1c lpc_ich ehci_hcd thermal video fan button wmi loop
Jan 13 01:45:47 slack-pc kernel: [21988.139485] CR2: 0000000000000020
Jan 13 01:45:47 slack-pc kernel: [21988.139488] ---[ end trace 0fbb305080b82bf1 ]---
Jan 13 01:45:47 slack-pc kernel: [21988.139638] RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.139640] Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Jan 13 01:45:47 slack-pc kernel: [21988.139642] RSP: 0018:ffffab2440fa3bf0 EFLAGS: 00010202
Jan 13 01:45:47 slack-pc kernel: [21988.139643] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Jan 13 01:45:47 slack-pc kernel: [21988.139644] RDX: ffff90868cb87c48 RSI: ffffffffffffffff RDI: 0000000000000020
Jan 13 01:45:47 slack-pc kernel: [21988.139645] RBP: ffff9086655d2960 R08: ffffffffc2660b60 R09: ffff9086655d2940
Jan 13 01:45:47 slack-pc kernel: [21988.139646] R10: ffff9086655a4008 R11: ffff9086655a5098 R12: 0000000000000020
Jan 13 01:45:47 slack-pc kernel: [21988.139647] R13: 0000000000000000 R14: ffff9086655d2ac8 R15: ffff9086655d2bd0
Jan 13 01:45:47 slack-pc kernel: [21988.139648] FS:  0000000000000000(0000) GS:ffff90894ec00000(0000) knlGS:0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139650] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 01:45:47 slack-pc kernel: [21988.139651] CR2: 0000000000000020 CR3: 000000012e330002 CR4: 00000000001706f0
Jan 13 01:45:47 slack-pc kernel: [21988.139675] BUG: kernel NULL pointer dereference, address: 0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139678] #PF: supervisor instruction fetch in kernel mode
Jan 13 01:45:47 slack-pc kernel: [21988.139679] #PF: error_code(0x0010) - not-present page
Jan 13 01:45:47 slack-pc kernel: [21988.139684] Oops: 0010 [#2] PREEMPT SMP PTI
Jan 13 01:45:47 slack-pc kernel: [21988.139687] CPU: 0 PID: 1450 Comm: irq/32-nvidia Tainted: P      D    O      5.10.7 #1
Jan 13 01:45:47 slack-pc kernel: [21988.139688] Hardware name: POSITIVO POS-EIH61CE/POS-EIH61CE, BIOS 4.6.5 10/18/2012
Jan 13 01:45:47 slack-pc kernel: [21988.139690] RIP: 0010:0x0
Jan 13 01:45:47 slack-pc kernel: [21988.139693] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Jan 13 01:45:47 slack-pc kernel: [21988.139694] RSP: 0018:ffffab2440fa3ec0 EFLAGS: 00010286
Jan 13 01:45:47 slack-pc kernel: [21988.139720] RAX: 0000000000000000 RBX: ffff90864993ba00 RCX: 0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139721] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffab2440fa3ec8
Jan 13 01:45:47 slack-pc kernel: [21988.139722] RBP: ffff90864993ba00 R08: 0000000000000046 R09: ffffab2440fa38b0
Jan 13 01:45:47 slack-pc kernel: [21988.139724] R10: ffffab2440fa38a8 R11: ffffffffa1d37668 R12: ffff90864993c11c
Jan 13 01:45:47 slack-pc kernel: [21988.139725] R13: 0000000000000020 R14: 0000000000000001 R15: ffff90864993ba00
Jan 13 01:45:47 slack-pc kernel: [21988.139748] FS:  0000000000000000(0000) GS:ffff90894ec00000(0000) knlGS:0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 01:45:47 slack-pc kernel: [21988.139757] CR2: ffffffffffffffd6 CR3: 000000012e330002 CR4: 00000000001706f0
Jan 13 01:45:47 slack-pc kernel: [21988.139760] Call Trace:
Jan 13 01:45:47 slack-pc kernel: [21988.139767]  task_work_run+0x5c/0x90
Jan 13 01:45:47 slack-pc kernel: [21988.139774]  do_exit+0x333/0xa30
Jan 13 01:45:47 slack-pc kernel: [21988.139779]  ? irq_thread_check_affinity+0xa0/0xa0
Jan 13 01:45:47 slack-pc kernel: [21988.139780]  ? kthread+0x142/0x160
Jan 13 01:45:47 slack-pc kernel: [21988.139782]  rewind_stack_do_exit+0x17/0x17
Jan 13 01:45:47 slack-pc kernel: [21988.139784] RIP: 0000:0x0
Jan 13 01:45:47 slack-pc kernel: [21988.139785] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Jan 13 01:45:47 slack-pc kernel: [21988.139786] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139788] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139789] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139790] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139791] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139791] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139793] Modules linked in: nvidia_uvm(PO) fuse lz4 zram nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_addrtype xt_tcpudp xt_conntrack ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables efivarfs it87 hwmon_vid nvidia_drm(PO) nvidia_modeset(PO) hid_generic usbhid hid nvidia(PO) snd_hda_codec_via snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation drm_kms_helper at24 intel_rapl_msr snd_soc_core regmap_i2c drm mei_hdcp snd_compress intel_rapl_common snd_pcm_dmaengine x86_pkg_temp_thermal intel_powerclamp agpgart fb_sys_fops coretemp syscopyarea soundwire_cadence gpio_ich snd_hda_codec snd_hda_core sysfillrect snd_hwdep snd_pcm kvm_intel sysimgblt snd_timer kvm snd i2c_i801 soundcore irqbypass evdev mei_me i2c_smbus ac97_bus crct10dif_pclmul crc32_pclmul mei
Jan 13 01:45:47 slack-pc kernel: [21988.139824]  ghash_clmulni_intel serio_raw i2c_core rapl bfq intel_cstate ehci_pci atl1c lpc_ich ehci_hcd thermal video fan button wmi loop
Jan 13 01:45:47 slack-pc kernel: [21988.139832] CR2: 0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.139834] ---[ end trace 0fbb305080b82bf2 ]---
Jan 13 01:45:47 slack-pc kernel: [21988.140007] RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Jan 13 01:45:47 slack-pc kernel: [21988.140033] Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Jan 13 01:45:47 slack-pc kernel: [21988.140036] RSP: 0018:ffffab2440fa3bf0 EFLAGS: 00010202
Jan 13 01:45:47 slack-pc kernel: [21988.140042] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Jan 13 01:45:47 slack-pc kernel: [21988.140046] RDX: ffff90868cb87c48 RSI: ffffffffffffffff RDI: 0000000000000020
Jan 13 01:45:47 slack-pc kernel: [21988.140049] RBP: ffff9086655d2960 R08: ffffffffc2660b60 R09: ffff9086655d2940
Jan 13 01:45:47 slack-pc kernel: [21988.140051] R10: ffff9086655a4008 R11: ffff9086655a5098 R12: 0000000000000020
Jan 13 01:45:47 slack-pc kernel: [21988.140052] R13: 0000000000000000 R14: ffff9086655d2ac8 R15: ffff9086655d2bd0
Jan 13 01:45:47 slack-pc kernel: [21988.140053] FS:  0000000000000000(0000) GS:ffff90894ec00000(0000) knlGS:0000000000000000
Jan 13 01:45:47 slack-pc kernel: [21988.140054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 01:45:47 slack-pc kernel: [21988.140055] CR2: ffffffffffffffd6 CR3: 000000012e330002 CR4: 00000000001706f0
Jan 13 01:45:47 slack-pc kernel: [21988.140056] Fixing recursive fault but reboot is needed!

Thanks for listening.
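
For anyone comparing their own logs with the oops above, here is a minimal Python sketch that pulls out the two details that matter most: which general-purpose registers hold the faulting address reported in CR2, and which instruction bytes the kernel marked with <..> on the Code: line. The sample text is trimmed from the log above; the function names (parse_registers, registers_matching_fault, faulting_bytes) are made up for this illustration and are not part of any NVIDIA or kernel tool.

```python
import re

# Trimmed from the slack-pc oops above (journal prefix and most lines removed).
OOPS = """\
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Code: 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2
RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
RDX: ffff90868cb87c48 RSI: ffffffffffffffff RDI: 0000000000000020
CR2: 0000000000000020 CR3: 000000012e330002 CR4: 00000000001706f0
"""

# Matches pairs such as "RAX: 0000000000000020". Segment-prefixed values
# such as "RSP: 0018:..." or "RIP: 0010:..." are deliberately not matched.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def parse_registers(text):
    """Map register name -> integer value for every plain 64-bit register."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def registers_matching_fault(text):
    """Names of non-control registers whose value equals CR2 (fault address)."""
    regs = parse_registers(text)
    fault = regs["CR2"]
    return sorted(
        name for name, value in regs.items()
        if value == fault and not name.startswith("CR")
    )


def faulting_bytes(text, count=3):
    """First `count` opcode bytes starting at the <..> marker on the Code: line."""
    line = next(l for l in text.splitlines() if "Code:" in l)
    tokens = line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [tok.strip("<>") for tok in tokens[start:start + count]]


print(registers_matching_fault(OOPS))
print(faulting_bytes(OOPS))
```

On this sample the registers holding 0x20 are RAX, RBX and RDI, and the marked bytes are 48 8b 17, which decode on x86-64 as mov (%rdi),%rdx. The bytes just before it, 48 85 ff 74 57, are test %rdi,%rdi followed by je, so the function appears to check its argument for NULL and then faults because the pointer is 0x20 rather than 0. One possible reading is that a caller computed the address of a field at offset 0x20 in a NULL structure, but that is a guess from the register dump, not something the log proves.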

@zezaocapoeira, did the bug occur on the 455.xx or the 460.32.03 driver version?

Hello.

@al.piotrowicz

This error occurred in the following versions that I used:

-NVIDIA-Linux-x86_64-455.23.04.run
-NVIDIA-Linux-x86_64-455.28.run
-NVIDIA-Linux-x86_64-455.38.run
-NVIDIA-Linux-x86_64-455.45.01.run
-NVIDIA-Linux-x86_64-460.27.04.run
-NVIDIA-Linux-x86_64-460.32.03.run (currently installed version)

In my case, this error usually occurs at high uptimes (2+ days) when using Chromium-based browsers.

Thanks for listening.

Thanks for your fast reply @zezaocapoeira. This is one of the 460.32.03 changelog entries:

Improved the memory allocation strategy in nvidia-modeset.ko to reduce the likelihood of out-of-memory errors, which typically manifest as "page allocation failure" messages in the kernel log.

I don't know whether it's related to this issue.

For me, too, the bug triggers only when using Chrome-based browsers.

Hello.

@al.piotrowicz

I checked the google-chrome-stable and vivaldi-3.5.2115.87.1 logs.

They contain errors regarding:

  • ...Skia shader compilation error...

  • ...Program binary could not be loaded. Binary is not compatible with current driver/hardware combination...

Both have the same error output:

...
Errors:
Program binary could not be loaded. Binary is not compatible with current driver/hardware combination. Driver build date Dec 27 2020. Please check build information of source that generated the binary.

[8305:8305:0113/152317.135951:ERROR:shared_context_state.cc(74)] Skia shader compilation error
------------------------

Errors:
Program binary could not be loaded. Binary is not compatible with current driver/hardware combination. Driver build date Dec 27 2020. Please check build information of source that generated the binary.

[8270:8270:0113/152317.148268:ERROR:CONSOLE(0)] "Unchecked runtime.lastError: The message port closed before a response was received.", source: chrome-extension://mpognobbkildjkofajifpdfhcoklimli/browser.html (0)
[8305:8305:0113/152317.391085:ERROR:shared_context_state.cc(74)] Skia shader compilation error
------------------------

Errors:
Program binary could not be loaded. Binary is not compatible with current driver/hardware combination. Driver build date Dec 27 2020. Please check build information of source that generated the binary.

[8305:8305:0113/152317.401681:ERROR:shared_context_state.cc(74)] Skia shader compilation error
------------------------

Errors:
Program binary could not be loaded. Binary is not compatible with current driver/hardware combination. Driver build date Dec 27 2020. Please check build information of source that generated the binary.

[8305:8305:0113/152317.413164:ERROR:shared_context_state.cc(74)] Skia shader compilation error
------------------------

Errors:
Program binary could not be loaded. Binary is not compatible with current driver/hardware combination. Driver build date Dec 27 2020. Please check build information of source that generated the binary.

[8305:8305:0113/152317.424093:ERROR:shared_context_state.cc(74)] Skia shader compilation error
------------------------

Errors:
Program binary could not be loaded. Binary is not compatible with current driver/hardware combination. Driver build date Dec 27 2020. Please check build information of source that generated the binary.

[8305:8305:0113/152317.429104:ERROR:shared_context_state.cc(74)] Skia shader compilation error
------------------------

Errors:
Program binary could not be loaded. Binary is not compatible with current driver/hardware combination. Driver build date Dec 27 2020. Please check build information of source that generated the binary.

[8305:8305:0113/152317.434329:ERROR:shared_context_state.cc(74)] Skia shader compilation error
------------------------

Errors:
Program binary could not be loaded. Binary is not compatible with current driver/hardware combination. Driver build date Dec 27 2020. Please check build information of source that generated the binary.

[8305:8305:0113/152317.452881:ERROR:shared_context_state.cc(74)] Skia shader compilation error
------------------------
...

Thanks for listening.

@zezaocapoeira please try hardware acceleration in the browser by enabling the flags below and adding an exec switch to the launcher:

/usr/lib/chromium/chromium --use-gl=desktop (--use-gl=egl in the case of Wayland)

FLAGS:

  • --enable-accelerated-video-decode
  • --enable-experimental-webassembly-features
  • --enable-gpu-rasterization
  • --enable-webgl-draft-extensions
  • --enable-webgl2-compute-context
  • --enable-zero-copy
  • --ignore-gpu-blocklist
  • --disable-smooth-scrolling

Also please try using the latest nvidia driver 460.32.03

@al.piotrowicz

With the --use-gl=desktop flag, the errors from the previous logs did not appear.

$ vivaldi --use-gl=desktop --enable-accelerated-video-decode --enable-experimental-webassembly-features --enable-gpu-rasterization --enable-webgl-draft-extensions --enable-webgl2-compute-context --enable-zero-copy --ignore-gpu-blocklist --disable-smooth-scrolling

$ google-chrome --use-gl=desktop --enable-accelerated-video-decode --enable-experimental-webassembly-features --enable-gpu-rasterization --enable-webgl-draft-extensions --enable-webgl2-compute-context --enable-zero-copy --ignore-gpu-blocklist --disable-smooth-scrolling

Thanks for listening.

This still happens for me on 460.32.03-2 on kernel 5.10.6

Jan 13 13:10:49 scout kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Jan 13 13:10:49 scout kernel: #PF: supervisor read access in kernel mode
Jan 13 13:10:49 scout kernel: #PF: error_code(0x0000) - not-present page
Jan 13 13:10:49 scout kernel: PGD 80000001330fb067 P4D 80000001330fb067 PUD 0 
Jan 13 13:10:49 scout kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jan 13 13:10:49 scout kernel: CPU: 4 PID: 605 Comm: irq/51-nvidia Tainted: P           OE     5.10.6-arch1-1 #1
Jan 13 13:10:49 scout kernel: Hardware name: System manufacturer System Product Name/MAXIMUS V GENE, BIOS 0701 03/29/2012
Jan 13 13:10:49 scout kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Jan 13 13:10:49 scout kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Jan 13 13:10:49 scout kernel: RSP: 0018:ffffb9e540a4bc20 EFLAGS: 00010202
Jan 13 13:10:49 scout kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Jan 13 13:10:49 scout kernel: RDX: ffff9b00f9bcde08 RSI: ffffffffffffffff RDI: 0000000000000020
Jan 13 13:10:49 scout kernel: RBP: ffff9b00dfa929f0 R08: ffffffffc2ac4b60 R09: ffff9b00dfa929d0
Jan 13 13:10:49 scout kernel: R10: ffff9b00cbd30008 R11: ffff9b00cbd31098 R12: 0000000000000020
Jan 13 13:10:49 scout kernel: R13: 0000000000000000 R14: ffff9b00dfa92b58 R15: ffff9b00dfa92c98
Jan 13 13:10:49 scout kernel: FS:  0000000000000000(0000) GS:ffff9b03ced00000(0000) knlGS:0000000000000000
Jan 13 13:10:49 scout kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 13:10:49 scout kernel: CR2: 0000000000000020 CR3: 00000001297f4001 CR4: 00000000001706e0
Jan 13 13:10:49 scout kernel: Call Trace:
Jan 13 13:10:49 scout kernel:  ? _nv030766rm+0x1b/0x90 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv026432rm+0x18/0x60 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv012979rm+0x13d/0x1c0 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv000081rm+0x12f/0x1a0 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv037801rm+0xc3/0x350 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv037800rm+0x63/0x80 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv012906rm+0x78/0xd0 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv012906rm+0x1a/0xd0 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv025575rm+0x251/0x3e0 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv025524rm+0x1f/0xf0 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv016719rm+0xd3/0x3c0 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv028705rm+0xb23/0xdc0 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv028713rm+0x15d/0x400 [nvidia]
Jan 13 13:10:49 scout kernel:  ? _nv000709rm+0xa9/0x240 [nvidia]
Jan 13 13:10:49 scout kernel:  ? disable_irq_nosync+0x10/0x10
Jan 13 13:10:49 scout kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
Jan 13 13:10:49 scout kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
Jan 13 13:10:49 scout kernel:  ? irq_thread_fn+0x20/0x60
Jan 13 13:10:49 scout kernel:  ? irq_thread+0xf5/0x1a0
Jan 13 13:10:49 scout kernel:  ? irq_finalize_oneshot.part.0+0xe0/0xe0
Jan 13 13:10:49 scout kernel:  ? irq_thread_check_affinity+0xd0/0xd0
Jan 13 13:10:49 scout kernel:  ? kthread+0x133/0x150
Jan 13 13:10:49 scout kernel:  ? __kthread_bind_mask+0x60/0x60
Jan 13 13:10:49 scout kernel:  ? ret_from_fork+0x22/0x30
Jan 13 13:10:49 scout kernel: Modules linked in: nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) nct6775 hwmon_vid snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr intel_rapl_common ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation x86_pkg_t>
Jan 13 13:10:49 scout kernel:  mousedev e1000e syscopyarea snd mei_me ecc sysfillrect lpc_ich soundcore mei sysimgblt fb_sys_fops wmi mac_hid video drm crypto_user fuse agpgart bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 crc32c_intel serio_raw usbhid xhci_pci xhci_pci_renesas
Jan 13 13:10:49 scout kernel: CR2: 0000000000000020
Jan 13 13:10:49 scout kernel: ---[ end trace aa3b68788dfd2c47 ]---
Jan 13 13:10:49 scout kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Jan 13 13:10:49 scout kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Jan 13 13:10:49 scout kernel: RSP: 0018:ffffb9e540a4bc20 EFLAGS: 00010202
Jan 13 13:10:49 scout kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Jan 13 13:10:49 scout kernel: RDX: ffff9b00f9bcde08 RSI: ffffffffffffffff RDI: 0000000000000020
Jan 13 13:10:49 scout kernel: RBP: ffff9b00dfa929f0 R08: ffffffffc2ac4b60 R09: ffff9b00dfa929d0
Jan 13 13:10:49 scout kernel: R10: ffff9b00cbd30008 R11: ffff9b00cbd31098 R12: 0000000000000020
Jan 13 13:10:49 scout kernel: R13: 0000000000000000 R14: ffff9b00dfa92b58 R15: ffff9b00dfa92c98
Jan 13 13:10:49 scout kernel: FS:  0000000000000000(0000) GS:ffff9b03ced00000(0000) knlGS:0000000000000000
Jan 13 13:10:49 scout kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 13:10:49 scout kernel: CR2: 0000000000000020 CR3: 00000001297f4001 CR4: 00000000001706e0
Jan 13 13:10:49 scout kernel: sched: RT throttling activated
Jan 13 13:10:49 scout kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Jan 13 13:10:49 scout kernel: #PF: supervisor read access in kernel mode
Jan 13 13:10:49 scout kernel: #PF: error_code(0x0000) - not-present page
Jan 13 13:10:49 scout kernel: PGD 80000001330fb067 P4D 80000001330fb067 PUD 0 
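
This trace and the slack-pc trace posted earlier both fault at _nv028498rm+0x9 with driver 460.32.03. A rough way to see how much of the call path they share is to compare the symbol names in the two Call Trace sections. The sketch below does that in Python on frames abbreviated from the two logs; frames and shared_frames are hypothetical helper names, and since the _nvNNNNNNrm symbols are presumably specific to a driver build, the comparison only makes sense between reports from the same driver version.

```python
import re

# Abbreviated from the slack-pc call trace (log prefix removed).
SLACK_PC = """\
 ? _nv030766rm+0x1b/0x90 [nvidia]
 ? _nv026432rm+0x18/0x60 [nvidia]
 ? _nv012979rm+0x13d/0x1c0 [nvidia]
 ? _nv000081rm+0x12f/0x1a0 [nvidia]
 ? _nv012910rm+0xff/0x180 [nvidia]
 ? _nv028705rm+0x97b/0xdc0 [nvidia]
 ? _nv028713rm+0x15d/0x400 [nvidia]
 ? _nv000709rm+0xa9/0x240 [nvidia]
"""

# Abbreviated from the scout call trace (log prefix removed).
SCOUT = """\
 ? _nv030766rm+0x1b/0x90 [nvidia]
 ? _nv026432rm+0x18/0x60 [nvidia]
 ? _nv012979rm+0x13d/0x1c0 [nvidia]
 ? _nv000081rm+0x12f/0x1a0 [nvidia]
 ? _nv037801rm+0xc3/0x350 [nvidia]
 ? _nv028705rm+0xb23/0xdc0 [nvidia]
 ? _nv028713rm+0x15d/0x400 [nvidia]
 ? _nv000709rm+0xa9/0x240 [nvidia]
"""

# "? symbol+0xoffset/0xsize" as printed in a kernel call trace.
FRAME_RE = re.compile(r"\?\s+(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(trace_text):
    """Symbol names in the order they appear in the call trace."""
    return FRAME_RE.findall(trace_text)


def shared_frames(trace_a, trace_b):
    """Symbols of trace_a that also occur in trace_b, in trace_a order."""
    in_b = set(frames(trace_b))
    return [sym for sym in frames(trace_a) if sym in in_b]


for symbol in shared_frames(SLACK_PC, SCOUT):
    print(symbol)
```

On these abbreviated traces the frames nearest the interrupt bottom half (_nv000709rm, _nv028713rm, _nv028705rm) and the four frames nearest the fault are shared, while the middle of the path differs. All of these frames carry a "?" in the kernel output, so the kernel itself does not consider them fully reliable.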

I got some application call stacks this time (OpenGL driver interaction, I assume). I am not sure whether the apps crashed because of the kernel module fault, since I don't know the order of events.

This one shows a libnvidia-glcore stack:

Jan 13 09:38:28 scout systemd-coredump[1177]: Process 1135 (teams) of user 1000 dumped core.
                                              
                                              Stack trace of thread 1175:
                                              #0  0x000055949f4a0112 n/a (teams + 0x48c5112)
                                              #1  0x000055949f4a3fa6 n/a (teams + 0x48c8fa6)
                                              #2  0x00007f1b051f00f0 __restore_rt (libpthread.so.0 + 0x140f0)
                                              #3  0x00007f1b03bafc51 clock_nanosleep@@GLIBC_2.17 (libc.so.6 + 0xc7c51)
                                              #4  0x00007f1b03bb5137 __nanosleep (libc.so.6 + 0xcd137)
                                              #5  0x00007f1b03be0419 usleep (libc.so.6 + 0xf8419)
                                              #6  0x00007f1afae698be n/a (libnvidia-glcore.so.460.32.03 + 0xd708be)
                                              #7  0x00007f1afaea6f32 n/a (libnvidia-glcore.so.460.32.03 + 0xdadf32)
                                              #8  0x00007f1afaeaa37d n/a (libnvidia-glcore.so.460.32.03 + 0xdb137d)
                                              #9  0x00007f1b0135cf1f __glDispatchCheckMultithreaded (libGLdispatch.so.0 + 0x41f1f)
                                              #10 0x00007f1b0130416c glXGetFBConfigs (libGLX.so.0 + 0x1c16c)
                                              #11 0x000055949ef3a6c8 n/a (teams + 0x435f6c8)
                                              #12 0x000055949ef3b134 n/a (teams + 0x4360134)
                                              #13 0x000055949e4fc7f8 n/a (teams + 0x39217f8)
                                              #14 0x000055949e5178c0 n/a (teams + 0x393c8c0)
                                              #15 0x000055949e517e06 n/a (teams + 0x393ce06)
                                              #16 0x000055949e518003 n/a (teams + 0x393d003)
                                              #17 0x000055949e51891a n/a (teams + 0x393d91a)
                                              #18 0x000055949e533dd5 n/a (teams + 0x3958dd5)
                                              #19 0x000055949e56789f n/a (teams + 0x398c89f)
                                              #20 0x000055949e5a7537 n/a (teams + 0x39cc537)
                                              #21 0x00007f1b051e53e9 start_thread (libpthread.so.0 + 0x93e9)
                                              #22 0x00007f1b03be8293 __clone (libc.so.6 + 0x100293)
                                              
                                              Stack trace of thread 1146:
                                              #0  0x00007f1b03be85de epoll_wait (libc.so.6 + 0x1005de)
                                              #1  0x000055949e5b411a n/a (teams + 0x39d911a)
                                              #2  0x000055949e5b1ab3 n/a (teams + 0x39d6ab3)
                                              #3  0x000055949e5a5010 n/a (teams + 0x39ca010)
                                              #4  0x000055949e533dd5 n/a (teams + 0x3958dd5)
                                              #5  0x000055949e56789f n/a (teams + 0x398c89f)
                                              #6  0x000055949e5a7537 n/a (teams + 0x39cc537)
                                              #7  0x00007f1b051e53e9 start_thread (libpthread.so.0 + 0x93e9)
                                              #8  0x00007f1b03be8293 __clone (libc.so.6 + 0x100293)
                                              
                                              Stack trace of thread 1142:
                                              #0  0x00007f1b051eb9c8 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0xf9c8)
                                              #1  0x000055949e59f082 n/a (teams + 0x39c4082)

This is the other one:

Jul 12 19:50:53 scout systemd[10635]: pipewire.socket: Succeeded.
...skipping...
                                               #30 0x000055df44118af0 n/a (teams + 0x2f97af0)
                                               #31 0x000055df441188f8 _ZN2v88internal9Execution4CallEPNS0_7IsolateENS0_6HandleINS0_6ObjectEEES6_iPS6_ (teams + 0x2f978f8)
                                               #32 0x000055df444ee274 _ZN2v88Function4CallENS_5LocalINS_7ContextEEENS1_INS_5ValueEEEiPS5_ (teams + 0x336d274)
                                               #33 0x000055df478414e8 n/a (teams + 0x66c04e8)
                                               #34 0x000055df47841721 _ZN4node12MakeCallbackEPN2v87IsolateENS0_5LocalINS0_6ObjectEEENS3_INS0_8FunctionEEEiPNS3_INS0_5ValueEEENS_13async_contextE (teams + 0x66c0721)
                                               #35 0x000055df44a2577d n/a (teams + 0x38a477d)
                                               #36 0x000055df449a945b n/a (teams + 0x382845b)
                                               #37 0x000055df449c775c n/a (teams + 0x384675c)
                                               #38 0x000055df449bc1ba n/a (teams + 0x383b1ba)
                                               #39 0x000055df449b46e1 n/a (teams + 0x38336e1)
                                               #40 0x000055df449b458c n/a (teams + 0x383358c)
                                               #41 0x000055df449b42b5 n/a (teams + 0x38332b5)
                                               #42 0x000055df43b8e9d7 n/a (teams + 0x2a0d9d7)
                                               #43 0x000055df4393085d n/a (teams + 0x27af85d)
                                               #44 0x000055df4502e21b n/a (teams + 0x3ead21b)
                                               #45 0x000055df44aa27f8 n/a (teams + 0x39217f8)
                                               #46 0x000055df44abd8c0 n/a (teams + 0x393c8c0)
                                               #47 0x000055df44abde06 n/a (teams + 0x393ce06)
                                               #48 0x000055df44abe003 n/a (teams + 0x393d003)
                                               #49 0x000055df44ad332f n/a (teams + 0x395232f)
                                               #50 0x00007f3b3210ea84 g_main_context_dispatch (libglib-2.0.so.0 + 0x52a84)
                                               #51 0x00007f3b321629b1 n/a (libglib-2.0.so.0 + 0xa69b1)
                                               #52 0x00007f3b3210d2b1 g_main_context_iteration (libglib-2.0.so.0 + 0x512b1)
                                               #53 0x000055df44abec52 n/a (teams + 0x393dc52)
                                               #54 0x000055df44ad9dd5 n/a (teams + 0x3958dd5)
                                               #55 0x000055df43805d74 n/a (teams + 0x2684d74)
                                               #56 0x000055df43805b33 n/a (teams + 0x2684b33)
                                               #57 0x000055df43808312 n/a (teams + 0x2687312)
                                               #58 0x000055df43801cef n/a (teams + 0x2680cef)
                                               #59 0x000055df44943dec n/a (teams + 0x37c2dec)
                                               #60 0x000055df45c20135 n/a (teams + 0x4a9f135)
                                               #61 0x000055df44941eb1 n/a (teams + 0x37c0eb1)
                                               #62 0x000055df42f5f246 n/a (teams + 0x1dde246)
                                               #63 0x00007f3b30b87152 __libc_start_main (libc.so.6 + 0x28152)
Jan 13 11:06:08 scout systemd[1]: systemd-coredump@2-37850-0.service: Succeeded.

No bugreport.sh because this causes the system to hard crash; only the hard reset button does anything at this point.

@andreesteve my only current workaround is to stick with the LTS kernel and the 440.100 NVIDIA driver. Not sure what caused your ‘teams’ crash (segfault 7 or abrt 6 signal?). I’m not a code geek, but it looks like something related to the clock_nanosleep kernel call, which most probably leads to the teams crash in conjunction with the recent driver implementation. The only way forward is to keep this thread running and wait for a fix.

Previously I was hit by page allocation failures, but those were resolved with the patch for 455 or in 460.
Now I’m getting this error.
Arch Linux, kernels 5.9 and 5.10.
Lenovo Thinkstation a few years old,
GTX 1660 Super, drivers 455.45.01 and 460.32.03.

Jan 16 11:34:04 mm-station kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Jan 16 11:34:04 mm-station kernel: #PF: supervisor read access in kernel mode
Jan 16 11:34:04 mm-station kernel: #PF: error_code(0x0000) - not-present page
Jan 16 11:34:04 mm-station kernel: PGD 80000002042c5067 P4D 80000002042c5067 PUD 52325c067 PMD 20a8cd067 PTE 0
Jan 16 11:34:04 mm-station kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jan 16 11:34:04 mm-station kernel: CPU: 1 PID: 1164 Comm: irq/35-nvidia Tainted: P           OE     5.10.6-arch1-1 #1
Jan 16 11:34:04 mm-station kernel: Hardware name: LENOVO 30A0XXXXXX/SHARKBAY, BIOS FBKTDEAUS 06/16/2020
Jan 16 11:34:04 mm-station kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Jan 16 11:34:04 mm-station kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Jan 16 11:34:04 mm-station kernel: RSP: 0018:ffff9f1dc079bb60 EFLAGS: 00010202
Jan 16 11:34:04 mm-station kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Jan 16 11:34:04 mm-station kernel: RDX: ffff8db2d52013c8 RSI: ffffffffffffffff RDI: 0000000000000020
Jan 16 11:34:04 mm-station kernel: RBP: ffff8db10c2428c0 R08: ffff8db10c242b30 R09: ffff8db10c2428a0
Jan 16 11:34:04 mm-station kernel: R10: ffff8db10c25c008 R11: ffff8db10c25d098 R12: 0000000000000020
Jan 16 11:34:04 mm-station kernel: R13: 0000000000000000 R14: ffff8db10c242a28 R15: ffff8db10c242b30
Jan 16 11:34:04 mm-station kernel: FS:  0000000000000000(0000) GS:ffff8db7fdc40000(0000) knlGS:0000000000000000
Jan 16 11:34:04 mm-station kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 11:34:04 mm-station kernel: CR2: 0000000000000020 CR3: 000000027a26a001 CR4: 00000000001706e0
Jan 16 11:34:04 mm-station kernel: Call Trace:
Jan 16 11:34:04 mm-station kernel:  ? _nv030766rm+0x1b/0x90 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv026432rm+0x18/0x60 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv012979rm+0x13d/0x1c0 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv000081rm+0x12f/0x1a0 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv012910rm+0xff/0x180 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv019531rm+0x1af/0x210 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv019482rm+0xdf3/0xef0 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv019483rm+0xf3/0x290 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv019484rm+0x12f/0x350 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv019485rm+0x1f5/0x320 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv019449rm+0x78/0xd0 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv019463rm+0xcf/0x2f0 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv019464rm+0x35/0x540 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv019497rm+0xbe/0xe0 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv028705rm+0x97b/0xdc0 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv028713rm+0x15d/0x400 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? _nv000709rm+0xa9/0x240 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? disable_irq_nosync+0x10/0x10
Jan 16 11:34:04 mm-station kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
Jan 16 11:34:04 mm-station kernel:  ? irq_thread_fn+0x20/0x60
Jan 16 11:34:04 mm-station kernel:  ? irq_thread+0xf5/0x1a0
Jan 16 11:34:04 mm-station kernel:  ? irq_finalize_oneshot.part.0+0xe0/0xe0
Jan 16 11:34:04 mm-station kernel:  ? irq_thread_check_affinity+0xd0/0xd0
Jan 16 11:34:04 mm-station kernel:  ? kthread+0x133/0x150
Jan 16 11:34:04 mm-station kernel:  ? __kthread_bind_mask+0x60/0x60
Jan 16 11:34:04 mm-station kernel:  ? ret_from_fork+0x22/0x30
Jan 16 11:34:04 mm-station kernel: Modules linked in: nvidia_uvm(POE) mei_hdcp mei_wdt 8021q garp mrp stp llc ccm rt2800usb rt2x00usb rt2800lib rt2x00lib mac80211 cfg80211 rfkill libarc4 mousedev nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg soundwire_intel coretemp xxhash_generic soundwire_generic_allocation kvm_intel soundwire_cadence snd_hda_codec ucsi_ccg iTCO_wdt snd_hda_core typec_ucsi intel_pmc_bxt typec kvm at24 wmi_bmof iTCO_vendor_support snd_hwdep soundwire_bus irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs drm_kms_helper snd_soc_core aesni_intel crypto_simd snd_compress cryptd ac97_bus cec snd_pcm_dmaengine glue_helper snd_pcm rapl tpm_tis blake2b_generic intel_cstate xor syscopyarea tpm_tis_core mei_me raid6_pq snd_timer sysfillrect intel_uncore pcspkr snd i2c_i801 sysimgblt libcrc32c e1000e mei
Jan 16 11:34:04 mm-station kernel:  soundcore fb_sys_fops tpm i2c_nvidia_gpu i2c_smbus wmi lpc_ich rng_core mac_hid video vboxnetflt(OE) vboxnetadp(OE) drm vboxdrv(OE) fuse agpgart bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid uas usb_storage crc32c_intel sr_mod cdrom xhci_pci xhci_pci_renesas
Jan 16 11:34:04 mm-station kernel: CR2: 0000000000000020
Jan 16 11:34:04 mm-station kernel: ---[ end trace 3438ebc2238aedc5 ]---
Jan 16 11:34:04 mm-station kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Jan 16 11:34:04 mm-station kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Jan 16 11:34:04 mm-station kernel: RSP: 0018:ffff9f1dc079bb60 EFLAGS: 00010202
Jan 16 11:34:04 mm-station kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Jan 16 11:34:04 mm-station kernel: RDX: ffff8db2d52013c8 RSI: ffffffffffffffff RDI: 0000000000000020
Jan 16 11:34:04 mm-station kernel: RBP: ffff8db10c2428c0 R08: ffff8db10c242b30 R09: ffff8db10c2428a0
Jan 16 11:34:04 mm-station kernel: R10: ffff8db10c25c008 R11: ffff8db10c25d098 R12: 0000000000000020
Jan 16 11:34:04 mm-station kernel: R13: 0000000000000000 R14: ffff8db10c242a28 R15: ffff8db10c242b30
Jan 16 11:34:04 mm-station kernel: FS:  0000000000000000(0000) GS:ffff8db7fdc40000(0000) knlGS:0000000000000000
Jan 16 11:34:04 mm-station kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 11:34:04 mm-station kernel: CR2: 0000000000000020 CR3: 000000027a26a001 CR4: 00000000001706e0
Jan 16 11:34:04 mm-station kernel: BUG: kernel NULL pointer dereference, address: 0000000000000959
Jan 16 11:34:04 mm-station kernel: #PF: supervisor write access in kernel mode
Jan 16 11:34:04 mm-station kernel: #PF: error_code(0x0002) - not-present page
Jan 16 11:34:04 mm-station kernel: PGD 80000002042c5067 P4D 80000002042c5067 PUD 52325c067 PMD 20a8cd067 PTE 0
Jan 16 11:34:04 mm-station kernel: Oops: 0002 [#2] PREEMPT SMP PTI
Jan 16 11:34:04 mm-station kernel: CPU: 1 PID: 1164 Comm: irq/35-nvidia Tainted: P      D    OE     5.10.6-arch1-1 #1
Jan 16 11:34:04 mm-station kernel: Hardware name: LENOVO 30A0XXXXXX/SHARKBAY, BIOS FBKTDEAUS 06/16/2020
Jan 16 11:34:04 mm-station kernel: RIP: 0010:mutex_lock+0x10/0x20
Jan 16 11:34:04 mm-station kernel: Code: 03 31 c0 c3 eb d4 0f 1f 40 00 0f 1f 44 00 00 be 02 00 00 00 e9 a1 fa ff ff 90 0f 1f 44 00 00 31 c0 65 48 8b 14 25 c0 7b 01 00 <f0> 48 0f b1 17 75 01 c3 eb d6 66 0f 1f 44 00 00 0f 1f 44 00 00 41
Jan 16 11:34:04 mm-station kernel: RSP: 0018:ffff9f1dc079be30 EFLAGS: 00010246
Jan 16 11:34:04 mm-station kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
Jan 16 11:34:04 mm-station kernel: RDX: ffff8db1033bdc40 RSI: 0000000000001b41 RDI: 0000000000000959
Jan 16 11:34:04 mm-station kernel: RBP: 0000000000000959 R08: 0000000000000000 R09: ffff9f1dc079b7c0
Jan 16 11:34:04 mm-station kernel: R10: ffff9f1dc079b7b8 R11: ffffffff91ecb228 R12: ffff8db1033be434
Jan 16 11:34:04 mm-station kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff8db1033bdc40
Jan 16 11:34:04 mm-station kernel: FS:  0000000000000000(0000) GS:ffff8db7fdc40000(0000) knlGS:0000000000000000
Jan 16 11:34:04 mm-station kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 11:34:04 mm-station kernel: CR2: 0000000000000959 CR3: 000000027a26a001 CR4: 00000000001706e0
Jan 16 11:34:04 mm-station kernel: Call Trace:
Jan 16 11:34:04 mm-station kernel:  perf_event_exit_task+0x30/0x440
Jan 16 11:34:04 mm-station kernel:  do_exit+0x355/0xa40
Jan 16 11:34:04 mm-station kernel:  ? task_work_run+0x5c/0x90
Jan 16 11:34:04 mm-station kernel:  ? do_exit+0x345/0xa40
Jan 16 11:34:04 mm-station kernel:  ? kthread+0x133/0x150
Jan 16 11:34:04 mm-station kernel:  ? rewind_stack_do_exit+0x17/0x17
Jan 16 11:34:04 mm-station kernel: Modules linked in: nvidia_uvm(POE) mei_hdcp mei_wdt 8021q garp mrp stp llc ccm rt2800usb rt2x00usb rt2800lib rt2x00lib mac80211 cfg80211 rfkill libarc4 mousedev nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) snd_hda_codec_realtek intel_rapl_msr intel_rapl_common snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg soundwire_intel coretemp xxhash_generic soundwire_generic_allocation kvm_intel soundwire_cadence snd_hda_codec ucsi_ccg iTCO_wdt snd_hda_core typec_ucsi intel_pmc_bxt typec kvm at24 wmi_bmof iTCO_vendor_support snd_hwdep soundwire_bus irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs drm_kms_helper snd_soc_core aesni_intel crypto_simd snd_compress cryptd ac97_bus cec snd_pcm_dmaengine glue_helper snd_pcm rapl tpm_tis blake2b_generic intel_cstate xor syscopyarea tpm_tis_core mei_me raid6_pq snd_timer sysfillrect intel_uncore pcspkr snd i2c_i801 sysimgblt libcrc32c e1000e mei
Jan 16 11:34:04 mm-station kernel:  soundcore fb_sys_fops tpm i2c_nvidia_gpu i2c_smbus wmi lpc_ich rng_core mac_hid video vboxnetflt(OE) vboxnetadp(OE) drm vboxdrv(OE) fuse agpgart bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid uas usb_storage crc32c_intel sr_mod cdrom xhci_pci xhci_pci_renesas
Jan 16 11:34:04 mm-station kernel: CR2: 0000000000000959
Jan 16 11:34:04 mm-station kernel: ---[ end trace 3438ebc2238aedc6 ]---
Jan 16 11:34:04 mm-station kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
Jan 16 11:34:04 mm-station kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
Jan 16 11:34:04 mm-station kernel: RSP: 0018:ffff9f1dc079bb60 EFLAGS: 00010202
Jan 16 11:34:04 mm-station kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
Jan 16 11:34:04 mm-station kernel: RDX: ffff8db2d52013c8 RSI: ffffffffffffffff RDI: 0000000000000020
Jan 16 11:34:04 mm-station kernel: RBP: ffff8db10c2428c0 R08: ffff8db10c242b30 R09: ffff8db10c2428a0
Jan 16 11:34:04 mm-station kernel: R10: ffff8db10c25c008 R11: ffff8db10c25d098 R12: 0000000000000020
Jan 16 11:34:04 mm-station kernel: R13: 0000000000000000 R14: ffff8db10c242a28 R15: ffff8db10c242b30
Jan 16 11:34:04 mm-station kernel: FS:  0000000000000000(0000) GS:ffff8db7fdc40000(0000) knlGS:0000000000000000
Jan 16 11:34:04 mm-station kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 11:34:04 mm-station kernel: CR2: 0000000000000959 CR3: 000000027a26a001 CR4: 00000000001706e0
Jan 16 11:34:04 mm-station kernel: Fixing recursive fault but reboot is needed!
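
For anyone reading these traces, the two "#PF: error_code(...)" values can be decoded from the low bits of the x86 page-fault error code. A minimal sketch (the function name decode_pf_error_code is only for illustration; it is not something from the kernel or the driver):

```python
def decode_pf_error_code(code):
    """Describe the low bits of an x86 page-fault error code."""
    return {
        # Bit 0: 0 = page not present, 1 = protection violation
        "page": "protection violation" if code & 0x1 else "not-present page",
        # Bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # Bit 2: 0 = supervisor (kernel) mode, 1 = user mode
        "mode": "user" if code & 0x4 else "supervisor",
        # Bit 4: set when the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


# The two values that appear in the logs in this thread.
for code in (0x0000, 0x0002):
    print(hex(code), decode_pf_error_code(code))
```

This agrees with the log text: 0x0000 is a supervisor read of a not-present page (the first oops, inside the nvidia module), and 0x0002 is a supervisor write to a not-present page (the follow-up oops in mutex_lock).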

I was tired of having nvidia-bug-report.sh hang on me even when running with --safe-mode, so I ran the script with strace to maybe find out why it hangs. And judging by the (incomplete) strace log file, the script will hang while trying to read /proc/driver/nvidia/./gpus/0000:01:00.0/power:

[pid  2028] openat(AT_FDCWD, "/proc/driver/nvidia/./gpus/0000:01:00.0/power", O_RDONLY) = 3
[pid  2028] fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid  2028] fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
[pid  2028] mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f57d0e64000
[pid  2028] read(3, 

Here’s the command I used to capture the strace log (captured via SSH, because everything freezes and I’m unable to even switch to a TTY):

$ sudo strace -ff nvidia-bug-report.sh --safe-mode --extra-system-data 2>&1 | tee -a strace.log

And here’s the strace log itself: strace.log (614.8 KB)
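
If you want to check which file under /proc/driver/nvidia blocks without running the whole script, here is a rough sketch that reads each file in a child process with a timeout. The helper name probe_read and the 5-second timeout are my own assumptions, and if the reader ends up in uninterruptible sleep inside the driver, the kill may have no effect, so treat this as a diagnostic aid only:

```python
import os
import subprocess


def probe_read(path, timeout_s=5.0):
    """Read `path` with cat in a child process; return 'ok', 'error', or 'timeout'."""
    proc = subprocess.Popen(
        ["cat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort: a reader stuck in uninterruptible sleep will not die here,
        # so we deliberately do not wait for it again.
        proc.kill()
        return "timeout"
    return "ok" if returncode == 0 else "error"


if __name__ == "__main__":
    # Directory taken from the strace output above; it only exists with the driver loaded.
    root = "/proc/driver/nvidia"
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            print(probe_read(path), path)
```

Running it over SSH right after a crash should print "timeout" next to any file whose read never returns, which would confirm or rule out the power file as the only one that blocks.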

And again, the driver crash happened while I was using Chromium, more specifically, watching a random Facebook video. This seems like the most random bug too, because I had literally just rebooted my computer, then I opened Chromium, watched the video for a minute and it crashed. So I tried to manually reproduce the crash again, repeating step-by-step, but I wasn’t able to!!!

The crash:

jan 17 05:20:08 arch kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
jan 17 05:20:08 arch kernel: #PF: supervisor read access in kernel mode
jan 17 05:20:08 arch kernel: #PF: error_code(0x0000) - not-present page
jan 17 05:20:08 arch kernel: PGD 800000012c756067 P4D 800000012c756067 PUD 0 
jan 17 05:20:08 arch kernel: Oops: 0000 [#1] PREEMPT SMP PTI
jan 17 05:20:08 arch kernel: CPU: 2 PID: 215 Comm: irq/29-nvidia Tainted: P           OE     5.10.7-arch1-1 #1
jan 17 05:20:08 arch kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B75M-DGS R2.0, BIOS P1.50 03/14/2018
jan 17 05:20:08 arch kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
jan 17 05:20:08 arch kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75>
jan 17 05:20:08 arch kernel: RSP: 0018:ffff9fddc359bc20 EFLAGS: 00010202
jan 17 05:20:08 arch kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
jan 17 05:20:08 arch kernel: RDX: ffff89f868588908 RSI: ffffffffffffffff RDI: 0000000000000020
jan 17 05:20:08 arch kernel: RBP: ffff89f8129f5990 R08: ffffffffc2152b60 R09: ffff89f8129f5970
jan 17 05:20:08 arch kernel: R10: ffff89f812974008 R11: ffff89f812975098 R12: 0000000000000020
jan 17 05:20:08 arch kernel: R13: 0000000000000000 R14: ffff89f8129f5af8 R15: ffff89f8129f5c00
jan 17 05:20:08 arch kernel: FS:  0000000000000000(0000) GS:ffff89f915d00000(0000) knlGS:0000000000000000
jan 17 05:20:08 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 17 05:20:08 arch kernel: CR2: 0000000000000020 CR3: 000000012b102004 CR4: 00000000001706e0
jan 17 05:20:08 arch kernel: Call Trace:
jan 17 05:20:08 arch kernel:  ? _nv030766rm+0x1b/0x90 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv026432rm+0x18/0x60 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv012979rm+0x13d/0x1c0 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv000081rm+0x12f/0x1a0 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv012910rm+0xff/0x180 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv019531rm+0x1af/0x210 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv019482rm+0xdf3/0xef0 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv019449rm+0x78/0xd0 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv019463rm+0xcf/0x2f0 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv019497rm+0xbe/0xe0 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv028705rm+0x97b/0xdc0 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv028713rm+0x15d/0x400 [nvidia]
jan 17 05:20:08 arch kernel:  ? _nv000709rm+0xa9/0x240 [nvidia]
jan 17 05:20:08 arch kernel:  ? disable_irq_nosync+0x10/0x10
jan 17 05:20:08 arch kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
jan 17 05:20:08 arch kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
jan 17 05:20:08 arch kernel:  ? irq_thread_fn+0x20/0x60
jan 17 05:20:08 arch kernel:  ? irq_thread+0xf5/0x1a0
jan 17 05:20:08 arch kernel:  ? irq_finalize_oneshot.part.0+0xe0/0xe0
jan 17 05:20:08 arch kernel:  ? irq_thread_check_affinity+0xd0/0xd0
jan 17 05:20:08 arch kernel:  ? kthread+0x133/0x150
jan 17 05:20:08 arch kernel:  ? __kthread_bind_mask+0x60/0x60
jan 17 05:20:08 arch kernel:  ? ret_from_fork+0x22/0x30
jan 17 05:20:08 arch kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash algif_skcipher af_alg bnep intel_rapl_msr intel_rapl_common snd_hda_c>
jan 17 05:20:08 arch kernel:  xt_tcpudp xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6>
jan 17 05:20:08 arch kernel: CR2: 0000000000000020
jan 17 05:20:08 arch kernel: ---[ end trace 2771d77a04395ec1 ]---
jan 17 05:20:08 arch kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
jan 17 05:20:08 arch kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75>
jan 17 05:20:08 arch kernel: RSP: 0018:ffff9fddc359bc20 EFLAGS: 00010202
jan 17 05:20:08 arch kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
jan 17 05:20:08 arch kernel: RDX: ffff89f868588908 RSI: ffffffffffffffff RDI: 0000000000000020
jan 17 05:20:08 arch kernel: RBP: ffff89f8129f5990 R08: ffffffffc2152b60 R09: ffff89f8129f5970
jan 17 05:20:08 arch kernel: R10: ffff89f812974008 R11: ffff89f812975098 R12: 0000000000000020
jan 17 05:20:08 arch kernel: R13: 0000000000000000 R14: ffff89f8129f5af8 R15: ffff89f8129f5c00
jan 17 05:20:08 arch kernel: FS:  0000000000000000(0000) GS:ffff89f915d00000(0000) knlGS:0000000000000000
jan 17 05:20:08 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 17 05:20:08 arch kernel: CR2: 0000000000000020 CR3: 000000012b102004 CR4: 00000000001706e0
jan 17 05:20:08 arch kernel: BUG: kernel NULL pointer dereference, address: 0000000000000959
jan 17 05:20:08 arch kernel: #PF: supervisor write access in kernel mode
jan 17 05:20:08 arch kernel: #PF: error_code(0x0002) - not-present page
jan 17 05:20:08 arch kernel: PGD 800000012c756067 P4D 800000012c756067 PUD 0 
jan 17 05:20:08 arch kernel: Oops: 0002 [#2] PREEMPT SMP PTI
jan 17 05:20:08 arch kernel: CPU: 2 PID: 215 Comm: irq/29-nvidia Tainted: P      D    OE     5.10.7-arch1-1 #1
jan 17 05:20:08 arch kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B75M-DGS R2.0, BIOS P1.50 03/14/2018
jan 17 05:20:08 arch kernel: RIP: 0010:mutex_lock+0x10/0x20
jan 17 05:20:08 arch kernel: Code: 03 31 c0 c3 eb d4 0f 1f 40 00 0f 1f 44 00 00 be 02 00 00 00 e9 a1 fa ff ff 90 0f 1f 44 00 00 31 c0 65 48 8b 14 25 c0 7b 01 00 <f0> 48 0f b1 17 75 01 c3 eb>
jan 17 05:20:08 arch kernel: RSP: 0018:ffff9fddc359be30 EFLAGS: 00010246
jan 17 05:20:08 arch kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
jan 17 05:20:08 arch kernel: RDX: ffff89f812e59ec0 RSI: 0000000000001b41 RDI: 0000000000000959
jan 17 05:20:08 arch kernel: RBP: 0000000000000959 R08: 0000000000000001 R09: 0000000000000000
jan 17 05:20:08 arch kernel: R10: ffff89f812a73c00 R11: 0000000000000000 R12: ffff89f812e5a6b4
jan 17 05:20:08 arch kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff89f812e59ec0
jan 17 05:20:08 arch kernel: FS:  0000000000000000(0000) GS:ffff89f915d00000(0000) knlGS:0000000000000000
jan 17 05:20:08 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 17 05:20:08 arch kernel: CR2: 0000000000000959 CR3: 000000012b102004 CR4: 00000000001706e0
jan 17 05:20:08 arch kernel: Call Trace:
jan 17 05:20:08 arch kernel:  perf_event_exit_task+0x30/0x440
jan 17 05:20:08 arch kernel:  ? kfree+0x40c/0x440
jan 17 05:20:08 arch kernel:  do_exit+0x355/0xa40
jan 17 05:20:08 arch kernel:  ? task_work_run+0x5c/0x90
jan 17 05:20:08 arch kernel:  ? do_exit+0x345/0xa40
jan 17 05:20:08 arch kernel:  ? kthread+0x133/0x150
jan 17 05:20:08 arch kernel:  ? rewind_stack_do_exit+0x17/0x17
jan 17 05:20:08 arch kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash algif_skcipher af_alg bnep intel_rapl_msr intel_rapl_common snd_hda_c>
jan 17 05:20:08 arch kernel:  xt_tcpudp xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6>
jan 17 05:20:08 arch kernel: CR2: 0000000000000959
jan 17 05:20:08 arch kernel: ---[ end trace 2771d77a04395ec2 ]---
jan 17 05:20:08 arch kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
jan 17 05:20:08 arch kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75>
jan 17 05:20:08 arch kernel: RSP: 0018:ffff9fddc359bc20 EFLAGS: 00010202
jan 17 05:20:08 arch kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
jan 17 05:20:08 arch kernel: RDX: ffff89f868588908 RSI: ffffffffffffffff RDI: 0000000000000020
jan 17 05:20:08 arch kernel: RBP: ffff89f8129f5990 R08: ffffffffc2152b60 R09: ffff89f8129f5970
jan 17 05:20:08 arch kernel: R10: ffff89f812974008 R11: ffff89f812975098 R12: 0000000000000020
jan 17 05:20:08 arch kernel: R13: 0000000000000000 R14: ffff89f8129f5af8 R15: ffff89f8129f5c00
jan 17 05:20:08 arch kernel: FS:  0000000000000000(0000) GS:ffff89f915d00000(0000) knlGS:0000000000000000
jan 17 05:20:08 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 17 05:20:08 arch kernel: CR2: 0000000000000959 CR3: 000000012b102004 CR4: 00000000001706e0
jan 17 05:20:08 arch kernel: Fixing recursive fault but reboot is needed!

My System Information

OS: Arch Linux
Kernel: Linux arch 5.10.7-arch1-1 #1 SMP PREEMPT Wed, 13 Jan 2021 12:02:01 +0000 x86_64 GNU/Linux
Kernel boot flags:

quiet splash loglevel=3 rd.systemd.show_status=auto rd.udev.log_priority=3 intel_pstate=passive nvidia-drm.modeset=1

GPU: NVIDIA GTX 660
Chromium: 87.0.4280.141
Desktop Environment: GNOME 3.38.3 (X11)
Window Manager: mutter 3.38.3
/etc/modprobe.d/nvidia.conf:

options nvidia NVreg_UsePageAttributeTable=1

MODULES in /etc/mkinitcpio.conf:

MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)

~/.config/chromium-flags.conf: chromium-flags.conf (2.3 KB)

Any progress on this? The problem has become much more than a simple annoyance now that I have to use Zoom while running long simulation jobs.

Hi all, I came across this thread after searching the web for a solution to my crashes. I’m not sure if this is the same bug as others are reporting, but here’s my crash:

Driver Version: 460.32.03 CUDA Version: 11.2
Linux master 5.10.9-arch1-1 #1 SMP PREEMPT Tue, 19 Jan 2021 22:06:06 +0000 x86_64 GNU/Linux
Linux 5.10.9-arch1-1 x86_64 GNU/Linux

[186073.351990] BUG: kernel NULL pointer dereference, address: 0000000000000020
[186073.351996] #PF: supervisor read access in kernel mode
[186073.351998] #PF: error_code(0x0000) - not-present page
[186073.352000] PGD 5200dc067 P4D 5200dc067 PUD 0
[186073.352007] Oops: 0000 [#1] PREEMPT SMP NOPTI
[186073.352011] CPU: 0 PID: 179 Comm: irq/30-nvidia Tainted: P           OE     5.10.8-arch1-1 #1
[186073.352012] Hardware name: ASUS All Series/Z87-PRO, BIOS 2103 08/18/2014
[186073.352308] RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
[186073.352312] Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
[186073.352314] RSP: 0018:ffffa4e9805b3bf0 EFLAGS: 00010202
[186073.352317] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
[186073.352319] RDX: ffff8fde16083988 RSI: ffffffffffffffff RDI: 0000000000000020
[186073.352320] RBP: ffff8fdd4e21d960 R08: ffffffffc1d6ab60 R09: ffff8fdd4e21d940
[186073.352322] R10: ffff8fdd4def4008 R11: ffff8fdd4def5098 R12: 0000000000000020
[186073.352324] R13: 0000000000000000 R14: ffff8fdd4e21dac8 R15: ffff8fdd4e21dbd0
[186073.352326] FS:  0000000000000000(0000) GS:ffff8fe45fa00000(0000) knlGS:0000000000000000
[186073.352328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[186073.352330] CR2: 0000000000000020 CR3: 0000000105e38003 CR4: 00000000001706f0
[186073.352332] Call Trace:
[186073.352608]  ? _nv030766rm+0x1b/0x90 [nvidia]
[186073.352887]  ? _nv026432rm+0x18/0x60 [nvidia]
[186073.353165]  ? _nv012979rm+0x13d/0x1c0 [nvidia]
[186073.353384]  ? _nv000081rm+0x12f/0x1a0 [nvidia]
[186073.353747]  ? _nv012910rm+0xff/0x180 [nvidia]
[186073.354078]  ? _nv019531rm+0x1af/0x210 [nvidia]
[186073.354414]  ? _nv019482rm+0xdf3/0xef0 [nvidia]
[186073.354750]  ? _nv019483rm+0xf3/0x290 [nvidia]
[186073.355085]  ? _nv019449rm+0x78/0xd0 [nvidia]
[186073.355418]  ? _nv019463rm+0xcf/0x2f0 [nvidia]
[186073.355752]  ? _nv019497rm+0xbe/0xe0 [nvidia]
[186073.356113]  ? _nv028705rm+0x97b/0xdc0 [nvidia]
[186073.356475]  ? _nv028713rm+0x15d/0x400 [nvidia]
[186073.356702]  ? _nv000709rm+0xa9/0x240 [nvidia]
[186073.356709]  ? disable_irq_nosync+0x10/0x10
[186073.356936]  ? rm_isr_bh+0x1c/0x60 [nvidia]
[186073.357070]  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
[186073.357073]  ? irq_thread_fn+0x20/0x60
[186073.357076]  ? irq_thread+0xf5/0x1a0
[186073.357079]  ? irq_finalize_oneshot.part.0+0xe0/0xe0
[186073.357083]  ? irq_thread_check_affinity+0xd0/0xd0
[186073.357087]  ? kthread+0x133/0x150
[186073.357090]  ? __kthread_bind_mask+0x60/0x60
[186073.357095]  ? ret_from_fork+0x1f/0x30
[186073.357098] Modules linked in: ufs hfsplus hfs cdrom minix vfat msdos fat jfs xfs uas usb_storage btrfs blake2b_generic xor raid6_pq overlay bnep nct6775 hwmon_vid intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi ath9k snd_hda_intel ath9k_common snd_intel_dspcfg soundwire_intel ath9k_hw soundwire_generic_allocation soundwire_cadence ath x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp snd_usb_audio kvm_intel mac80211 snd_hda_core soundwire_bus ath3k kvm snd_usbmidi_lib btusb snd_hwdep btrtl btbcm snd_soc_core iTCO_wdt eeepc_wmi intel_pmc_bxt at24 asus_wmi irqbypass iTCO_vendor_support wmi_bmof snd_rawmidi mei_hdcp sparse_keymap mxm_wmi snd_compress rapl snd_seq_device intel_cstate btintel mc i915 ac97_bus cfg80211 bluetooth snd_pcm_dmaengine snd_pcm snd_timer i2c_i801 ecdh_generic ecc mousedev intel_uncore snd i2c_smbus mei_me e1000e rfkill mei i2c_algo_bit libarc4 lpc_ich soundcore intel_gtt video wmi nf_log_ipv4 nf_log_common mac_hid ipt_REJECT nf_reject_ipv4 xt_LOG
[186073.357178]  xt_limit xt_addrtype xt_tcpudp xt_conntrack ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) sg crypto_user fuse acpi_call(OE) bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_logitech_hidpp hid_logitech_dj usbhid dm_crypt cbc encrypted_keys dm_mod trusted tpm rng_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas nvidia_drm(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart nvidia_uvm(POE) nvidia_modeset(POE) nvidia(POE)
[186073.357232] CR2: 0000000000000020
[186073.357236] ---[ end trace 6b118e390fe42176 ]---
[186073.357509] RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
[186073.357512] Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
[186073.357514] RSP: 0018:ffffa4e9805b3bf0 EFLAGS: 00010202
[186073.357516] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
[186073.357518] RDX: ffff8fde16083988 RSI: ffffffffffffffff RDI: 0000000000000020
[186073.357520] RBP: ffff8fdd4e21d960 R08: ffffffffc1d6ab60 R09: ffff8fdd4e21d940
[186073.357521] R10: ffff8fdd4def4008 R11: ffff8fdd4def5098 R12: 0000000000000020
[186073.357523] R13: 0000000000000000 R14: ffff8fdd4e21dac8 R15: ffff8fdd4e21dbd0
[186073.357525] FS:  0000000000000000(0000) GS:ffff8fe45fa00000(0000) knlGS:0000000000000000
[186073.357527] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[186073.357529] CR2: 0000000000000020 CR3: 0000000105e38003 CR4: 00000000001706f0
[186073.357610] BUG: kernel NULL pointer dereference, address: 0000000000000959
[186073.357612] #PF: supervisor write access in kernel mode
[186073.357614] #PF: error_code(0x0002) - not-present page
[186073.357615] PGD 5200dc067 P4D 5200dc067 PUD 0
[186073.357620] Oops: 0002 [#2] PREEMPT SMP NOPTI
[186073.357623] CPU: 0 PID: 179 Comm: irq/30-nvidia Tainted: P      D    OE     5.10.8-arch1-1 #1
[186073.357625] Hardware name: ASUS All Series/Z87-PRO, BIOS 2103 08/18/2014
[186073.357665] RIP: 0010:mutex_lock+0x10/0x20
[186073.357668] Code: 03 31 c0 c3 eb d4 0f 1f 40 00 0f 1f 44 00 00 be 02 00 00 00 e9 a1 fa ff ff 90 0f 1f 44 00 00 31 c0 65 48 8b 14 25 c0 7b 01 00 <f0> 48 0f b1 17 75 01 c3 eb d6 66 0f 1f 44 00 00 0f 1f 44 00 00 41
[186073.357670] RSP: 0018:ffffa4e9805b3e30 EFLAGS: 00010246
[186073.357673] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[186073.357675] RDX: ffff8fdd4bc95c40 RSI: 0000000000001b41 RDI: 0000000000000959
[186073.357677] RBP: 0000000000000959 R08: 0000000000000000 R09: ffffa4e9805b3850
[186073.357679] R10: ffffa4e9805b3848 R11: ffffffffb4ccb228 R12: ffff8fdd4bc96434
[186073.357681] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8fdd4bc95c40
[186073.357683] FS:  0000000000000000(0000) GS:ffff8fe45fa00000(0000) knlGS:0000000000000000
[186073.357685] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[186073.357687] CR2: 0000000000000959 CR3: 0000000105e38003 CR4: 00000000001706f0
[186073.357689] Call Trace:
[186073.357695]  perf_event_exit_task+0x30/0x440
[186073.357702]  do_exit+0x355/0xa40
[186073.357705]  ? task_work_run+0x5c/0x90
[186073.357708]  ? do_exit+0x345/0xa40
[186073.357711]  ? kthread+0x133/0x150
[186073.357715]  ? rewind_stack_do_exit+0x17/0x17
[186073.357719] Modules linked in: ufs hfsplus hfs cdrom minix vfat msdos fat jfs xfs uas usb_storage btrfs blake2b_generic xor raid6_pq overlay bnep nct6775 hwmon_vid intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi ath9k snd_hda_intel ath9k_common snd_intel_dspcfg soundwire_intel ath9k_hw soundwire_generic_allocation soundwire_cadence ath x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp snd_usb_audio kvm_intel mac80211 snd_hda_core soundwire_bus ath3k kvm snd_usbmidi_lib btusb snd_hwdep btrtl btbcm snd_soc_core iTCO_wdt eeepc_wmi intel_pmc_bxt at24 asus_wmi irqbypass iTCO_vendor_support wmi_bmof snd_rawmidi mei_hdcp sparse_keymap mxm_wmi snd_compress rapl snd_seq_device intel_cstate btintel mc i915 ac97_bus cfg80211 bluetooth snd_pcm_dmaengine snd_pcm snd_timer i2c_i801 ecdh_generic ecc mousedev intel_uncore snd i2c_smbus mei_me e1000e rfkill mei i2c_algo_bit libarc4 lpc_ich soundcore intel_gtt video wmi nf_log_ipv4 nf_log_common mac_hid ipt_REJECT nf_reject_ipv4 xt_LOG
[186073.358001]  xt_limit xt_addrtype xt_tcpudp xt_conntrack ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) sg crypto_user fuse acpi_call(OE) bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_logitech_hidpp hid_logitech_dj usbhid dm_crypt cbc encrypted_keys dm_mod trusted tpm rng_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas nvidia_drm(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart nvidia_uvm(POE) nvidia_modeset(POE) nvidia(POE)
[186073.358195] CR2: 0000000000000959
[186073.358230] ---[ end trace 6b118e390fe42177 ]---
[186073.358549] RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
[186073.358584] Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 83 ec 08 48 85 ff 74 57 <48> 8b 17 31 c0 48 85 d2 75 0e eb 2b 0f 1f 00 48 8b 52 10 48 85 d2
[186073.358619] RSP: 0018:ffffa4e9805b3bf0 EFLAGS: 00010202
[186073.358622] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
[186073.358624] RDX: ffff8fde16083988 RSI: ffffffffffffffff RDI: 0000000000000020
[186073.358627] RBP: ffff8fdd4e21d960 R08: ffffffffc1d6ab60 R09: ffff8fdd4e21d940
[186073.358629] R10: ffff8fdd4def4008 R11: ffff8fdd4def5098 R12: 0000000000000020
[186073.358632] R13: 0000000000000000 R14: ffff8fdd4e21dac8 R15: ffff8fdd4e21dbd0
[186073.358635] FS:  0000000000000000(0000) GS:ffff8fe45fa00000(0000) knlGS:0000000000000000
[186073.358637] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[186073.358672] CR2: 0000000000000959 CR3: 0000000105e38003 CR4: 00000000001706f0
[186073.358707] Fixing recursive fault but reboot is needed!

Crashes happen when I use a browser with iPython Notebook.

One more thing that's been bugging me for the past couple of months: after my computer goes to sleep, the Nvidia card never reconnects to the screen on wake-up. I have to physically disconnect my HDMI cable and plug it back in for the computer to pick it up. I've tried this with 3 different monitors, so I know it's not an issue with the monitor. For some reason, the Nvidia card just doesn't recognize the HDMI connection after a wake-up, but re-connecting the cable does bring the screen back.

Thanks for any solutions or advice!
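
For comparing traces like the ones posted in this thread, it can help to boil a log down to the few lines that identify each fault. A rough sketch, assuming the log is plain text in the dmesg or journalctl form seen here; `oops_summary` is a made-up helper name, not an existing tool:

```shell
# Hypothetical helper: reduce a kernel log to the lines that identify each
# fault (the BUG line, the Oops counter and the faulting RIP).
# Reads the files given as arguments, or stdin if there are none.
oops_summary() {
    grep -E 'BUG: kernel NULL pointer dereference|Oops: [0-9a-f]+ \[#[0-9]+\]|RIP: 0010:' "$@" |
        # Strip a journal-style "time host kernel: " prefix, then a dmesg-style
        # "[seconds.micros] " timestamp, so lines from different sources compare equal.
        sed -E 's/^.*kernel: //; s/^\[ *[0-9]+\.[0-9]+\] //' |
        # The kernel reprints RIP after "end trace"; keep the first copy only.
        awk '!seen[$0]++'
}
```

Something like `journalctl -k -b -1 | oops_summary` would then print one short block for the previous boot. Note that the de-duplication also collapses two separate crashes at the same RIP into one entry, so this is only meant for a quick comparison between reports.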

We still haven't been able to reproduce the problem, so we haven't been able to verify a fix, but the next release will contain a change that is supposed to fix this. Once the next release comes out, please give it a try and let me know whether you still experience the problem.

Just a follow-up to this post. The crash has happened twice since I posted that, both times while trying to leave fullscreen mode of a YouTube video on Chromium 88.

Now that I know why nvidia-bug-report.sh hangs, I was able to create a copy of it and then I made two small modifications, so the script can generate the log file successfully:

diff --git a/usr/bin/nvidia-bug-report.sh b/usr/bin/nvidia-bug-report.sh
index 0b735fc..991a755 100755
--- a/usr/bin/nvidia-bug-report.sh
+++ b/usr/bin/nvidia-bug-report.sh
@@ -445,7 +445,7 @@ done
 
 for subdir in $proc_module_dirs; do
     for GPU in `ls /proc/driver/nvidia/$subdir/gpus/ 2> /dev/null`; do
-        append "/proc/driver/nvidia/$subdir/gpus/$GPU/power"
+        echo -e "\nIGNORING: /proc/driver/nvidia/$subdir/gpus/$GPU/power"
     done
 done
 
@@ -971,7 +971,7 @@ append "/proc/mtrr"
 for subdir in $proc_module_dirs; do
     append "/proc/driver/nvidia/$subdir/version"
     for GPU in `ls /proc/driver/nvidia/$subdir/gpus/ 2> /dev/null`; do
-        append "/proc/driver/nvidia/$subdir/gpus/$GPU/information"
+        echo -e "\nIGNORING: /proc/driver/nvidia/$subdir/gpus/$GPU/information"
         append "/proc/driver/nvidia/$subdir/gpus/$GPU/registry"
     done
     append_glob "/proc/driver/nvidia/$subdir/warnings/*"

Then I ran the script as usual: sudo nvidia-bug-report.sh --safe-mode --extra-system-data
I'm not sure if it matters, but I captured the log while the system was completely frozen (via SSH): nvidia-bug-report.log.gz (71.6 KB)

Since I left out two files due to my modifications of the script, here they are:

/proc/driver/nvidia/./gpus/*/information:

Model: 		 GeForce GTX 660
IRQ:   		 29
GPU UUID: 	 GPU-57cf86bc-3ad7-16f6-3e92-c7a285496146
Video BIOS: 	 80.06.28.00.6e
Bus Type: 	 PCIe
DMA Size: 	 40 bits
DMA Mask: 	 0xffffffffff
Bus Location: 	 0000:01:00.0
Device Minor: 	 0
Blacklisted:	 No

/proc/driver/nvidia/./gpus/*/power:

Runtime D3 status:          Disabled
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Not Supported
 Video Memory Off:          Not Supported

Hopefully this can at least help prevent the script from hanging in future drivers.
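
For anyone patching their own copy of the script: instead of skipping the two files outright, reading them with a deadline might be an alternative. This is only a sketch under the assumption that the hang is a read that never returns; `read_with_deadline` is a hypothetical helper, not something that exists in nvidia-bug-report.sh, and the 5-second default is arbitrary:

```shell
# Hypothetical helper (not part of nvidia-bug-report.sh): print a file, but
# stop waiting if the read has not finished after $2 seconds (default 5).
# Plain POSIX sh, so the variables below are global.
read_with_deadline() {
    file=$1
    deadline=${2:-5}
    tmp=$(mktemp) || return 1

    # Read in the background so a blocked read cannot stall the caller.
    cat "$file" > "$tmp" 2>/dev/null &
    pid=$!

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            # Best effort: a reader stuck inside the kernel may ignore this
            # signal and linger, but we no longer wait for it.
            kill -9 "$pid" 2>/dev/null
            rm -f "$tmp"
            echo "SKIPPED (read timed out): $file"
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # The reader has exited: collect its status and emit what it read.
    wait "$pid"
    status=$?
    cat "$tmp"
    rm -f "$tmp"
    return "$status"
}
```

A call such as `read_with_deadline "/proc/driver/nvidia/$subdir/gpus/$GPU/power" 5` would then either print the file or a SKIPPED line. The polling loop adds up to a second per file, which seems an acceptable price for a report that always completes.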

More crashes on my side, too.
Arch Linux, kernel 5.10.9, driver 460.32.03.
When it happened again, I logged in to the system over SSH from a laptop.
The nvidia-bug-report.sh script hung because /proc/driver/nvidia/gpus/0000:01:00.0/power and /proc/driver/nvidia/gpus/0000:01:00.0/information can't be read. I modified the script in a similar way as above and ran it as:
# nvidia-bug-report.sh --safe-mode --extra-system-data
Here's the report file: nvidia-bug-report.log.gz (70.9 KB)

And here are the missing files, which can be read normally (just not after the crash):

$ cat /proc/driver/nvidia/./gpus/0000:01:00.0/information
Model: 		 GeForce GTX 1660 SUPER
IRQ:   		 35
GPU UUID: 	 GPU-31343750-60f9-fedd-d18f-0dd2131d0436
Video BIOS: 	 90.16.42.00.6c
Bus Type: 	 PCIe
DMA Size: 	 47 bits
DMA Mask: 	 0x7fffffffffff
Bus Location: 	 0000:01:00.0
Device Minor: 	 0
Blacklisted:	 No
$ cat /proc/driver/nvidia/./gpus/0000:01:00.0/power
Runtime D3 status:          Disabled
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Supported
 Video Memory Off:          Supported

Not sure if the problem is caused by the driver, but I have the same logs as all of you, so I suppose I'm hitting the same issue. Manjaro, 5.10, 460.32.03, GTX 1050Ti

21:21:15 fedya-pc kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
21:21:15 fedya-pc kernel: #PF: supervisor read access in kernel mode
21:21:15 fedya-pc kernel: #PF: error_code(0x0000) - not-present page
21:21:15 fedya-pc kernel: PGD 800000012f9a0067 P4D 800000012f9a0067 PUD 0 
21:21:15 fedya-pc kernel: Oops: 0000 [#1] PREEMPT SMP PTI
21:21:15 fedya-pc kernel: CPU: 2 PID: 654 Comm: irq/127-nvidia Tainted: P           OE     5.10.7-3-MA>
21:21:15 fedya-pc kernel: Hardware name: MSI MS-7A74/B250M PRO-VD (MS-7A74), BIOS 1.40 04/06/2017
21:21:15 fedya-pc kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
21:21:15 fedya-pc kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 0>
21:21:15 fedya-pc kernel: RSP: 0018:ffffaf28012e3c00 EFLAGS: 00010202
21:21:15 fedya-pc kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
21:21:15 fedya-pc kernel: RDX: ffff90b357c1ff08 RSI: ffffffffffffffff RDI: 0000000000000020
21:21:15 fedya-pc kernel: RBP: ffff90b21e0c59f0 R08: ffffffffc2296b60 R09: ffff90b21e0c59d0
21:21:15 fedya-pc kernel: R10: ffff90b21c684008 R11: ffff90b21c685098 R12: 0000000000000020
21:21:15 fedya-pc kernel: R13: 0000000000000000 R14: ffff90b21e0c5b58 R15: ffff90b21e0c5c98
21:21:15 fedya-pc kernel: FS:  0000000000000000(0000) GS:ffff90b55ed00000(0000) knlGS:0000000000000000
21:21:15 fedya-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
21:21:15 fedya-pc kernel: CR2: 0000000000000020 CR3: 00000003075b6006 CR4: 00000000003706e0
21:21:15 fedya-pc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
21:21:15 fedya-pc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
21:21:15 fedya-pc kernel: Call Trace:
21:21:15 fedya-pc kernel:  ? _nv030766rm+0x1b/0x90 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv026432rm+0x18/0x60 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv012979rm+0x13d/0x1c0 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv000081rm+0x12f/0x1a0 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv030830rm+0xb9/0x330 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv030829rm+0x68/0x80 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv030829rm+0x3e/0x80 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv012905rm+0x120/0x160 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv025575rm+0x251/0x3e0 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv025524rm+0x1f/0xf0 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv016719rm+0xd3/0x3c0 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv028705rm+0xb23/0xdc0 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv028713rm+0x15d/0x400 [nvidia]
21:21:15 fedya-pc kernel:  ? _nv000709rm+0xa9/0x240 [nvidia]
21:21:15 fedya-pc kernel:  ? disable_irq_nosync+0x10/0x10
21:21:15 fedya-pc kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
21:21:15 fedya-pc kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
21:21:15 fedya-pc kernel:  ? irq_thread_fn+0x20/0x60
21:21:15 fedya-pc kernel:  ? irq_thread+0xf5/0x1a0
21:21:15 fedya-pc kernel:  ? irq_finalize_oneshot.part.0+0xe0/0xe0
21:21:15 fedya-pc kernel:  ? irq_thread_check_affinity+0xd0/0xd0
21:21:15 fedya-pc kernel:  ? kthread+0x133/0x150
21:21:15 fedya-pc kernel:  ? __kthread_bind_mask+0x60/0x60
21:21:15 fedya-pc kernel:  ? ret_from_fork+0x22/0x30
21:21:15 fedya-pc kernel: Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l>
21:21:15 fedya-pc kernel:  fb_sys_fops mei mac_hid video wmi acpi_pad nvidia(POE) vboxnetflt(OE) vboxn>
21:21:15 fedya-pc kernel: CR2: 0000000000000020
21:21:15 fedya-pc kernel: ---[ end trace e8a40a93e8b90033 ]---
21:21:15 fedya-pc kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
21:21:15 fedya-pc kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 0>
21:21:15 fedya-pc kernel: RSP: 0018:ffffaf28012e3c00 EFLAGS: 00010202
21:21:15 fedya-pc kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
21:21:15 fedya-pc kernel: RDX: ffff90b357c1ff08 RSI: ffffffffffffffff RDI: 0000000000000020
21:21:15 fedya-pc kernel: RBP: ffff90b21e0c59f0 R08: ffffffffc2296b60 R09: ffff90b21e0c59d0
21:21:15 fedya-pc kernel: R10: ffff90b21c684008 R11: ffff90b21c685098 R12: 0000000000000020
21:21:15 fedya-pc kernel: R13: 0000000000000000 R14: ffff90b21e0c5b58 R15: ffff90b21e0c5c98
21:21:15 fedya-pc kernel: FS:  0000000000000000(0000) GS:ffff90b55ed00000(0000) knlGS:0000000000000000
21:21:15 fedya-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
21:21:15 fedya-pc kernel: CR2: 0000000000000020 CR3: 00000003075b6006 CR4: 00000000003706e0
21:21:15 fedya-pc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
21:21:15 fedya-pc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
21:21:15 fedya-pc kernel: BUG: kernel NULL pointer dereference, address: 0000000000000959
21:21:15 fedya-pc kernel: #PF: supervisor write access in kernel mode
21:21:15 fedya-pc kernel: #PF: error_code(0x0002) - not-present page
21:21:15 fedya-pc kernel: PGD 800000012f9a0067 P4D 800000012f9a0067 PUD 0 
21:21:15 fedya-pc kernel: Oops: 0002 [#2] PREEMPT SMP PTI
21:21:15 fedya-pc kernel: CPU: 2 PID: 654 Comm: irq/127-nvidia Tainted: P      D    OE     5.10.7-3-MA>
21:21:15 fedya-pc kernel: Hardware name: MSI MS-7A74/B250M PRO-VD (MS-7A74), BIOS 1.40 04/06/2017
21:21:15 fedya-pc kernel: RIP: 0010:mutex_lock+0x10/0x20
21:21:15 fedya-pc kernel: Code: 03 31 c0 c3 eb d4 0f 1f 40 00 0f 1f 44 00 00 be 02 00 00 00 e9 a1 fa f>
21:21:15 fedya-pc kernel: RSP: 0018:ffffaf28012e3e30 EFLAGS: 00010246
21:21:15 fedya-pc kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
21:21:15 fedya-pc kernel: RDX: ffff90b204908000 RSI: 0000000000001b41 RDI: 0000000000000959
21:21:15 fedya-pc kernel: RBP: 0000000000000959 R08: 000000000000000f R09: 0000000000000000
21:21:15 fedya-pc kernel: R10: ffff90b21c6f8400 R11: 0000000000000000 R12: ffff90b2049087f4
21:21:15 fedya-pc kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff90b204908000
21:21:15 fedya-pc kernel: FS:  0000000000000000(0000) GS:ffff90b55ed00000(0000) knlGS:0000000000000000
21:21:15 fedya-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
21:21:15 fedya-pc kernel: CR2: 0000000000000959 CR3: 00000003075b6006 CR4: 00000000003706e0
21:21:15 fedya-pc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
21:21:15 fedya-pc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
21:21:15 fedya-pc kernel: Call Trace:
21:21:15 fedya-pc kernel:  perf_event_exit_task+0x30/0x440
21:21:15 fedya-pc kernel:  ? kfree+0x40c/0x440
21:21:15 fedya-pc kernel:  do_exit+0x355/0xa40
21:21:15 fedya-pc kernel:  ? task_work_run+0x5c/0x90
21:21:15 fedya-pc kernel:  ? do_exit+0x345/0xa40
21:21:15 fedya-pc kernel:  ? kthread+0x133/0x150
21:21:15 fedya-pc kernel:  ? rewind_stack_do_exit+0x17/0x17
21:21:15 fedya-pc kernel: Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l>
21:21:15 fedya-pc kernel:  fb_sys_fops mei mac_hid video wmi acpi_pad nvidia(POE) vboxnetflt(OE) vboxn>
21:21:15 fedya-pc kernel: CR2: 0000000000000959
21:21:15 fedya-pc kernel: ---[ end trace e8a40a93e8b90034 ]---
21:21:15 fedya-pc kernel: RIP: 0010:_nv028498rm+0x9/0x90 [nvidia]
21:21:15 fedya-pc kernel: Code: 8e ff e8 8a af 00 00 31 c0 48 83 c4 08 c3 31 c0 eb bf 66 2e 0f 1f 84 0>
21:21:15 fedya-pc kernel: RSP: 0018:ffffaf28012e3c00 EFLAGS: 00010202
21:21:15 fedya-pc kernel: RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000010
21:21:15 fedya-pc kernel: RDX: ffff90b357c1ff08 RSI: ffffffffffffffff RDI: 0000000000000020
21:21:15 fedya-pc kernel: RBP: ffff90b21e0c59f0 R08: ffffffffc2296b60 R09: ffff90b21e0c59d0
21:21:15 fedya-pc kernel: R10: ffff90b21c684008 R11: ffff90b21c685098 R12: 0000000000000020
21:21:15 fedya-pc kernel: R13: 0000000000000000 R14: ffff90b21e0c5b58 R15: ffff90b21e0c5c98
21:21:15 fedya-pc kernel: FS:  0000000000000000(0000) GS:ffff90b55ed00000(0000) knlGS:0000000000000000
21:21:15 fedya-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
21:21:15 fedya-pc kernel: CR2: 0000000000000959 CR3: 00000003075b6006 CR4: 00000000003706e0
21:21:15 fedya-pc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
21:21:15 fedya-pc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
21:21:15 fedya-pc kernel: Fixing recursive fault but reboot is needed!
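
The two faults in the log above carry different `#PF: error_code(...)` values (0x0000 for the driver fault, 0x0002 for the follow-up in mutex_lock). A small sketch of how the low bits of the x86 page-fault error code read, for anyone comparing traces; `decode_pf_error_code` is a made-up helper name and it only covers the bits listed in the comments:

```shell
# Hypothetical helper: spell out the low bits of an x86 page-fault error code
# as printed in "#PF: error_code(0x....)" lines.
#   bit 0: 0 = page not present, 1 = protection violation
#   bit 1: 0 = read,             1 = write
#   bit 2: 0 = supervisor,       1 = user
#   bit 4: 1 = instruction fetch
decode_pf_error_code() {
    code=$(( $1 ))   # accepts 0x-prefixed hex or decimal
    if [ $(( code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="not-present page"; fi
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( code & 4 )) -ne 0 ]; then mode="user"; else mode="supervisor"; fi
    if [ $(( code & 16 )) -ne 0 ]; then access="instruction fetch"; fi
    echo "$mode $access, $cause"
}
```

Running `decode_pf_error_code 0x0002` gives a supervisor write to a not-present page, which lines up with the kernel's own "supervisor write access" and "not-present page" wording for the second fault.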

Looks like Linus Torvalds said the right thing about Nvidia…
