Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference

“Just saying stuff like this would never fly in windows.”
*
*
An end user comparing Windows to Linux is like comparing Oranges to Dreaming. Yes, Windows and Linux are comparable products in the market.
However…
even though open source and closed source software coexist in the same marketplace, they are based in separate realities: two diametrically opposed philosophical / economic ecosystems.
*
The most critical attributes of these closed source and open source philosophical / economic ecosystems are that:
Closed source ecosystems are conducive to monopoly.
Open source ecosystems are averse to monopoly.
*
How is this even related to nVidia Unix Drivers???
Well, Windows is a closed source product, presented to the end user at the market entry point at CO$T in a “finished” state.
This “finished” state has unfortunately set the market expectations of the majority of end users (marketing).
*
Linux is open source and presented at the market entry point at NO COST, but in an unconfigured and non-integrated state.
The end user is FREE to configure and integrate the Linux operating system for their specific hardware, OR to pay an open source sole trader or company to provide that service or product for them.
*
Linux, as an open source software product, is not supposed to work “out of the box”. Nor should it ever “work out of the box”.
This “work out of the box” philosophy is a contamination from the closed source economy, where it is a product requirement, and it is the polar opposite of the core ecosystem driver of the open source economy.
*
Correctly and exactly integrating and configuring Linux for a specific hardware platform is well beyond what most people realise. Secure Boot, CSM, DMA buffers, NUMA, and IOMMU make up about 1/800th of the settings, parameters, and whatever else needs to be configured and integrated on a base Linux OS. The BIOS has to be set to the kernel, the kernel to the BIOS, the firmware and drivers to the kernel and the BIOS, and then the DE to the driver, firmware, kernel, and BIOS. Not forgetting the 50+ sub-branches of each of those.
Add to that another 300+ issues arising from varying system PCIe configurations and the like.

The nVidia Unix Driver requires:
  • Correct system voltages (VAC).
  • All system hardware benching at ManSpec.
  • All system hardware bookmatched and compatible as per ManSpec.
  • The nVidia driver installed as per ManSpec, meaning the nVidia driver ManSpec white paper, not distro suggestions, Reddit forum posts, or website tutorials.
  • A correctly configured BIOS, kernel, and “Linux OS” DE.

I have many systems all with widely varying hardware configurations under my duty of care.
And all of them have outstanding stability, uptime, and performance.
All of them running nvidia.
They started running by the book shortly after I started reading the book.
(Note: yes, desktops and servers define uptime metrics differently.
Not that it matters here, but someone always mentions it.)

Re: Linus Torvalds’ nVidia closed source comment.
I wonder if nVidia Corp. recognised that Microsoft was weaponizing open source.
Take a look at AMD / Pluton processors.
Microsoft’s stated plan in 2007 was to centralise the internet.
All Hail iAGPU Edge2Cloud personal computing 2022.

http://us.download.nvidia.com/XFree86/Linux-x86_64/460.39/README/index.html
https://www.kernel.org/doc/html/v4.14/admin-guide/index.html
https://wiki.archlinux.org/index.php/NVIDIA
https://wiki.archlinux.org/index.php/NVIDIA/Tips_and_tricks
https://wiki.archlinux.org/index.php/NVIDIA/Troubleshooting
https://wiki.archlinux.org/index.php/GRUB/Tips_and_tricks

Lastly, please consider this…
In the Matrix,
if Neo had entered into a flame war with Trinity and Morpheus every time they tried to explain a different reality to him,
that story would have ended real quick.
But instead Neo chose to …

Your analogy doesn’t fit. If you want to argue, at least learn how.

Driver version 460.X has not fixed the issue as NVIDIA claimed in this thread.

Here is my report, not that you will do anything with it:

Feb 19 15:40:22 razor systemd[940]: Started Chromium - Web Browser.
Feb 19 15:40:23 razor systemd[940]: app-chromium-307a9274d5c545e59b66f10d58ff9339.scope: Succeeded.
Feb 19 15:40:31 razor systemd[940]: app-chromium-e5e875cf535648b39e8a54f52a52e264.scope: Succeeded.
Feb 19 15:40:32 razor systemd[940]: Started Chromium - Web Browser.
Feb 19 17:14:43 razor kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Feb 19 17:14:43 razor kernel: #PF: supervisor read access in kernel mode
Feb 19 17:14:43 razor kernel: #PF: error_code(0x0000) - not-present page
Feb 19 17:14:43 razor kernel: PGD 295c15067 P4D 295c15067 PUD 0 
Feb 19 17:14:43 razor kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Feb 19 17:14:43 razor kernel: CPU: 22 PID: 1083 Comm: irq/159-nvidia Tainted: P           OE     5.10.16-arch1-1 #1
Feb 19 17:14:43 razor kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO WIFI/X570 AORUS PRO WIFI, BIOS F12 06/24/2020
Feb 19 17:14:43 razor kernel: RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]
Feb 19 17:14:43 razor kernel: Code: 40 00 31 c0 5b 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c7 46 38 00 00 00 00 48 89 f7 45 31 ed e8 dd 7a ff ff eb 9b 0f 1f 00 <49> 8b 7c 24 08 e8 de 29 00 00 48 85 c0 74 b4 49 83 7c 24 08 00 74
Feb 19 17:14:43 razor kernel: RSP: 0018:ffffa3174170fc70 EFLAGS: 00010246
Feb 19 17:14:43 razor kernel: RAX: 0000000000000001 RBX: ffff8fbd0bd85b58 RCX: 0000000000000010
Feb 19 17:14:43 razor kernel: RDX: ffff8fc074786758 RSI: 00000000004789f2 RDI: ffff8fbefebc57b0
Feb 19 17:14:43 razor kernel: RBP: ffff8fbd0bd85ae0 R08: ffffffffc4eb4960 R09: ffff8fbd0bd85a30
Feb 19 17:14:43 razor kernel: R10: ffff8fbd05e38008 R11: ffff8fbd05e39098 R12: 0000000000000000
Feb 19 17:14:43 razor kernel: R13: 0000000000000001 R14: 00000000beef0003 R15: ffff8fbd0bd85ba0
Feb 19 17:14:43 razor kernel: FS:  0000000000000000(0000) GS:ffff8fc3fed80000(0000) knlGS:0000000000000000
Feb 19 17:14:43 razor kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 19 17:14:43 razor kernel: CR2: 0000000000000008 CR3: 00000002a2cb8000 CR4: 0000000000350ee0
Feb 19 17:14:43 razor kernel: Call Trace:
Feb 19 17:14:43 razor kernel:  ? _nv000082rm+0x16c/0x1e0 [nvidia]
Feb 19 17:14:43 razor kernel:  ? _nv012946rm+0xff/0x180 [nvidia]
Feb 19 17:14:43 razor kernel:  ? _nv019582rm+0x1af/0x210 [nvidia]
Feb 19 17:14:43 razor kernel:  ? _nv019533rm+0xdf2/0xef0 [nvidia]
Feb 19 17:14:43 razor kernel:  ? _nv019534rm+0xf3/0x290 [nvidia]
Feb 19 17:14:43 razor kernel:  ? _nv019535rm+0x12f/0x350 [nvidia]
Feb 19 17:14:43 razor kernel:  ? _nv019536rm+0x1f5/0x320 [nvidia]
Feb 19 17:14:43 razor kernel:  ? _nv019511rm+0x1bf/0x630 [nvidia]
Feb 19 17:14:43 razor kernel:  ? _nv028768rm+0x15d/0x400 [nvidia]
Feb 19 17:14:43 razor kernel:  ? _nv000710rm+0xa9/0x240 [nvidia]
Feb 19 17:14:43 razor kernel:  ? disable_irq_nosync+0x10/0x10
Feb 19 17:14:43 razor kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
Feb 19 17:14:43 razor kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
Feb 19 17:14:43 razor kernel:  ? irq_thread_fn+0x20/0x60
Feb 19 17:14:43 razor kernel:  ? irq_thread+0xf5/0x1a0
Feb 19 17:14:43 razor kernel:  ? irq_finalize_oneshot.part.0+0xe0/0xe0
Feb 19 17:14:43 razor kernel:  ? irq_thread_check_affinity+0xd0/0xd0
Feb 19 17:14:43 razor kernel:  ? kthread+0x133/0x150
Feb 19 17:14:43 razor kernel:  ? __kthread_bind_mask+0x60/0x60
Feb 19 17:14:43 razor kernel:  ? ret_from_fork+0x22/0x30
Feb 19 17:14:43 razor kernel: Modules linked in: snd_seq_dummy snd_seq nvidia_uvm(POE) rfcomm hid_logitech_hidpp nvidia_drm(POE) nvidia_modeset(POE) cmac algif_hash algif_skcipher 8021q af_alg garp mrp bnep stp hid_logitech_dj wmi_bmof mxm_wmi llc nvidia(POE) snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel tun snd_intel_dspcfg iwlmvm soundwire_intel soundwire_generic_allocation soundwire_cadence edac_mce_amd mac80211 snd_hda_codec libarc4 snd_usb_audio snd_hda_core soundwire_bus iwlwifi snd_usbmidi_lib nls_iso8859_1 snd_soc_core snd_hwdep uvcvideo btusb vfat videobuf2_vmalloc btrtl snd_rawmidi fat videobuf2_memops btbcm snd_compress videobuf2_v4l2 btintel ac97_bus snd_seq_device snd_pcm_dmaengine kvm videobuf2_common snd_pcm bluetooth drm_kms_helper ccp videodev irqbypass crct10dif_pclmul crc32_pclmul snd_timer ghash_clmulni_intel aesni_intel igb snd cec crypto_simd syscopyarea cryptd sysfillrect glue_helper ecdh_generic cfg80211 mousedev i2c_algo_bit sp5100_tco
Feb 19 17:14:43 razor kernel:  sysimgblt mc rng_core ecc rapl pcspkr soundcore k10temp fb_sys_fops i2c_piix4 rfkill dca wmi pinctrl_amd acpi_cpufreq joydev mac_hid drm fuse crypto_user agpgart bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage usbhid crc32c_intel xhci_pci xhci_pci_renesas
Feb 19 17:14:43 razor kernel: CR2: 0000000000000008
Feb 19 17:14:43 razor kernel: ---[ end trace 10878197276a2c52 ]---
Feb 19 17:14:43 razor kernel: RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]
Feb 19 17:14:43 razor kernel: Code: 40 00 31 c0 5b 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c7 46 38 00 00 00 00 48 89 f7 45 31 ed e8 dd 7a ff ff eb 9b 0f 1f 00 <49> 8b 7c 24 08 e8 de 29 00 00 48 85 c0 74 b4 49 83 7c 24 08 00 74
Feb 19 17:14:43 razor kernel: RSP: 0018:ffffa3174170fc70 EFLAGS: 00010246
Feb 19 17:14:43 razor kernel: RAX: 0000000000000001 RBX: ffff8fbd0bd85b58 RCX: 0000000000000010
Feb 19 17:14:43 razor kernel: RDX: ffff8fc074786758 RSI: 00000000004789f2 RDI: ffff8fbefebc57b0
Feb 19 17:14:43 razor kernel: RBP: ffff8fbd0bd85ae0 R08: ffffffffc4eb4960 R09: ffff8fbd0bd85a30
Feb 19 17:14:43 razor kernel: R10: ffff8fbd05e38008 R11: ffff8fbd05e39098 R12: 0000000000000000
Feb 19 17:14:43 razor kernel: R13: 0000000000000001 R14: 00000000beef0003 R15: ffff8fbd0bd85ba0
Feb 19 17:14:43 razor kernel: FS:  0000000000000000(0000) GS:ffff8fc3fed80000(0000) knlGS:0000000000000000
Feb 19 17:14:43 razor kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 19 17:14:43 razor kernel: CR2: 0000000000000008 CR3: 00000002a2cb8000 CR4: 0000000000350ee0
Feb 19 17:14:43 razor kernel: BUG: kernel NULL pointer dereference, address: 0000000000000959
Feb 19 17:14:43 razor kernel: #PF: supervisor write access in kernel mode
Feb 19 17:14:43 razor kernel: #PF: error_code(0x0002) - not-present page
Feb 19 17:14:43 razor kernel: PGD 295c15067 P4D 295c15067 PUD 0 
Feb 19 17:14:43 razor kernel: Oops: 0002 [#2] PREEMPT SMP NOPTI
Feb 19 17:14:43 razor kernel: CPU: 22 PID: 1083 Comm: irq/159-nvidia Tainted: P      D    OE     5.10.16-arch1-1 #1
Feb 19 17:14:43 razor kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO WIFI/X570 AORUS PRO WIFI, BIOS F12 06/24/2020
Feb 19 17:14:43 razor kernel: RIP: 0010:mutex_lock+0x10/0x20
Feb 19 17:14:43 razor kernel: Code: 03 31 c0 c3 eb d4 0f 1f 40 00 0f 1f 44 00 00 be 02 00 00 00 e9 a1 fa ff ff 90 0f 1f 44 00 00 31 c0 65 48 8b 14 25 c0 7b 01 00 <f0> 48 0f b1 17 75 01 c3 eb d6 66 0f 1f 44 00 00 0f 1f 44 00 00 41
Feb 19 17:14:43 razor kernel: RSP: 0018:ffffa3174170fe30 EFLAGS: 00010246
Feb 19 17:14:43 razor kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
Feb 19 17:14:43 razor kernel: RDX: ffff8fbd1c98dc40 RSI: 0000000000001b41 RDI: 0000000000000959
Feb 19 17:14:43 razor kernel: RBP: 0000000000000959 R08: 0000000000000001 R09: 0000000000000000
Feb 19 17:14:43 razor kernel: R10: ffff8fbd14f62c00 R11: 0000000000000000 R12: ffff8fbd1c98e434
Feb 19 17:14:43 razor kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff8fbd1c98dc40
Feb 19 17:14:43 razor kernel: FS:  0000000000000000(0000) GS:ffff8fc3fed80000(0000) knlGS:0000000000000000
Feb 19 17:14:43 razor kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 19 17:14:43 razor kernel: CR2: 0000000000000959 CR3: 00000002a2cb8000 CR4: 0000000000350ee0
Feb 19 17:14:43 razor kernel: Call Trace:
Feb 19 17:14:43 razor kernel:  perf_event_exit_task+0x30/0x440
Feb 19 17:14:43 razor kernel:  ? kfree+0x40c/0x440
Feb 19 17:14:43 razor kernel:  do_exit+0x382/0xa70
Feb 19 17:14:43 razor kernel:  ? task_work_run+0x5c/0x90
Feb 19 17:14:43 razor kernel:  ? do_exit+0x372/0xa70
Feb 19 17:14:43 razor kernel:  ? kthread+0x133/0x150
Feb 19 17:14:43 razor kernel:  ? rewind_stack_do_exit+0x17/0x17
Feb 19 17:14:43 razor kernel: Modules linked in: snd_seq_dummy snd_seq nvidia_uvm(POE) rfcomm hid_logitech_hidpp nvidia_drm(POE) nvidia_modeset(POE) cmac algif_hash algif_skcipher 8021q af_alg garp mrp bnep stp hid_logitech_dj wmi_bmof mxm_wmi llc nvidia(POE) snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel tun snd_intel_dspcfg iwlmvm soundwire_intel soundwire_generic_allocation soundwire_cadence edac_mce_amd mac80211 snd_hda_codec libarc4 snd_usb_audio snd_hda_core soundwire_bus iwlwifi snd_usbmidi_lib nls_iso8859_1 snd_soc_core snd_hwdep uvcvideo btusb vfat videobuf2_vmalloc btrtl snd_rawmidi fat videobuf2_memops btbcm snd_compress videobuf2_v4l2 btintel ac97_bus snd_seq_device snd_pcm_dmaengine kvm videobuf2_common snd_pcm bluetooth drm_kms_helper ccp videodev irqbypass crct10dif_pclmul crc32_pclmul snd_timer ghash_clmulni_intel aesni_intel igb snd cec crypto_simd syscopyarea cryptd sysfillrect glue_helper ecdh_generic cfg80211 mousedev i2c_algo_bit sp5100_tco
Feb 19 17:14:43 razor kernel:  sysimgblt mc rng_core ecc rapl pcspkr soundcore k10temp fb_sys_fops i2c_piix4 rfkill dca wmi pinctrl_amd acpi_cpufreq joydev mac_hid drm fuse crypto_user agpgart bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage usbhid crc32c_intel xhci_pci xhci_pci_renesas
Feb 19 17:14:43 razor kernel: CR2: 0000000000000959
Feb 19 17:14:43 razor kernel: ---[ end trace 10878197276a2c53 ]---
Feb 19 17:14:43 razor kernel: RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]
Feb 19 17:14:43 razor kernel: Code: 40 00 31 c0 5b 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c7 46 38 00 00 00 00 48 89 f7 45 31 ed e8 dd 7a ff ff eb 9b 0f 1f 00 <49> 8b 7c 24 08 e8 de 29 00 00 48 85 c0 74 b4 49 83 7c 24 08 00 74
Feb 19 17:14:43 razor kernel: RSP: 0018:ffffa3174170fc70 EFLAGS: 00010246
Feb 19 17:14:43 razor kernel: RAX: 0000000000000001 RBX: ffff8fbd0bd85b58 RCX: 0000000000000010
Feb 19 17:14:43 razor kernel: RDX: ffff8fc074786758 RSI: 00000000004789f2 RDI: ffff8fbefebc57b0
Feb 19 17:14:43 razor kernel: RBP: ffff8fbd0bd85ae0 R08: ffffffffc4eb4960 R09: ffff8fbd0bd85a30
Feb 19 17:14:43 razor kernel: R10: ffff8fbd05e38008 R11: ffff8fbd05e39098 R12: 0000000000000000
Feb 19 17:14:43 razor kernel: R13: 0000000000000001 R14: 00000000beef0003 R15: ffff8fbd0bd85ba0
Feb 19 17:14:43 razor kernel: FS:  0000000000000000(0000) GS:ffff8fc3fed80000(0000) knlGS:0000000000000000
Feb 19 17:14:43 razor kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 19 17:14:43 razor kernel: CR2: 0000000000000959 CR3: 00000002a2cb8000 CR4: 0000000000350ee0
Feb 19 17:14:43 razor kernel: Fixing recursive fault but reboot is needed!
Feb 19 17:15:19 razor audit[4592]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=1 pid=4592 comm="GpuWatchdog" exe="/usr/lib/signal-desktop/signal-desktop" sig=11 res=1
Feb 19 17:15:19 razor kernel: GpuWatchdog[4611]: segfault at 0 ip 000055ac085fe107 sp 00007fb4bcd72570 error 6 in signal-desktop[55ac0541d000+53d6000]
Feb 19 17:15:19 razor kernel: Code: 7d b7 00 79 09 48 8b 7d a0 e8 35 52 d3 fe 8b 83 00 01 00 00 85 c0 0f 84 91 00 00 00 48 8b 03 48 89 df be 01 00 00 00 ff 50 68 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 17 bc 6f 02 01 80 7d 87 00
Feb 19 17:15:19 razor kernel: audit: type=1701 audit(1613715319.324:161): auid=1000 uid=1000 gid=1000 ses=1 pid=4592 comm="GpuWatchdog" exe="/usr/lib/signal-desktop/signal-desktop" sig=11 res=1

This is depressing lol…


I have driver 440. I updated to the latest after 5 months as I thought they would have fixed it by now since the new kernel update. They clearly haven’t.

Seriously, I’m starting to think the people who are still defending nvidia are actually paid shills. What kind of normal person with a functional brain who sees what is actually happening is still defending them?

Does nvidia REALLY want “bug reports”, as they keep saying they do? HINT!!!

THEY DON’T

THEY JUST WANT TO SHUT US UP.

The global market for PCs totals over 1 billion PCs. (Actually way more, but I’ll chuck nvidia another bone, dumb ik)

Linux market share is around 3% (https://www.debugpoint.com/2020/07/linux-desktop-market-share-peaked-to-all-time-high-in-june/)

Also probably way more as Linux users are not the type to share this sort of stuff and participate in surveys.

https://store.steampowered.com/hwsurvey/videocard/
Nvidia has about 75% of GPUs on steam. I’ll again go for a lower number: 70%.

1000000000*0.03
            30000000
 . * .7
            21000000

That is 20M++ users on the LOW END!!!
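The same estimate written out as a small Python sketch. The three inputs are the deliberately low figures from this post, not measured data, and the function name is only illustrative. Integer percentages are used so the result is exact.

```python
def estimate_users(total_pcs, linux_percent, nvidia_percent):
    """Return (Linux PCs, Linux PCs with an NVIDIA GPU) from whole percentages."""
    linux_pcs = total_pcs * linux_percent // 100
    nvidia_linux_pcs = linux_pcs * nvidia_percent // 100
    return linux_pcs, nvidia_linux_pcs


# Figures taken from the post: 1 billion PCs, 3% Linux, 70% NVIDIA.
linux_pcs, nvidia_linux_pcs = estimate_users(1_000_000_000, 3, 70)
print(linux_pcs, nvidia_linux_pcs)
```

Changing any one input shows how sensitive the final number is to the assumed market shares.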

Do you know how they can get bug reports?

Put a warning message on install.

1 LINE! Here I’ll help

echo "Drivers 440+ are buggy and cause crashes, please report this to the forum if you see anything. Not that we are gonna do anything lmao dumbass" >> "$install_script"

Or send out an email.

There are 20M+ of us affected by this bug. Even if just .00001 of users reported it, that would be 200 bug reports!!! And let’s just say A LOT MORE than .00001 of people would be angry if they knew nvidia was causing this.

At the price of $100 USD a GPU that is 2 BILLION THEY HAVE TAKEN FROM US. (also far too low)
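A quick check of the two figures above, as a Python sketch. It takes the post's own rounded-down assumptions (20M users, a reporting fraction of .00001, $100 per GPU) at face value; none of these are measured values, and the function names are illustrative.

```python
from fractions import Fraction

AFFECTED_USERS = 20_000_000                # rounded-down estimate from the post
REPORTING_FRACTION = Fraction(1, 100_000)  # the ".00001" of users
PRICE_PER_GPU_USD = 100                    # the post's deliberately low price


def expected_reports(users, fraction):
    """Number of reports if `fraction` of `users` files one each."""
    return int(users * fraction)


def total_spend_usd(users, price_usd):
    """Total spent if every user bought one GPU at `price_usd`."""
    return users * price_usd


print(expected_reports(AFFECTED_USERS, REPORTING_FRACTION))
print(total_spend_usd(AFFECTED_USERS, PRICE_PER_GPU_USD))
```

`Fraction` is used instead of a float so the tiny reporting rate does not pick up rounding error.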

THERE IS ZERO EXCUSE. THE TERRIBLE LINUX SUPPORT AND NOW THIS!

The only reasons I can think of why nvidia hasn’t fixed it are that they either:

  • just don’t care, or
  • fear the bad PR if they actually admitted they messed up

Looking at all the bug reports, where this is happening in the kernel and systems where this bug doesn’t occur, I get the feeling the problem might be kernel preemption. Ubuntu and Fedora, e.g., don’t use preemptible kernels. Does Arch provide non-preemptible kernel builds to check?


This user reports experiencing the bug on Kubuntu though.


Ok, I missed that. So I guess it’s back to the drawing board.


I have seen certain posts on this thread describe how running chromium with “--use-gl=desktop” has protected against kernel bugs. I personally have found this method to work for months until yesterday while in a video call playing GeoGuessr. I had another crash similar to the one I posted about before, even though I had “--use-gl=desktop” as one of my chromium flags at the time.

What could be related though was my resetting my chromium settings that afternoon due to an unrelated issue. I’m pretty sure I restored all the previous settings I had before the reset (and my flags were untouched), but funnily enough, the YouTube playback issues I experienced before with “--use-gl=desktop” (which I mentioned in my previous post and more in depth here) also disappeared. So either something in that reset caused chromium to ignore my “--use-gl=desktop” flag, or a series of other changes to chromium/nvidia that day (despite no package updates in almost a week) caused my “--use-gl=desktop” problems to disappear and the kernel bug to reappear.

I’m just very confused right now, so I was wondering whether any of the other users in this chain who have been using “--use-gl=desktop” experienced anything similar.

Journal logs:
crash.txt (12.0 KB)
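On the question of whether the flag still reaches the browser after a settings reset, one thing that can be checked is the command line of the running processes. On Linux, `/proc/<pid>/cmdline` holds a process's arguments, normally separated by NUL bytes. The Python sketch below assumes that layout and also tolerates space-separated arguments. The helper names and the `chromium` name match are hypothetical, and a flag being present on the command line does not prove Chromium honoured it.

```python
import os


def has_flag(cmdline_bytes, flag="--use-gl=desktop"):
    """Return True if the argument list contains exactly `flag`."""
    # Arguments are normally NUL-separated; treat spaces as separators too.
    tokens = cmdline_bytes.replace(b"\0", b" ").split()
    return flag.encode() in tokens


def processes_with_flag(flag="--use-gl=desktop", name="chromium"):
    """Map PID -> bool for every process whose first argument mentions `name`."""
    results = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable
        tokens = raw.replace(b"\0", b" ").split()
        if tokens and name.encode() in tokens[0]:
            results[int(entry)] = has_flag(raw, flag)
    return results
```

Calling `processes_with_flag()` while the browser is running gives a per-process view. If the main browser process shows `False`, the flag never reached it, which would point at the launcher or flags file and not at the driver.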


Hello!

I have now reproduced this crash three times in a row while playing a video in VLC
crash.txt (10.9 KB)
(txt contains the common journalctl crash report everyone here gets)
VLC Version: 3.0.12
nvidia driver: 460.39-8
kernel: 5.10.16-arch1-1
GPU: GeForce GTX 1060 6GB

The video in question has these properties:

Format : Matroska
Format version : Version 4
File size : 215 MiB
Duration : 23 min 38 s
Overall bit rate : 1 270 kb/s
Writing application : Lavf58.65.101
Writing library : Lavf58.65.101
ErrorDetectionType : Per level 1
ENCODER_SETTINGS : Redacted, CPU - 2xAMD EPYC 7282 32 cores/64 threads, GPU - None used

Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L4.1@Main
Codec ID : V_MPEGH/ISO/HEVC
Duration : 23 min 38 s
Bit rate : 8 244 kb/s
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Variable
Frame rate : 25.674 FPS
Original frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Bits/(Pixel*Frame) : 0.155
Stream size : 1.36 GiB
Writing library : Lavc58.115.102 libx265
Language : Japanese
Default : Yes
Forced : No
Color range : Limited
Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709

Audio
ID : 2
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : A_AAC-2
Duration : 23 min 38 s
Bit rate : 253 kb/s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 48.0 kHz
Frame rate : 46.875 FPS (1024 SPF)
Compression mode : Lossy
Delay relative to video : -355 ms
Stream size : 42.7 MiB (20%)
Writing library : Lavc58.115.102 aac
Language : Japanese
Default : Yes
Forced : No

Text
ID : 3
Format : ASS
Codec ID : S_TEXT/ASS
Codec ID/Info : Advanced Sub Station Alpha
Duration : 23 min 38 s
Bit rate : 100 b/s
Count of elements : 301
Compression mode : Lossless
Stream size : 17.3 KiB (0%)
Writing library : Lavc58.115.102 ssa
Language : English
Default : Yes
Forced : No


I’ve been experiencing the same bug as the above for months as well, occurring several times a week if not daily. The only thing that spared my sanity was that I’ve spent the last few months mostly booted up in my Windows partition. This also means that I can’t pinpoint exactly when it started for me, but it likely coincides with the others, either 455 or 450.

The most recent occurrence happened while I had most of my apps closed, playing a timed round in Geoguessr in Chromium. The time before that was while running a JS user script in Chromium. Before that, every case I can remember was with an abundance of tabs open in Chromium so I had assumed it was an OOM issue, but in retrospect, it might just have increased the chance of the bug happening, such as having multiple YouTube tabs playing.

I haven’t read the entire discussion thus far. Is the common thread here that Chromium triggers the bug? Would switching to a non-Chromium browser be a workaround for this in the interim? It sounds like the different flags and GPU settings passed to Chromium aren’t effective.

kernel: 5.10.16-arch1-1
nvidia driver: 460.39
GPU: GTX 1060

As an aside, I’d like to +1 the sentiment that this is an embarrassment. I thought maybe nvidia was finally getting their stuff together with getting the work necessary to support Wayland done, but instead I’m dealing with this. I’ve been running Arch for about a decade, since back when it was still known as being unstable, but aside from a few instances of configuration issues on my part, all of the stability issues I’ve encountered were nvidia related. This might be the third time I have to pin my nvidia/kernel version to avoid a problem, and the last time this happened it took over half a year for the issue to get resolved. Even the registration process for this site is a disgrace.


No, I have seen exactly the same problem in Firefox. Chromium may be more popular, or it may use the GPU for more of its functionality, but crashes also happen on systems that never run Chromium.


That has been my experience too. Been using Arch for a couple of years and the majority of issues I’ve had have been bad drivers from NVIDIA.


Have to admit, since running Arch, apart from this particular issue, I haven’t had any other issues with Nvidia.

Nvidia, in general, has been great on Linux, less of a pain than AMD/ATI.

Uptime has been 11 days so far.

Turned off “Use hardware acceleration when available” in chrome/chromium.

There are two kinds of Linux kernel in (K)Ubuntu: generic and lowlatency. I’m running the latter, and it has CONFIG_PREEMPT=y in its config.

However, I can’t say I’m seeing this bug often. I had it once or twice.
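To compare kernels on this point, the preemption model can be read from the kernel's build config. Below is a minimal Python sketch, assuming the config text is available (commonly `/boot/config-<release>` on Debian/Ubuntu, or `/proc/config.gz` on kernels built with `CONFIG_IKCONFIG_PROC`). It assumes a 5.10-era config, where exactly one of the three options is set. The function names are illustrative.

```python
import gzip
import re

# Checked in this order; on 5.10-era kernels exactly one of them is set to 'y'.
PREEMPT_OPTIONS = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")


def preemption_model(config_text):
    """Return the preemption option that is set to 'y', or None if none is."""
    enabled = set(re.findall(r"^(CONFIG_PREEMPT\w*)=y$", config_text,
                             flags=re.MULTILINE))
    for option in PREEMPT_OPTIONS:
        if option in enabled:
            return option
    return None


def read_proc_config(path="/proc/config.gz"):
    """Read the running kernel's config, if the kernel exposes it."""
    with gzip.open(path, "rt") as fh:
        return fh.read()
```

For example, `preemption_model(read_proc_config())` on a fully preemptible kernel would return `CONFIG_PREEMPT`. The `PREEMPT` tag on the `Oops:` lines in the traces posted here points the same way for those particular machines.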

Thanks for clarifying, Lastique. So this might be a clue after all.


I had this bug too a couple of times and I also have preemption enabled. It’s hard to reproduce for sure but the stack trace should really help the NVIDIA devs. Even a workaround would still be better than this crash killing the driver completely. I can still SSH to my PC but there’s no way to reload the driver or restart Xorg (which is stuck in D-state IIRC and by definition can’t be killed).
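Over SSH, the D-state claim can be confirmed by reading the state field of `/proc/<pid>/stat`. The Python sketch below assumes the standard Linux layout `pid (comm) state ...`. The helper names are made up for illustration.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm may itself contain spaces or parentheses, so use the
    # first '(' and the last ')' as its delimiters.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Yield (pid, comm) for every task currently in state 'D'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # task exited while scanning, or unexpected format
        if state == "D":
            yield pid, comm
```

If `list(uninterruptible_tasks())` keeps showing Xorg across several runs a few seconds apart, it is blocked in uninterruptible sleep, which is why signals have no effect on it.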

I use Debian Testing and I compile the kernel myself. I know it’s not a good thing when we’re talking about reproducing bugs but I only slightly change the default Debian config. I set the system timer to 1000 Hz and set CONFIG_PREEMPT to y so the latency is the lowest possible. I mainly do this so that PulseAudio doesn’t skip. It was an issue years ago; not sure if it’s still like that, but I’m used to updating my kernel myself and there were no issues related to it. Plus, I’m not alone with this bug and the stack trace was identical to all those already posted here. For the sake of completeness I’ll also post mine:

Jan 29 23:41:37 homecomp kernel: [634108.784676] BUG: kernel NULL pointer dereference, address: 0000000000000008
Jan 29 23:41:37 homecomp kernel: [634108.784679] #PF: supervisor read access in kernel mode
Jan 29 23:41:37 homecomp kernel: [634108.784680] #PF: error_code(0x0000) - not-present page
Jan 29 23:41:37 homecomp kernel: [634108.784681] PGD 8000000113c66067 P4D 8000000113c66067 PUD 0 
Jan 29 23:41:37 homecomp kernel: [634108.784684] Oops: 0000 [#1] PREEMPT SMP PTI
Jan 29 23:41:37 homecomp kernel: [634108.784685] CPU: 5 PID: 1422109 Comm: irq/136-nvidia Tainted: P           OE     5.10.4-rkfg #26
Jan 29 23:41:37 homecomp kernel: [634108.784686] Hardware name: Gigabyte Technology Co., Ltd. Z270P-D3/Z270P-D3-CF, BIOS F3 02/10/2017
Jan 29 23:41:37 homecomp kernel: [634108.784825] RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.784827] Code: 40 00 31 c0 5b 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c7 46 38 00 00 00 00 48 89 f7 45 31 ed e8 dd 7a ff ff eb 9b 0f 1f 00 <49> 8b 7c 24 08 e8 de 29 00 00 48 85 c0 74 b4 49 83 7c 24 08 00 74
Jan 29 23:41:37 homecomp kernel: [634108.784828] RSP: 0018:ffffbbc08dca7c40 EFLAGS: 00010246
Jan 29 23:41:37 homecomp kernel: [634108.784829] RAX: 0000000000000001 RBX: ffff9908e393ab88 RCX: 0000000000000010
Jan 29 23:41:37 homecomp kernel: [634108.784830] RDX: ffff9908484fefd8 RSI: 00000000004789f2 RDI: ffff9908618400f0
Jan 29 23:41:37 homecomp kernel: [634108.784830] RBP: ffff9908e393ab10 R08: ffffffffc571fa80 R09: ffff9908e393aa60
Jan 29 23:41:37 homecomp kernel: [634108.784831] R10: ffff99087b35c008 R11: ffff99087b35d098 R12: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.784832] R13: 0000000000000001 R14: 00000000beef0003 R15: ffff9908e393abd0
Jan 29 23:41:37 homecomp kernel: [634108.784833] FS:  0000000000000000(0000) GS:ffff990f4ed40000(0000) knlGS:0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.784833] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 23:41:37 homecomp kernel: [634108.784834] CR2: 0000000000000008 CR3: 00000001d2b68003 CR4: 00000000003706e0
Jan 29 23:41:37 homecomp kernel: [634108.784835] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.784835] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 29 23:41:37 homecomp kernel: [634108.784836] Call Trace:
Jan 29 23:41:37 homecomp kernel: [634108.784952]  ? _nv000082rm+0x16c/0x1e0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785108]  ? _nv012946rm+0xff/0x180 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785252]  ? _nv019582rm+0x1af/0x210 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785400]  ? _nv019533rm+0xdf2/0xef0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785547]  ? _nv019534rm+0xf3/0x290 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785694]  ? _nv019500rm+0x78/0xd0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785841]  ? _nv019514rm+0xcf/0x2f0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785988]  ? _nv019548rm+0xbe/0xe0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786140]  ? _nv028760rm+0x97b/0xdc0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786293]  ? _nv028768rm+0x15d/0x400 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786391]  ? _nv000710rm+0xa9/0x240 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786394]  ? disable_irq_nosync+0x10/0x10
Jan 29 23:41:37 homecomp kernel: [634108.786495]  ? rm_isr_bh+0x1c/0x60 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786547]  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786548]  ? irq_thread_fn+0x20/0x60
Jan 29 23:41:37 homecomp kernel: [634108.786549]  ? irq_thread+0xe3/0x190
Jan 29 23:41:37 homecomp kernel: [634108.786550]  ? irq_finalize_oneshot.part.0+0xf0/0xf0
Jan 29 23:41:37 homecomp kernel: [634108.786552]  ? irq_thread_check_affinity+0xc0/0xc0
Jan 29 23:41:37 homecomp kernel: [634108.786553]  ? kthread+0x142/0x160
Jan 29 23:41:37 homecomp kernel: [634108.786554]  ? __kthread_bind_mask+0x60/0x60
Jan 29 23:41:37 homecomp kernel: [634108.786556]  ? ret_from_fork+0x22/0x30
Jan 29 23:41:37 homecomp kernel: [634108.786557] Modules linked in: vfat(E) msdos(E) fat(E) cpuid(E) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) bluetooth(E) jitterentropy_rng(E) drbg(E) ansi_cprng(E) ecdh_generic(E) ecc(E) rfkill(E) sctp(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) dm_crypt(E) dm_mod(E) veth(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) nfs_ssc(E) fscache(E) overlay(E) bridge(E) stp(E) llc(E) tun(E) uinput(E) xt_owner(E) xt_TCPMSS(E) nft_counter(E) nft_chain_nat(E) xt_MASQUERADE(E) xt_mark(E) xt_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_tcpudp(E) nft_compat(E) nf_tables(E) nfnetlink(E) binfmt_misc(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) ledtrig_audio(E) kvm_intel(E) snd_hda_codec_hdmi(E) kvm(E) snd_hda_intel(E) irqbypass(E) snd_intel_dspcfg(E) crc32_pclmul(E) snd_hda_codec(E) snd_hda_core(E) ghash_clmulni_intel(E) snd_hwdep(E) snd_pcm_oss(E) aesni_intel(E)
Jan 29 23:41:37 homecomp kernel: [634108.786584]  snd_mixer_oss(E) libaes(E) crypto_simd(E) snd_pcm(E) cryptd(E) snd_seq_midi(E) glue_helper(E) snd_seq_midi_event(E) rapl(E) snd_rawmidi(E) intel_cstate(E) snd_seq(E) intel_uncore(E) snd_seq_device(E) snd_timer(E) iTCO_wdt(E) iTCO_vendor_support(E) watchdog(E) evdev(E) joydev(E) drm_kms_helper(E) snd(E) cec(E) soundcore(E) fb_sys_fops(E) syscopyarea(E) mei_me(E) sysfillrect(E) sysimgblt(E) sg(E) mei(E) acpi_pad(E) button(E) sch_fq(E) tcp_yeah(E) tcp_vegas(E) v4l2loopback(OE) videodev(E) mc(E) lm75(E) regmap_i2c(E) coretemp(E) ecryptfs(E) parport_pc(E) ppdev(E) nfsd(E) lp(E) parport(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) drm(E) fuse(E) configfs(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E) async_tx(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) t10_pi(E)
Jan 29 23:41:37 homecomp kernel: [634108.786616]  crc_t10dif(E) crct10dif_generic(E) ahci(E) xhci_pci(E) libahci(E) r8169(E) xhci_hcd(E) realtek(E) mdio_devres(E) libata(E) crct10dif_pclmul(E) crct10dif_common(E) i2c_i801(E) crc32c_intel(E) i2c_smbus(E) libphy(E) usbcore(E) scsi_mod(E) fan(E) video(E) [last unloaded: nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786625] CR2: 0000000000000008
Jan 29 23:41:37 homecomp kernel: [634108.786627] ---[ end trace 433531edb2a930b9 ]---
Jan 29 23:41:37 homecomp kernel: [634108.786741] RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786742] Code: 40 00 31 c0 5b 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c7 46 38 00 00 00 00 48 89 f7 45 31 ed e8 dd 7a ff ff eb 9b 0f 1f 00 <49> 8b 7c 24 08 e8 de 29 00 00 48 85 c0 74 b4 49 83 7c 24 08 00 74
Jan 29 23:41:37 homecomp kernel: [634108.786743] RSP: 0018:ffffbbc08dca7c40 EFLAGS: 00010246
Jan 29 23:41:37 homecomp kernel: [634108.786744] RAX: 0000000000000001 RBX: ffff9908e393ab88 RCX: 0000000000000010
Jan 29 23:41:37 homecomp kernel: [634108.786745] RDX: ffff9908484fefd8 RSI: 00000000004789f2 RDI: ffff9908618400f0
Jan 29 23:41:37 homecomp kernel: [634108.786745] RBP: ffff9908e393ab10 R08: ffffffffc571fa80 R09: ffff9908e393aa60
Jan 29 23:41:37 homecomp kernel: [634108.786746] R10: ffff99087b35c008 R11: ffff99087b35d098 R12: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786747] R13: 0000000000000001 R14: 00000000beef0003 R15: ffff9908e393abd0
Jan 29 23:41:37 homecomp kernel: [634108.786747] FS:  0000000000000000(0000) GS:ffff990f4ed40000(0000) knlGS:0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786748] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 23:41:37 homecomp kernel: [634108.786749] CR2: 0000000000000008 CR3: 00000001d2b68003 CR4: 00000000003706e0
Jan 29 23:41:37 homecomp kernel: [634108.786750] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786750] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 29 23:41:37 homecomp kernel: [634108.786765] BUG: kernel NULL pointer dereference, address: 0000000000000001
Jan 29 23:41:37 homecomp kernel: [634108.786768] #PF: supervisor instruction fetch in kernel mode
Jan 29 23:41:37 homecomp kernel: [634108.786770] #PF: error_code(0x0010) - not-present page
Jan 29 23:41:37 homecomp kernel: [634108.786773] PGD 8000000113c66067 P4D 8000000113c66067 PUD 0 
Jan 29 23:41:37 homecomp kernel: [634108.786779] Oops: 0010 [#2] PREEMPT SMP PTI
Jan 29 23:41:37 homecomp kernel: [634108.786782] CPU: 5 PID: 1422109 Comm: irq/136-nvidia Tainted: P      D    OE     5.10.4-rkfg #26
Jan 29 23:41:37 homecomp kernel: [634108.786784] Hardware name: Gigabyte Technology Co., Ltd. Z270P-D3/Z270P-D3-CF, BIOS F3 02/10/2017
Jan 29 23:41:37 homecomp kernel: [634108.786786] RIP: 0010:0x1
Jan 29 23:41:37 homecomp kernel: [634108.786820] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd7.
Jan 29 23:41:37 homecomp kernel: [634108.786822] RSP: 0018:ffffbbc08dca7eb8 EFLAGS: 00010286
Jan 29 23:41:37 homecomp kernel: [634108.786826] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786828] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffbbc08dca7ec8
Jan 29 23:41:37 homecomp kernel: [634108.786830] RBP: ffff990d017a2100 R08: 0000000000000046 R09: ffffbbc08dca7930
Jan 29 23:41:37 homecomp kernel: [634108.786832] R10: ffffbbc08dca7928 R11: ffffffffb64ae8b0 R12: ffff990d017a28fc
Jan 29 23:41:37 homecomp kernel: [634108.786833] R13: 0000000000000000 R14: 0000000000000001 R15: ffff990d017a2100
Jan 29 23:41:37 homecomp kernel: [634108.786862] FS:  0000000000000000(0000) GS:ffff990f4ed40000(0000) knlGS:0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 23:41:37 homecomp kernel: [634108.786864] CR2: ffffffffffffffd7 CR3: 00000001d2b68003 CR4: 00000000003706e0
Jan 29 23:41:37 homecomp kernel: [634108.786864] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 29 23:41:37 homecomp kernel: [634108.786866] Call Trace:
Jan 29 23:41:37 homecomp kernel: [634108.786868]  ? task_work_run+0x5c/0x90
Jan 29 23:41:37 homecomp kernel: [634108.786870]  ? do_exit+0x333/0xab0
Jan 29 23:41:37 homecomp kernel: [634108.786871]  ? irq_thread_check_affinity+0xc0/0xc0
Jan 29 23:41:37 homecomp kernel: [634108.786872]  ? kthread+0x142/0x160
Jan 29 23:41:37 homecomp kernel: [634108.786873]  ? rewind_stack_do_exit+0x17/0x17
Jan 29 23:41:37 homecomp kernel: [634108.786875] Modules linked in: vfat(E) msdos(E) fat(E) cpuid(E) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) bluetooth(E) jitterentropy_rng(E) drbg(E) ansi_cprng(E) ecdh_generic(E) ecc(E) rfkill(E) sctp(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) dm_crypt(E) dm_mod(E) veth(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) nfs_ssc(E) fscache(E) overlay(E) bridge(E) stp(E) llc(E) tun(E) uinput(E) xt_owner(E) xt_TCPMSS(E) nft_counter(E) nft_chain_nat(E) xt_MASQUERADE(E) xt_mark(E) xt_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_tcpudp(E) nft_compat(E) nf_tables(E) nfnetlink(E) binfmt_misc(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) ledtrig_audio(E) kvm_intel(E) snd_hda_codec_hdmi(E) kvm(E) snd_hda_intel(E) irqbypass(E) snd_intel_dspcfg(E) crc32_pclmul(E) snd_hda_codec(E) snd_hda_core(E) ghash_clmulni_intel(E) snd_hwdep(E) snd_pcm_oss(E) aesni_intel(E)
Jan 29 23:41:37 homecomp kernel: [634108.786898]  snd_mixer_oss(E) libaes(E) crypto_simd(E) snd_pcm(E) cryptd(E) snd_seq_midi(E) glue_helper(E) snd_seq_midi_event(E) rapl(E) snd_rawmidi(E) intel_cstate(E) snd_seq(E) intel_uncore(E) snd_seq_device(E) snd_timer(E) iTCO_wdt(E) iTCO_vendor_support(E) watchdog(E) evdev(E) joydev(E) drm_kms_helper(E) snd(E) cec(E) soundcore(E) fb_sys_fops(E) syscopyarea(E) mei_me(E) sysfillrect(E) sysimgblt(E) sg(E) mei(E) acpi_pad(E) button(E) sch_fq(E) tcp_yeah(E) tcp_vegas(E) v4l2loopback(OE) videodev(E) mc(E) lm75(E) regmap_i2c(E) coretemp(E) ecryptfs(E) parport_pc(E) ppdev(E) nfsd(E) lp(E) parport(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) drm(E) fuse(E) configfs(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E) async_tx(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) t10_pi(E)
Jan 29 23:41:37 homecomp kernel: [634108.786928]  crc_t10dif(E) crct10dif_generic(E) ahci(E) xhci_pci(E) libahci(E) r8169(E) xhci_hcd(E) realtek(E) mdio_devres(E) libata(E) crct10dif_pclmul(E) crct10dif_common(E) i2c_i801(E) crc32c_intel(E) i2c_smbus(E) libphy(E) usbcore(E) scsi_mod(E) fan(E) video(E) [last unloaded: nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786937] CR2: 0000000000000001
Jan 29 23:41:37 homecomp kernel: [634108.786940] ---[ end trace 433531edb2a930ba ]---
Jan 29 23:41:37 homecomp kernel: [634108.787074] RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.787077] Code: 40 00 31 c0 5b 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c7 46 38 00 00 00 00 48 89 f7 45 31 ed e8 dd 7a ff ff eb 9b 0f 1f 00 <49> 8b 7c 24 08 e8 de 29 00 00 48 85 c0 74 b4 49 83 7c 24 08 00 74
Jan 29 23:41:37 homecomp kernel: [634108.787079] RSP: 0018:ffffbbc08dca7c40 EFLAGS: 00010246
Jan 29 23:41:37 homecomp kernel: [634108.787080] RAX: 0000000000000001 RBX: ffff9908e393ab88 RCX: 0000000000000010
Jan 29 23:41:37 homecomp kernel: [634108.787081] RDX: ffff9908484fefd8 RSI: 00000000004789f2 RDI: ffff9908618400f0
Jan 29 23:41:37 homecomp kernel: [634108.787082] RBP: ffff9908e393ab10 R08: ffffffffc571fa80 R09: ffff9908e393aa60
Jan 29 23:41:37 homecomp kernel: [634108.787082] R10: ffff99087b35c008 R11: ffff99087b35d098 R12: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.787083] R13: 0000000000000001 R14: 00000000beef0003 R15: ffff9908e393abd0
Jan 29 23:41:37 homecomp kernel: [634108.787084] FS:  0000000000000000(0000) GS:ffff990f4ed40000(0000) knlGS:0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.787084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 23:41:37 homecomp kernel: [634108.787085] CR2: ffffffffffffffd7 CR3: 00000001d2b68003 CR4: 00000000003706e0
Jan 29 23:41:37 homecomp kernel: [634108.787086] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.787086] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 29 23:41:37 homecomp kernel: [634108.787087] Fixing recursive fault but reboot is needed!

It happened almost a month ago while I was just browsing the web. Upon switching a tab or something like that, Xorg completely froze: the mouse cursor stopped moving, the keyboard didn’t respond, etc. As I said, the system itself was alive but unusable. The driver version I was using at the time was 460.39.
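
For anyone comparing their own logs against dumps like the one above, here is a minimal Python sketch that pulls out the lines that usually matter: the BUG address, the RIP location (with the module, if one is named) and the task name. The regular expressions are assumptions based only on the log format quoted in this thread, and `summarize_oops` is a made-up helper name, not part of any NVIDIA or kernel tool.

```python
import re

# Patterns for the oops lines seen in this thread. The format is assumed
# from the logs quoted here, not taken from kernel documentation.
BUG_RE = re.compile(r"BUG: (?P<kind>.+?), address: (?P<addr>[0-9a-f]+)")
RIP_RE = re.compile(r"RIP: [0-9a-f]+:(?P<rip>\S+)(?: \[(?P<module>\w+)\])?")
COMM_RE = re.compile(r"Comm: (?P<comm>\S+)")


def summarize_oops(lines):
    """Collect BUG addresses, RIP locations and task names from log lines."""
    summary = {"bugs": [], "rips": [], "comms": []}
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            # Store the faulting address as an integer for easy comparison.
            summary["bugs"].append((m.group("kind"), int(m.group("addr"), 16)))
        m = RIP_RE.search(line)
        if m:
            # module is None when the RIP line names no module.
            summary["rips"].append((m.group("rip"), m.group("module")))
        m = COMM_RE.search(line)
        if m:
            summary["comms"].append(m.group("comm"))
    return summary


# Sample lines copied from the log in this thread.
sample = [
    "Jan 29 23:41:37 homecomp kernel: [634108.786765] BUG: kernel NULL pointer dereference, address: 0000000000000001",
    "Jan 29 23:41:37 homecomp kernel: [634108.786782] CPU: 5 PID: 1422109 Comm: irq/136-nvidia Tainted: P      D    OE     5.10.4-rkfg #26",
    "Jan 29 23:41:37 homecomp kernel: [634108.786786] RIP: 0010:0x1",
    "Jan 29 23:41:37 homecomp kernel: [634108.787074] RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]",
]
result = summarize_oops(sample)
```

Feeding it the whole journal output instead of `sample` should work the same way, as long as the lines keep this shape.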

Thank you. For those who missed it, @generix has suggested that kernel preemption might be a necessary factor in this issue presenting itself. @mangamaniac1 since you are able to reproduce the bug (I cannot; I don’t get it very often) are you able to use a kernel that doesn’t have preemption, and report back whether the crashes still happen?
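
One way to tell whether a posted crash came from a preemptible kernel is the flag list on the Oops: line, which reads "PREEMPT SMP PTI" in the log earlier in this thread. A minimal Python sketch, assuming that line format (as printed by the 5.10 kernel in that log; other kernel versions may print the flags differently). `oops_flags` and `is_preemptible` are hypothetical helper names.

```python
def oops_flags(line):
    """Return the flag words that follow '[#N]' on an 'Oops:' line."""
    _, sep, rest = line.partition("Oops:")
    if not sep:
        return []  # not an Oops: line at all
    # rest looks like " 0010 [#2] PREEMPT SMP PTI"
    _, bracket, flags = rest.partition("]")
    return flags.split() if bracket else []


def is_preemptible(line):
    """True if the Oops: line carries the PREEMPT flag."""
    return "PREEMPT" in oops_flags(line)


# Line copied from the log in this thread.
line = "Jan 29 23:41:37 homecomp kernel: [634108.786779] Oops: 0010 [#2] PREEMPT SMP PTI"
print(oops_flags(line), is_preemptible(line))
```

If reports from kernels without the PREEMPT flag never show the crash, that would support the preemption theory; a single counterexample would weaken it.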

The crash happened while I was watching a video in VLC.

distribution: ArchLinux
driver: nvidia-460.39
kernel: 5.10.16-zen1-1-zen
GPU: GeForce GTX 970 (GM204-A)
CPU: Intel(R) Core™ i7-6700K CPU @ 4.00GHz

I have managed to ssh to the console from a second machine and collect the logs using the following command: sudo nvidia-bug-report.sh --safe-mode --extra-system-data.

The original version of the script hung, so I used the version posted by @wamogo5042.

I hope you will find something useful in these logs.
nvidia-bug-report.log.gz (134.1 KB)

@kamiox
I think you forgot to attach the log to your message.

This is not an nVidia issue.
This is not an nVidia issue.
This is not an nVidia issue.

It is a confluence of DisplayPort, xHCI, Renesas, browser, config and integration issues.

This isn’t about fixing your specific issue, but rather this entire thread.

I’ve just read every post, bug report and log extract on this thread.
This is a super easy fix.
Firstly to clarify;
Linux is not supposed to work out of box.
That’s a Closed Source Market Standard.
The Open Source End User is “FREE”;
to “finish” the Open Source product to a Closed Source Market definition of “state of finish”

So the vast majority of posts involve:

  • nVidia Driver 455.
  • Arch Linux,
  • Kernel 5.8+ to 5.9.1 Lowlatency / Pre-emptive
  • Intel LGA 1155, 1151, 1151v2, 1150
  • kernel NULL pointer dereference, address: 0000000000000020
  • kernel NULL pointer dereference, address: 0000000000000027
  • Chromium and Chromium-based browsers: Chrome, Opera, Falkon
  • Firefox

“Extract from Chromium ArchLinux Wiki”

Hardware video acceleration

  • There is no official support from Chromium or Arch Linux for this feature (Chromium Docs - VA-API), but you may ask for help in the dedicated forum thread.

  • chromium from official repositories is compiled with VA-API support.

  • For proprietary NVIDIA support, installing libva-vdpau-driver-chromium (AUR) or libva-vdpau-driver-vp9-git (AUR) is required.

  • Wayland is not supported.

  • To use VA-API on XWayland, use the --use-gl=egl flag. Currently exhibits choppiness FS#67035. It could be solved by enabling #Native Wayland support.

  • To use VA-API on Xorg, use the --use-gl=desktop flag.

  • Starting in Chromium 86, there will be support for VA-API when using the ANGLE gl renderer. Use the --enable-accelerated-video-decode to enable it on an Intel GPU."

BTW, how’s ARCH working out for ya!?

If your system isn’t configured and integrated as per The Book, and the above browsers aren’t jury-rigged with workarounds, then this WILL exacerbate and exploit the poor system integration and configuration, coupled with the lack of support in the kernel or other such issues.

Correct BIOS settings are critical.
Correct kernel parameters are critical.

I also saw multiple posts using PowerSave as well in some form.
This affects the nVidia driver as well. It wants to ramp up and is getting choked.
Disable all power management for PCI Express.
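
To see what PCI Express power management is currently set to before changing anything, the kernel exposes the ASPM policy in /sys/module/pcie_aspm/parameters/policy, with the active policy shown in square brackets. A minimal Python sketch that reads it; `active_aspm_policy` and `read_aspm_policy` are hypothetical helper names, and this only reports the setting. Whether it has any bearing on the crashes in this thread is an assumption, not something the logs show.

```python
from pathlib import Path

# Sysfs file listing the available ASPM policies, e.g.
# "default performance [powersave] powersupersave".
ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_aspm_policy(path=ASPM_POLICY_PATH):
    """Read the active policy from sysfs; None if the file is unavailable."""
    try:
        return active_aspm_policy(path.read_text())
    except OSError:
        return None


print(read_aspm_policy())
```

Booting with the pcie_aspm=off kernel parameter is one commonly used way to turn ASPM off entirely; BIOS settings for PCIe power management are board specific.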

The biggest issue is the BIOS DMA Buffer / VM / IOMMU and xHCI settings and support.
Is xHCI handover still enabled in the BIOS?
USB 2 / PS/2 Legacy support uses VM in the BIOS.

Kernel 5.8
In Arch Linux and Manjaro, the 5.8+ kernel has issues with Renesas USB controllers due to a firmware version check issue.

Kernel EDID patch: 20201203
Removes the Skylake/Kabylake platform detection logic and makes the EDID function work on all platforms. Regardless, with the patch, a kernel oops occurs in the function intel_vgpu_reg_rw_edid in drivers/drm/i915/kvmgt.c.

outtatime