Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference

I have driver 440. I updated to the latest after 5 months as I thought they would have fixed it by now since the new kernel update. They clearly haven’t.

Seriously, I’m starting to think the people who are still defending nvidia are actually paid shills. What kind of normal person with a functional brain, who sees what is actually happening, is still defending them?

If nvidia REALLY wanted “bug reports” as they keep saying they do. HINT!!!

THEY DON’T

THEY JUST WANT TO SHUT US UP.

The global market for PCs totals over 1 billion PCs. (Actually way more, but I’ll chuck nvidia another bone, dumb ik)

Linux market share is around 3% (https://www.debugpoint.com/2020/07/linux-desktop-market-share-peaked-to-all-time-high-in-june/)

Also probably way more as Linux users are not the type to share this sort of stuff and participate in surveys.

https://store.steampowered.com/hwsurvey/videocard/
Nvidia has about 75% of GPUs on steam. I’ll again go for a lower number: 70%.

1000000000 * 0.03 = 30000000
30000000 * 0.7 = 21000000

That is 20M++ users on the LOW END!!!

Do you know how they can get bug reports?

Put a warning message on install.

1 LINE! Here I’ll help

echo "Drivers 440+ are buggy and cause crashes, please report this to the forum if you see anything. Not that we are gonna do anything lmao dumbass" >> "$install_script"

Or send out an email.

There are 20M+ of us affected by this bug. Even if just .00001 of users reported, this would be 200 bug reports!!! And let’s just say A LOT MORE than .00001 of people would be angry if they knew nvidia was causing this.

At the price of $100 USD a GPU that is 2 BILLION THEY HAVE TAKEN FROM US. (also far too low)
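
For anyone who wants to check the math, here is a minimal sketch of the estimate in Python. All inputs are the rough figures quoted in this post (1 billion PCs, 3% Linux share, 70% NVIDIA share, rounded down to 20M users, a .00001 reporting rate, $100 per GPU); none of them are measured values, and the function name is made up for illustration.

```python
from fractions import Fraction

def estimate_affected_users(total_pcs, linux_share, nvidia_share):
    """Back-of-the-envelope count of Linux PCs with an NVIDIA GPU.

    Shares are passed as exact fractions to avoid floating-point rounding.
    """
    return int(total_pcs * linux_share * nvidia_share)

# Rough figures quoted in the post (assumptions, not measurements).
users = estimate_affected_users(1_000_000_000, Fraction(3, 100), Fraction(70, 100))

# The post rounds 21M down to 20M for the remaining estimates.
rounded_users = 20_000_000
reports = int(rounded_users * Fraction(1, 100_000))  # .00001 of users reporting
revenue_usd = rounded_users * 100                     # $100 per GPU

print(users, reports, revenue_usd)
```

Changing any of the three shares shows how sensitive the final numbers are to the guesses going in.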

THERE IS ZERO EXCUSE. THE TERRIBLE LINUX SUPPORT AND NOW THIS!

The only reasons I can think of why nvidia hasn’t fixed it are that they either:

  • just don’t care
  • fear the bad PR if they actually admitted they messed up

Looking at all the bug reports, where this is happening in the kernel and systems where this bug doesn’t occur, I get the feeling the problem might be kernel preemption. Ubuntu and Fedora, e.g., don’t use preemptible kernels. Does Arch provide non-preemptible kernel builds to check?

This user reports experiencing the bug on Kubuntu though.

Ok, I missed that. So I guess it’s back to the drawing board.

I have seen certain posts on this thread describe how running chromium with “--use-gl=desktop” has protected against kernel bugs. I personally have found this method to work for months until yesterday while in a video call playing GeoGuessr. I had another crash similar to the one I posted about before, even though I had “--use-gl=desktop” as one of my chromium flags at the time.

What could be related though was my resetting my chromium settings that afternoon due to an unrelated issue. I’m pretty sure I restored all the previous settings I had before the reset (and my flags were untouched), but funnily enough, the YouTube playback issues I experienced before with “--use-gl=desktop” (which I mentioned in my previous post and more in depth here) also disappeared. So either something in that reset caused chromium to ignore my “--use-gl=desktop” flag, or a series of other changes to chromium/nvidia that day (despite no package updates in almost a week) caused my --use-gl=desktop problems to disappear and the kernel bug to reappear.

I’m just very confused right now, so I was wondering whether any of the other users in this chain who have been using “--use-gl=desktop” experienced anything similar.
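
One way to rule out the flag being silently dropped is to look at the command line of the running browser processes. Below is a minimal sketch in Python, assuming a Linux /proc filesystem; the helper names are made up for illustration. A flag being present on the command line does not prove that Chromium honors it, and some processes may rewrite their command line, so treat a negative result with caution.

```python
import os

def split_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into a list of arguments."""
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]

def has_flag(raw, flag="--use-gl=desktop"):
    """Return True if the exact flag appears among the process arguments."""
    return flag in split_cmdline(raw)

def chromium_flag_report(name="chromium", flag="--use-gl=desktop"):
    """Map each PID whose executable name contains `name` to whether it carries `flag`."""
    report = {}
    if not os.path.isdir("/proc"):
        return report  # not a Linux-style system; nothing to inspect
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # the process exited or access was denied
        args = split_cmdline(raw)
        if args and name in os.path.basename(args[0]):
            report[int(entry)] = flag in args
    return report
```

Calling `chromium_flag_report()` while the browser is running returns a dictionary of PIDs; any `False` entry for the main browser process would suggest the flag never reached it.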

Journal logs:
crash.txt (12.0 KB)

Hello!

I have now reproduced this crash three times in a row while playing a video in VLC
crash.txt (10.9 KB)
(txt contains the common journalctl crash report everyone here gets)
VLC Version: 3.0.12
nvidia driver: 460.39-8
kernel: 5.10.16-arch1-1
GPU: GeForce GTX 1060 6GB

The video in question has these properties:

Format : Matroska
Format version : Version 4
File size : 215 MiB
Duration : 23 min 38 s
Overall bit rate : 1 270 kb/s
Writing application : Lavf58.65.101
Writing library : Lavf58.65.101
ErrorDetectionType : Per level 1
ENCODER_SETTINGS : Redacted, CPU - 2xAMD EPYC 7282 32 cores/64 threads, GPU - None used

Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L4.1@Main
Codec ID : V_MPEGH/ISO/HEVC
Duration : 23 min 38 s
Bit rate : 8 244 kb/s
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Variable
Frame rate : 25.674 FPS
Original frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Bits/(Pixel*Frame) : 0.155
Stream size : 1.36 GiB
Writing library : Lavc58.115.102 libx265
Language : Japanese
Default : Yes
Forced : No
Color range : Limited
Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709

Audio
ID : 2
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : A_AAC-2
Duration : 23 min 38 s
Bit rate : 253 kb/s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 48.0 kHz
Frame rate : 46.875 FPS (1024 SPF)
Compression mode : Lossy
Delay relative to video : -355 ms
Stream size : 42.7 MiB (20%)
Writing library : Lavc58.115.102 aac
Language : Japanese
Default : Yes
Forced : No

Text
ID : 3
Format : ASS
Codec ID : S_TEXT/ASS
Codec ID/Info : Advanced Sub Station Alpha
Duration : 23 min 38 s
Bit rate : 100 b/s
Count of elements : 301
Compression mode : Lossless
Stream size : 17.3 KiB (0%)
Writing library : Lavc58.115.102 ssa
Language : English
Default : Yes
Forced : No

I’ve been experiencing the same bug as the above for months as well, occurring several times a week if not daily. The only thing that spared my sanity was that I’ve spent the last few months mostly booted up in my Windows partition. This also means that I can’t pinpoint exactly when it started for me, but it likely coincides with the others, either 455 or 450.

The most recent occurrence happened while I had most of my apps closed, playing a timed round in Geoguessr in Chromium. The time before that was while running a JS user script in Chromium. Before that, every case I can remember was with an abundance of tabs open in Chromium so I had assumed it was an OOM issue, but in retrospect, it might just have increased the chance of the bug happening, such as having multiple YouTube tabs playing.

I haven’t read the entire discussion thus far. Is the common thread here that Chromium triggers the bug? Would switching to a non-Chromium browser be a workaround for this in the interim? It sounds like the different flags and GPU settings passed to Chromium aren’t effective.

kernel: 5.10.16-arch1-1
nvidia driver: 460.39
GPU: GTX 1060

As an aside, I’d like to +1 the sentiment that this is an embarrassment. I thought maybe nvidia was finally getting their stuff together with getting the work necessary to support Wayland done, but instead I’m dealing with this. I’ve been running Arch for about a decade, since back when it was still known as being unstable, but aside from a few instances of configuration issues on my part, all of the stability issues I’ve encountered were nvidia related. This might be the third time I have had to pin my nvidia/kernel version to avoid a problem, and the last time this happened it took over half a year for the issue to get resolved. Even the registration process for this site is a disgrace.

No, I have seen exactly the same problem in Firefox. Chromium may either be more popular or use the GPU for more of its functionality; however, crashes happen on systems that never run Chromium.

That has been my experience too. Been using Arch for a couple of years and the majority of issues I’ve had have been bad drivers from NVIDIA.

Have to admit, since running Arch, apart from this particular issue, I haven’t had any other issues with Nvidia.

Nvidia, in general, has been great on Linux, less of a pain than AMD/ATI.

Uptime has been 11 days so far.

Turned off “Use hardware acceleration when available” in chrome/chromium.

There are two kinds of Linux kernel in (K)Ubuntu: generic and lowlatency. I’m running the latter, and it has CONFIG_PREEMPT=y in its config.
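
For anyone who wants to check their own kernel, here is a minimal sketch in Python that reads the preemption model out of a kernel config. It assumes the config is available either as /boot/config-<release> or as /proc/config.gz (the latter only exists when the kernel was built with CONFIG_IKCONFIG_PROC); the function names are made up for illustration.

```python
import gzip
import os
import platform

# Kconfig symbols for the three classic preemption models, most preemptible first.
PREEMPT_SYMBOLS = ("CONFIG_PREEMPT", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT_NONE")

def preemption_model(config_text):
    """Return the first preemption symbol set to 'y' in a kernel config, or None."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments such as '# CONFIG_PREEMPT is not set'
        key, value = line.split("=", 1)
        if value == "y":
            enabled.add(key)
    for symbol in PREEMPT_SYMBOLS:
        if symbol in enabled:
            return symbol
    return None

def read_running_kernel_config():
    """Try the usual locations for the running kernel's config; return text or None."""
    boot_path = f"/boot/config-{platform.release()}"
    if os.path.exists(boot_path):
        with open(boot_path, "r", encoding="utf-8", errors="replace") as fh:
            return fh.read()
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt", encoding="utf-8", errors="replace") as fh:
            return fh.read()
    return None
```

Passing the result of `read_running_kernel_config()` to `preemption_model()` gives a one-word answer that is easy to include next to the kernel and driver versions in a report.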

However, I can’t say I’m seeing this bug often. I had it once or twice.

Thanks for clarifying, Lastique. So this might be a clue after all.

I had this bug too a couple of times and I also have preemption enabled. It’s hard to reproduce for sure but the stack trace should really help the NVIDIA devs. Even if it’s just a workaround, would be still better than this crash killing the driver completely. I can still SSH to my PC but there’s no way to reload the driver or restart Xorg (which is stuck in D-state IIRC and by definition can’t be killed).
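
For reference, the process state can be confirmed over SSH from /proc. Below is a minimal sketch in Python, assuming a Linux /proc filesystem; the function names are hypothetical. It parses the State: line of /proc/<pid>/status, where D means uninterruptible sleep.

```python
def parse_state(status_text):
    """Return the one-letter state code from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like 'State:\tD (disk sleep)'; the code is the first field.
            fields = line.split(":", 1)[1].split()
            return fields[0] if fields else None
    return None

def is_uninterruptible(pid):
    """True if the process is in state D, i.e. it cannot handle signals right now."""
    with open(f"/proc/{pid}/status", "r", encoding="utf-8", errors="replace") as fh:
        return parse_state(fh.read()) == "D"
```

A process that stays in state D across repeated checks is waiting inside the kernel, which is why even SIGKILL has no effect on it.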

I use Debian Testing and I compile the kernel myself. I know that’s not ideal when we’re talking about reproducing bugs, but I only slightly change the default Debian config: I set the system timer to 1000 Hz and set CONFIG_PREEMPT to y so that latency is as low as possible. I mainly do this so that PulseAudio doesn’t skip. That was an issue years ago; I’m not sure if it’s still like that, but I’m used to updating my kernel myself and there have been no issues related to it. Plus, I’m not alone with this bug, and my stack trace is identical to all those already posted here. For the sake of completeness, I’ll also post mine:

Jan 29 23:41:37 homecomp kernel: [634108.784676] BUG: kernel NULL pointer dereference, address: 0000000000000008
Jan 29 23:41:37 homecomp kernel: [634108.784679] #PF: supervisor read access in kernel mode
Jan 29 23:41:37 homecomp kernel: [634108.784680] #PF: error_code(0x0000) - not-present page
Jan 29 23:41:37 homecomp kernel: [634108.784681] PGD 8000000113c66067 P4D 8000000113c66067 PUD 0 
Jan 29 23:41:37 homecomp kernel: [634108.784684] Oops: 0000 [#1] PREEMPT SMP PTI
Jan 29 23:41:37 homecomp kernel: [634108.784685] CPU: 5 PID: 1422109 Comm: irq/136-nvidia Tainted: P           OE     5.10.4-rkfg #26
Jan 29 23:41:37 homecomp kernel: [634108.784686] Hardware name: Gigabyte Technology Co., Ltd. Z270P-D3/Z270P-D3-CF, BIOS F3 02/10/2017
Jan 29 23:41:37 homecomp kernel: [634108.784825] RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.784827] Code: 40 00 31 c0 5b 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c7 46 38 00 00 00 00 48 89 f7 45 31 ed e8 dd 7a ff ff eb 9b 0f 1f 00 <49> 8b 7c 24 08 e8 de 29 00 00 48 85 c0 74 b4 49 83 7c 24 08 00 74
Jan 29 23:41:37 homecomp kernel: [634108.784828] RSP: 0018:ffffbbc08dca7c40 EFLAGS: 00010246
Jan 29 23:41:37 homecomp kernel: [634108.784829] RAX: 0000000000000001 RBX: ffff9908e393ab88 RCX: 0000000000000010
Jan 29 23:41:37 homecomp kernel: [634108.784830] RDX: ffff9908484fefd8 RSI: 00000000004789f2 RDI: ffff9908618400f0
Jan 29 23:41:37 homecomp kernel: [634108.784830] RBP: ffff9908e393ab10 R08: ffffffffc571fa80 R09: ffff9908e393aa60
Jan 29 23:41:37 homecomp kernel: [634108.784831] R10: ffff99087b35c008 R11: ffff99087b35d098 R12: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.784832] R13: 0000000000000001 R14: 00000000beef0003 R15: ffff9908e393abd0
Jan 29 23:41:37 homecomp kernel: [634108.784833] FS:  0000000000000000(0000) GS:ffff990f4ed40000(0000) knlGS:0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.784833] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 23:41:37 homecomp kernel: [634108.784834] CR2: 0000000000000008 CR3: 00000001d2b68003 CR4: 00000000003706e0
Jan 29 23:41:37 homecomp kernel: [634108.784835] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.784835] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 29 23:41:37 homecomp kernel: [634108.784836] Call Trace:
Jan 29 23:41:37 homecomp kernel: [634108.784952]  ? _nv000082rm+0x16c/0x1e0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785108]  ? _nv012946rm+0xff/0x180 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785252]  ? _nv019582rm+0x1af/0x210 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785400]  ? _nv019533rm+0xdf2/0xef0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785547]  ? _nv019534rm+0xf3/0x290 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785694]  ? _nv019500rm+0x78/0xd0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785841]  ? _nv019514rm+0xcf/0x2f0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.785988]  ? _nv019548rm+0xbe/0xe0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786140]  ? _nv028760rm+0x97b/0xdc0 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786293]  ? _nv028768rm+0x15d/0x400 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786391]  ? _nv000710rm+0xa9/0x240 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786394]  ? disable_irq_nosync+0x10/0x10
Jan 29 23:41:37 homecomp kernel: [634108.786495]  ? rm_isr_bh+0x1c/0x60 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786547]  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786548]  ? irq_thread_fn+0x20/0x60
Jan 29 23:41:37 homecomp kernel: [634108.786549]  ? irq_thread+0xe3/0x190
Jan 29 23:41:37 homecomp kernel: [634108.786550]  ? irq_finalize_oneshot.part.0+0xf0/0xf0
Jan 29 23:41:37 homecomp kernel: [634108.786552]  ? irq_thread_check_affinity+0xc0/0xc0
Jan 29 23:41:37 homecomp kernel: [634108.786553]  ? kthread+0x142/0x160
Jan 29 23:41:37 homecomp kernel: [634108.786554]  ? __kthread_bind_mask+0x60/0x60
Jan 29 23:41:37 homecomp kernel: [634108.786556]  ? ret_from_fork+0x22/0x30
Jan 29 23:41:37 homecomp kernel: [634108.786557] Modules linked in: vfat(E) msdos(E) fat(E) cpuid(E) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) bluetooth(E) jitterentropy_rng(E) drbg(E) ansi_cprng(E) ecdh_generic(E) ecc(E) rfkill(E) sctp(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) dm_crypt(E) dm_mod(E) veth(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) nfs_ssc(E) fscache(E) overlay(E) bridge(E) stp(E) llc(E) tun(E) uinput(E) xt_owner(E) xt_TCPMSS(E) nft_counter(E) nft_chain_nat(E) xt_MASQUERADE(E) xt_mark(E) xt_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_tcpudp(E) nft_compat(E) nf_tables(E) nfnetlink(E) binfmt_misc(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) ledtrig_audio(E) kvm_intel(E) snd_hda_codec_hdmi(E) kvm(E) snd_hda_intel(E) irqbypass(E) snd_intel_dspcfg(E) crc32_pclmul(E) snd_hda_codec(E) snd_hda_core(E) ghash_clmulni_intel(E) snd_hwdep(E) snd_pcm_oss(E) aesni_intel(E)
Jan 29 23:41:37 homecomp kernel: [634108.786584]  snd_mixer_oss(E) libaes(E) crypto_simd(E) snd_pcm(E) cryptd(E) snd_seq_midi(E) glue_helper(E) snd_seq_midi_event(E) rapl(E) snd_rawmidi(E) intel_cstate(E) snd_seq(E) intel_uncore(E) snd_seq_device(E) snd_timer(E) iTCO_wdt(E) iTCO_vendor_support(E) watchdog(E) evdev(E) joydev(E) drm_kms_helper(E) snd(E) cec(E) soundcore(E) fb_sys_fops(E) syscopyarea(E) mei_me(E) sysfillrect(E) sysimgblt(E) sg(E) mei(E) acpi_pad(E) button(E) sch_fq(E) tcp_yeah(E) tcp_vegas(E) v4l2loopback(OE) videodev(E) mc(E) lm75(E) regmap_i2c(E) coretemp(E) ecryptfs(E) parport_pc(E) ppdev(E) nfsd(E) lp(E) parport(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) drm(E) fuse(E) configfs(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E) async_tx(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) t10_pi(E)
Jan 29 23:41:37 homecomp kernel: [634108.786616]  crc_t10dif(E) crct10dif_generic(E) ahci(E) xhci_pci(E) libahci(E) r8169(E) xhci_hcd(E) realtek(E) mdio_devres(E) libata(E) crct10dif_pclmul(E) crct10dif_common(E) i2c_i801(E) crc32c_intel(E) i2c_smbus(E) libphy(E) usbcore(E) scsi_mod(E) fan(E) video(E) [last unloaded: nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786625] CR2: 0000000000000008
Jan 29 23:41:37 homecomp kernel: [634108.786627] ---[ end trace 433531edb2a930b9 ]---
Jan 29 23:41:37 homecomp kernel: [634108.786741] RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786742] Code: 40 00 31 c0 5b 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c7 46 38 00 00 00 00 48 89 f7 45 31 ed e8 dd 7a ff ff eb 9b 0f 1f 00 <49> 8b 7c 24 08 e8 de 29 00 00 48 85 c0 74 b4 49 83 7c 24 08 00 74
Jan 29 23:41:37 homecomp kernel: [634108.786743] RSP: 0018:ffffbbc08dca7c40 EFLAGS: 00010246
Jan 29 23:41:37 homecomp kernel: [634108.786744] RAX: 0000000000000001 RBX: ffff9908e393ab88 RCX: 0000000000000010
Jan 29 23:41:37 homecomp kernel: [634108.786745] RDX: ffff9908484fefd8 RSI: 00000000004789f2 RDI: ffff9908618400f0
Jan 29 23:41:37 homecomp kernel: [634108.786745] RBP: ffff9908e393ab10 R08: ffffffffc571fa80 R09: ffff9908e393aa60
Jan 29 23:41:37 homecomp kernel: [634108.786746] R10: ffff99087b35c008 R11: ffff99087b35d098 R12: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786747] R13: 0000000000000001 R14: 00000000beef0003 R15: ffff9908e393abd0
Jan 29 23:41:37 homecomp kernel: [634108.786747] FS:  0000000000000000(0000) GS:ffff990f4ed40000(0000) knlGS:0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786748] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 23:41:37 homecomp kernel: [634108.786749] CR2: 0000000000000008 CR3: 00000001d2b68003 CR4: 00000000003706e0
Jan 29 23:41:37 homecomp kernel: [634108.786750] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786750] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 29 23:41:37 homecomp kernel: [634108.786765] BUG: kernel NULL pointer dereference, address: 0000000000000001
Jan 29 23:41:37 homecomp kernel: [634108.786768] #PF: supervisor instruction fetch in kernel mode
Jan 29 23:41:37 homecomp kernel: [634108.786770] #PF: error_code(0x0010) - not-present page
Jan 29 23:41:37 homecomp kernel: [634108.786773] PGD 8000000113c66067 P4D 8000000113c66067 PUD 0 
Jan 29 23:41:37 homecomp kernel: [634108.786779] Oops: 0010 [#2] PREEMPT SMP PTI
Jan 29 23:41:37 homecomp kernel: [634108.786782] CPU: 5 PID: 1422109 Comm: irq/136-nvidia Tainted: P      D    OE     5.10.4-rkfg #26
Jan 29 23:41:37 homecomp kernel: [634108.786784] Hardware name: Gigabyte Technology Co., Ltd. Z270P-D3/Z270P-D3-CF, BIOS F3 02/10/2017
Jan 29 23:41:37 homecomp kernel: [634108.786786] RIP: 0010:0x1
Jan 29 23:41:37 homecomp kernel: [634108.786820] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd7.
Jan 29 23:41:37 homecomp kernel: [634108.786822] RSP: 0018:ffffbbc08dca7eb8 EFLAGS: 00010286
Jan 29 23:41:37 homecomp kernel: [634108.786826] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786828] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffbbc08dca7ec8
Jan 29 23:41:37 homecomp kernel: [634108.786830] RBP: ffff990d017a2100 R08: 0000000000000046 R09: ffffbbc08dca7930
Jan 29 23:41:37 homecomp kernel: [634108.786832] R10: ffffbbc08dca7928 R11: ffffffffb64ae8b0 R12: ffff990d017a28fc
Jan 29 23:41:37 homecomp kernel: [634108.786833] R13: 0000000000000000 R14: 0000000000000001 R15: ffff990d017a2100
Jan 29 23:41:37 homecomp kernel: [634108.786862] FS:  0000000000000000(0000) GS:ffff990f4ed40000(0000) knlGS:0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 23:41:37 homecomp kernel: [634108.786864] CR2: ffffffffffffffd7 CR3: 00000001d2b68003 CR4: 00000000003706e0
Jan 29 23:41:37 homecomp kernel: [634108.786864] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.786865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 29 23:41:37 homecomp kernel: [634108.786866] Call Trace:
Jan 29 23:41:37 homecomp kernel: [634108.786868]  ? task_work_run+0x5c/0x90
Jan 29 23:41:37 homecomp kernel: [634108.786870]  ? do_exit+0x333/0xab0
Jan 29 23:41:37 homecomp kernel: [634108.786871]  ? irq_thread_check_affinity+0xc0/0xc0
Jan 29 23:41:37 homecomp kernel: [634108.786872]  ? kthread+0x142/0x160
Jan 29 23:41:37 homecomp kernel: [634108.786873]  ? rewind_stack_do_exit+0x17/0x17
Jan 29 23:41:37 homecomp kernel: [634108.786875] Modules linked in: vfat(E) msdos(E) fat(E) cpuid(E) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) bluetooth(E) jitterentropy_rng(E) drbg(E) ansi_cprng(E) ecdh_generic(E) ecc(E) rfkill(E) sctp(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) dm_crypt(E) dm_mod(E) veth(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) nfs_ssc(E) fscache(E) overlay(E) bridge(E) stp(E) llc(E) tun(E) uinput(E) xt_owner(E) xt_TCPMSS(E) nft_counter(E) nft_chain_nat(E) xt_MASQUERADE(E) xt_mark(E) xt_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_tcpudp(E) nft_compat(E) nf_tables(E) nfnetlink(E) binfmt_misc(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) ledtrig_audio(E) kvm_intel(E) snd_hda_codec_hdmi(E) kvm(E) snd_hda_intel(E) irqbypass(E) snd_intel_dspcfg(E) crc32_pclmul(E) snd_hda_codec(E) snd_hda_core(E) ghash_clmulni_intel(E) snd_hwdep(E) snd_pcm_oss(E) aesni_intel(E)
Jan 29 23:41:37 homecomp kernel: [634108.786898]  snd_mixer_oss(E) libaes(E) crypto_simd(E) snd_pcm(E) cryptd(E) snd_seq_midi(E) glue_helper(E) snd_seq_midi_event(E) rapl(E) snd_rawmidi(E) intel_cstate(E) snd_seq(E) intel_uncore(E) snd_seq_device(E) snd_timer(E) iTCO_wdt(E) iTCO_vendor_support(E) watchdog(E) evdev(E) joydev(E) drm_kms_helper(E) snd(E) cec(E) soundcore(E) fb_sys_fops(E) syscopyarea(E) mei_me(E) sysfillrect(E) sysimgblt(E) sg(E) mei(E) acpi_pad(E) button(E) sch_fq(E) tcp_yeah(E) tcp_vegas(E) v4l2loopback(OE) videodev(E) mc(E) lm75(E) regmap_i2c(E) coretemp(E) ecryptfs(E) parport_pc(E) ppdev(E) nfsd(E) lp(E) parport(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) drm(E) fuse(E) configfs(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E) async_tx(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) t10_pi(E)
Jan 29 23:41:37 homecomp kernel: [634108.786928]  crc_t10dif(E) crct10dif_generic(E) ahci(E) xhci_pci(E) libahci(E) r8169(E) xhci_hcd(E) realtek(E) mdio_devres(E) libata(E) crct10dif_pclmul(E) crct10dif_common(E) i2c_i801(E) crc32c_intel(E) i2c_smbus(E) libphy(E) usbcore(E) scsi_mod(E) fan(E) video(E) [last unloaded: nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.786937] CR2: 0000000000000001
Jan 29 23:41:37 homecomp kernel: [634108.786940] ---[ end trace 433531edb2a930ba ]---
Jan 29 23:41:37 homecomp kernel: [634108.787074] RIP: 0010:_nv013013rm+0xd8/0x130 [nvidia]
Jan 29 23:41:37 homecomp kernel: [634108.787077] Code: 40 00 31 c0 5b 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c7 46 38 00 00 00 00 48 89 f7 45 31 ed e8 dd 7a ff ff eb 9b 0f 1f 00 <49> 8b 7c 24 08 e8 de 29 00 00 48 85 c0 74 b4 49 83 7c 24 08 00 74
Jan 29 23:41:37 homecomp kernel: [634108.787079] RSP: 0018:ffffbbc08dca7c40 EFLAGS: 00010246
Jan 29 23:41:37 homecomp kernel: [634108.787080] RAX: 0000000000000001 RBX: ffff9908e393ab88 RCX: 0000000000000010
Jan 29 23:41:37 homecomp kernel: [634108.787081] RDX: ffff9908484fefd8 RSI: 00000000004789f2 RDI: ffff9908618400f0
Jan 29 23:41:37 homecomp kernel: [634108.787082] RBP: ffff9908e393ab10 R08: ffffffffc571fa80 R09: ffff9908e393aa60
Jan 29 23:41:37 homecomp kernel: [634108.787082] R10: ffff99087b35c008 R11: ffff99087b35d098 R12: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.787083] R13: 0000000000000001 R14: 00000000beef0003 R15: ffff9908e393abd0
Jan 29 23:41:37 homecomp kernel: [634108.787084] FS:  0000000000000000(0000) GS:ffff990f4ed40000(0000) knlGS:0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.787084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 23:41:37 homecomp kernel: [634108.787085] CR2: ffffffffffffffd7 CR3: 00000001d2b68003 CR4: 00000000003706e0
Jan 29 23:41:37 homecomp kernel: [634108.787086] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 23:41:37 homecomp kernel: [634108.787086] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 29 23:41:37 homecomp kernel: [634108.787087] Fixing recursive fault but reboot is needed!
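
To compare traces across posts, the key fields can be pulled out of a log like the one above with a few regular expressions. Below is a minimal sketch in Python, assuming the journal format shown here; the function name and the returned keys are made up for illustration.

```python
import re

BUG_RE = re.compile(r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)")
OOPS_RE = re.compile(r"Oops: [0-9a-f]+ \[#(\d+)\] (.*)$")
RIP_RE = re.compile(r"RIP: 0010:(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")

def summarize_oops(log_text):
    """Extract the fault address, oops flags and faulting symbol of the first oops in a log."""
    summary = {"address": None, "flags": [], "symbol": None, "module": None}
    for line in log_text.splitlines():
        if summary["address"] is None:
            m = BUG_RE.search(line)
            if m:
                summary["address"] = int(m.group(1), 16)
                continue
        if not summary["flags"]:
            m = OOPS_RE.search(line)
            if m:
                # e.g. ['PREEMPT', 'SMP', 'PTI'] for a preemptible kernel
                summary["flags"] = m.group(2).split()
                continue
        if summary["symbol"] is None:
            m = RIP_RE.search(line)
            if m:
                summary["symbol"], summary["module"] = m.group(1), m.group(2)
    return summary
```

Run against the trace above, this reports address 0x8 in _nv013013rm from the nvidia module, with PREEMPT among the oops flags, which makes it easy to see at a glance whether two reports hit the same spot.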

It happened almost a month ago when I was just browsing the web; upon switching a tab or something like that, Xorg completely froze: the mouse cursor stopped moving, the keyboard didn’t respond, etc. As I said, the system itself was alive but unusable. The driver version I was using at the time was 460.39.

Thank you. For those who missed it, @generix has suggested that kernel preemption might be a necessary factor in this issue presenting itself. @mangamaniac1 since you are able to reproduce the bug (I cannot; I don’t get it very often) are you able to use a kernel that doesn’t have preemption, and report back whether the crashes still happen?

The crash happened while I was watching a video in VLC.

distribution: ArchLinux
driver: nvidia-460.39
kernel: 5.10.16-zen1-1-zen
GPU: GeForce GTX 970 (GM204-A)
CPU: Intel(R) Core™ i7-6700K CPU @ 4.00GHz

I have managed to ssh to the console from a second machine and collect the logs using the following command: sudo nvidia-bug-report.sh --safe-mode --extra-system-data.

The original version of the script hung, so I used a version posted by @wamogo5042

I hope you will find something useful in these logs.
nvidia-bug-report.log.gz (134.1 KB)

@kamiox
I think you forgot to attach the log to your message.

This is not a nVidia issue.

It is a confluence of DisplayPort, xHCI, Renesas, browser, config and integration issues.

This isn’t about fixing your specific issue, but rather this entire thread.

I’ve just read every post, bug report and log extract on this thread.
This is a super easy fix.
Firstly, to clarify:
Linux is not supposed to work out of the box.
That’s a Closed Source Market Standard.
The Open Source End User is “FREE”
to “finish” the Open Source product to a Closed Source Market definition of “state of finish”.

So the near total majority of posts are:

  • nVidia Driver 455.
  • Arch Linux,
  • Kernel 5.8+ to 5.9.1 Lowlatency / Pre-emptive
  • Intel LGA 1155, 1151, 1151v2, 1150
  • kernel NULL pointer dereference, address: 0000000000000020
  • kernel NULL pointer dereference, address: 0000000000000027
  • Chromium and Chromium-based browsers: Chrome, Opera, Falkon
  • Firefox

“Extract from Chromium ArchLinux Wiki”

Hardware video acceleration

  • There is no official support from Chromium or Arch Linux for this feature (Chromium Docs - VA-API), but you may ask for help in the dedicated forum thread.

  • chromium from official repositories is compiled with VA-API support.

  • For proprietary NVIDIA support, installing libva-vdpau-driver-chromium (AUR) or libva-vdpau-driver-vp9-git (AUR) is required.

  • Wayland is not supported.

  • To use VA-API on XWayland, use the --use-gl=egl flag. Currently exhibits choppiness FS#67035. It could be solved by enabling #Native Wayland support.

  • To use VA-API on Xorg, use the --use-gl=desktop flag.

  • Starting in Chromium 86, there will be support for VA-API when using the ANGLE gl renderer. Use the --enable-accelerated-video-decode to enable it on an Intel GPU."

BTW, how’s ARCH working out for ya!?

If your system isn’t configured and integrated as per The Book and the above browsers aren’t jury-rigged with workarounds, then this WILL exponentially exacerbate and exploit the poor system integration and configuration, coupled with the lack of support in the kernel or other such issues.

Correct BIOS settings are critical.
Correct kernel parameters are critical.

I also saw multiple posts using PowerSave as well in some form.
This affects the nVidia driver as well. It wants to ramp up and is getting choked.
Disable all power management for PCI Express.

The biggest issue is BIOS DMA Buffer / VM / IOMMU and xHCI settings and support.
Is xHCI handover still enabled in the BIOS?
USB 2 / PS/2 Legacy support uses VM in the BIOS.

Kernel 5.8
In Arch Linux and Manjaro, the 5.8+ kernel has issues with Renesas USB controllers due to a firmware version check issue.

Kernel EDID patch: 20201203
Removes the Skylake/Kabylake platform detection logic and makes the EDID function work on all platforms. Regardless, with the patch, a kernel oops occurs in the function intel_vgpu_reg_rw_edid in drivers/drm/i915/kvmgt.c.

outtatime

So … the bug began with nvidia driver 450; it is not limited to Arch Linux; kernels 5.4 and 4.19 are also hit; and in some cases the crash happened without any browser (VLC, for example).

As for hardware video acceleration, the crash happened without this feature too (yes, I tried again and again; now I just compile the 435 driver for my kernel and I don’t get any crashes).

I don’t use PowerSave (because of nvidia; this problem has been around for a few years now), and the problem with the firmware version has been solved for a few months (and to clarify, previous kernels have the same bug with the nvidia driver).

Why doesn’t this bug hit nvidia drivers before 450, and why does it still happen without Chrome or anything else using hardware acceleration?

My questions are not really questions; it’s just to compare with the post above @abelits

P.S.: thanks for all the instructive links

Please stop. The kernel crash on dereferencing a NULL pointer in a driver’s function is probably the most conclusive and unambiguous indication that a bug is in the driver.

Most likely because it was introduced in that version.

Because the “not using hardware acceleration” option in one userspace program does not reliably prevent any particular piece of the driver’s functionality from being used, especially in a modern desktop that uses compositing for everything. Also, the problem is probably in the implementation of some basic functionality, most likely a race condition in something very common. The number of calls may affect the likelihood of a crash, but the crash can’t be eliminated entirely that way.

If a guess made by @generix is correct, and preemption is either the necessary condition or it greatly increases the probability of a crash being triggered, it would strongly indicate a race condition.