465.24.02 page fault

Thanks for your response. Unfortunately the fault freezes the entire system, so I can’t run nvidia-bug-report without a reboot. Is it still useful to do this?

Is the system still available over SSH? If not, it’s still useful to collect a bug report after rebooting.

Would it be possible to try this with the 465.27 driver that was released this morning to see if it still happens with the latest code?

I’ve tried it with the 465.27 driver and still have the problem. Unfortunately I wasn’t able to obtain a bug report without rebooting, but I have attached one obtained after the reboot. nvidia-bug-report.log.gz (1.2 MB)

Thanks, that’s actually really helpful. I filed internal bug number 3302807. While the bug tracker is not public, you can use that number to refer to this issue in future correspondence.

Hi, I am having a similar issue on a GTX 1060, i7 6700K running Arch Linux with current Kernel 5.11.16, nvidia 465.27-

[   31.430235] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   31.430239] #PF: supervisor read access in kernel mode
[   31.430240] #PF: error_code(0x0000) - not-present page
[   31.430241] PGD 0 P4D 0
[   31.430243] Oops: 0000 [#1] PREEMPT SMP PTI
[   31.430245] CPU: 0 PID: 764 Comm: nv_queue Tainted: P           OE     5.11.16-arch1-1 #1
[   31.430247] Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 1104 01/11/2016
[   31.430248] RIP: 0010:_nv032382rm+0x21c/0x500 [nvidia]
[   31.430593] Code: 8b 87 e0 01 00 00 e8 c3 3f 4e d5 45 85 ff 45 89 fc 0f 94 c0 41 f7 d4 41 83 e6 01 75 05 84 45 30 75 2c 48 8b 45 18 48 8b 4d 20 <44> 23 38 44 89 39 44 23 20 48 8b 45 28 44 89 20 5b 41 5c 41 5d 41
[   31.430594] RSP: 0018:ffffaf9f80ed7d98 EFLAGS: 00010246
[   31.430596] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
[   31.430597] RDX: 0000000000000007 RSI: ffff97dcac260008 RDI: ffff97dcac098008
[   31.430598] RBP: ffff97dd2906af60 R08: 0000000000000001 R09: ffff97dd2906ae68
[   31.430599] R10: ffff97dcac098008 R11: 0000000010100000 R12: 00000000fffffbff
[   31.430600] R13: ffff97dcacdfc010 R14: 0000000000000000 R15: 0000000000000400
[   31.430601] FS:  0000000000000000(0000) GS:ffff97e39e400000(0000) knlGS:0000000000000000
[   31.430603] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   31.430604] CR2: 0000000000000000 CR3: 0000000513610001 CR4: 00000000003706f0
[   31.430605] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   31.430606] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   31.430607] Call Trace:
[   31.430609]  ? _nv026304rm+0xce/0x760 [nvidia]
[   31.430896]  ? _nv026309rm+0x1b0/0x1d0 [nvidia]
[   31.431181]  ? rm_execute_work_item+0x108/0x120 [nvidia]
[   31.431436]  ? schedule_timeout+0x11c/0x160
[   31.431440]  ? os_execute_work_item+0x46/0x60 [nvidia]
[   31.431633]  ? _main_loop+0x83/0x130 [nvidia]
[   31.431826]  ? nvidia_modeset_resume+0x20/0x20 [nvidia]
[   31.432019]  ? kthread+0x133/0x150
[   31.432022]  ? __kthread_bind_mask+0x60/0x60
[   31.432024]  ? ret_from_fork+0x22/0x30

I have two monitors

  • 24" 4K (portrait) connected over HDMI
  • 27" 2560x1440 connected over DP

What I tested so far:

  • With both plugged in, I get the crash but am still able to SSH in.
  • If I unplug the 27" (DP) and only run the 24" (HDMI), it works fine. Once booted, I tried plugging in the 27" DP display => driver crash.
  • If I unplug the 24" and only run the 27", it crashes at boot and the network is unavailable.

Looks like the 27" over DP is the culprit.

I SSH’ed into the box after the crash, but nvidia-bug-report hangs, even when run as nvidia-bug-report.sh --safe-mode --extra-system-data

But in case the very limited info collected might help, I have attached it.
nvidia-bug-report.log.gz (1.1 KB)

This is nvidia-bug-report when I could boot with a single monitor (24" HDMI)
nvidia-bug-report-booting.log.gz (387.2 KB)

First time reporting here - let me know what else I could provide

When I downgrade to 460.67 it works fine.
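For anyone comparing traces across reports, here is a minimal Python sketch that pulls the key fields out of an oops like the one above. The function name `summarize_oops` and the regular expressions are illustrative assumptions based only on the log format shown in this thread; they are not part of any NVIDIA or kernel tool.

```python
import re


def summarize_oops(log: str) -> dict:
    """Extract fault address, task name, and faulting symbol from an oops.

    Hypothetical helper: the patterns only cover the line formats seen
    in the logs posted in this thread.
    """
    summary = {}

    # "BUG: kernel NULL pointer dereference, address: 0000000000000000"
    addr = re.search(
        r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)", log
    )
    if addr:
        summary["address"] = int(addr.group(1), 16)

    # "CPU: 0 PID: 764 Comm: nv_queue Tainted: P  OE  5.11.16-arch1-1 #1"
    comm = re.search(r"Comm: (.+?) (?:Tainted|Not tainted)", log)
    if comm:
        summary["comm"] = comm.group(1)

    # "RIP: 0010:_nv032382rm+0x21c/0x500 [nvidia]"
    rip = re.search(
        r"RIP: [0-9a-f]{4}:(\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+(?: \[(\w+)\])?", log
    )
    if rip:
        summary["symbol"] = rip.group(1)
        summary["offset"] = int(rip.group(2), 16)
        if rip.group(3):
            summary["module"] = rip.group(3)

    return summary


# Three lines copied from the log above.
sample = """\
[   31.430235] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   31.430245] CPU: 0 PID: 764 Comm: nv_queue Tainted: P           OE     5.11.16-arch1-1 #1
[   31.430248] RIP: 0010:_nv032382rm+0x21c/0x500 [nvidia]
"""
print(summarize_oops(sample))
```

Running this over each report makes it easy to see whether two crashes fault in the same symbol and task or in different ones.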

This very much resembles the issue discussed here:
https://bbs.archlinux.org/viewtopic.php?id=265563

@rnbzilla @geo_ffrey
I am still trying to reproduce the issue locally, which will help us debug faster.
Meanwhile, can you please confirm the last working driver version (if any)?

460.67 in my case.

Hi. I can reproduce when using version 465.27-4 (and other various versions newer than 460.67-4, I’ve tried updating at least 4 times in the last few weeks, and always got this behavior)

My system is a x570 with Ryzen 3950x, with a 2080 Ti Rev A. My kernel is Linux 5.11.16-arch1-1

Here is the kernel log on boot:

BUG: kernel NULL pointer dereference, address: 000000000000001c
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 11 PID: 812 Comm: systemd-udevd Tainted: P           OE     5.11.16-arch1-1 #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.00 06/17/2020
RIP: 0010:_nv032271rm+0xe/0x80 [nvidia]
Code: 89 d2 e8 45 88 f7 c4 48 83 c4 08 48 83 c5 50 c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 85 ff 74 5f 48 8b 17 48 89 f8 48 8b 0a <39> 71 04 74 53 8b 52 10 48 29 d0 48 8b 10 48 8b 12 39 72 04 74 42
RSP: 0018:ffff97b24389f578 EFLAGS: 00010282
RAX: ffff88fd439e8008 RBX: ffff88fd439e8008 RCX: 0000000000000018
RDX: ffffffffc4742400 RSI: 0000000000497031 RDI: ffff88fd439e8008
RBP: ffff88fd439d5c40 R08: 0000000000000020 R09: ffff88fd439d5c68
R10: ffff88fd058a0008 R11: 0000000010000010 R12: 00000000007ef3cb
R13: 0000000000000000 R14: 00000000000927c0 R15: ffff88fd058a0008
FS:  00007fcff5050a40(0000) GS:ffff890beecc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000001c CR3: 0000000106c38000 CR4: 0000000000350ee0
Call Trace:
? _nv032275rm+0x15/0x90 [nvidia]
? _nv039682rm+0x18/0xc0 [nvidia]
? _nv039641rm+0xf1/0x1f0 [nvidia]
? _nv018507rm+0xc2/0x180 [nvidia]
? _nv018462rm+0x19a/0x750 [nvidia]
? _nv032436rm+0x14b/0x200 [nvidia]
? _nv032436rm+0xab/0x200 [nvidia]
? _nv000859rm+0x2a5/0x470 [nvidia]
? _nv009647rm+0x4c3/0x650 [nvidia]
? _nv032313rm+0x11f/0x270 [nvidia]
? _nv032310rm+0x15d/0x1a0 [nvidia]
? _nv015534rm+0x232/0x330 [nvidia]
? _nv015556rm+0x7fd/0x1020 [nvidia]
? _nv027155rm+0x22c/0x4f0 [nvidia]
? _nv017787rm+0x303/0x5e0 [nvidia]
? _nv017788rm+0x30/0xa0 [nvidia]
? _nv017789rm+0xe1/0x220 [nvidia]
? _nv022829rm+0xed/0x220 [nvidia]
? _nv023065rm+0x30/0x60 [nvidia]
? _nv000704rm+0x16da/0x22b0 [nvidia]
? rm_init_adapter+0xc5/0xe0 [nvidia]
? kthread_create_on_node+0x51/0x70
? nv_open_device+0x122/0x8a0 [nvidia]
? nvidia_dev_get+0x63/0xb0 [nvidia]
? nvkms_open_gpu+0x4e/0x90 [nvidia_modeset]
? _nv000010kms+0x40/0x260 [nvidia_modeset]
? printk+0x68/0x7f
? security_kernfs_init_security+0x2a/0x40
? nv_drm_load+0xac/0x3ae [nvidia_drm]
? nv_drm_master_drop+0x60/0x60 [nvidia_drm]
? drm_dev_register+0xc8/0x1b0 [drm]
? nv_drm_probe_devices+0x184/0x210 [nvidia_drm]
? 0xffffffffc0a8e000
? do_one_initcall+0x57/0x220
? do_init_module+0x5c/0x270
? load_module+0x243e/0x2610
? __do_sys_init_module+0x136/0x1b0
? do_syscall_64+0x33/0x40
? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Modules linked in: nvidia_drm(POE+) nvidia_modeset(POE) ucsi_ccg typec_ucsi intel_rapl_msr typec wmi_bmof nvidia(POE) snd_hda_codec_realtek snd_hda_codec_generic iwlmvm ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg m>
i2c_nvidia_gpu soundcore fb_sys_fops curve25519_x86_64 rfkill dca libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 ip6_udp_tunnel wmi udp_tunnel libcurve25519_generic libchacha libblake2s_generic pinctrl_am>
CR2: 000000000000001c
---[ end trace b5ea4402a89e97ae ]---
RIP: 0010:_nv032271rm+0xe/0x80 [nvidia]
Code: 89 d2 e8 45 88 f7 c4 48 83 c4 08 48 83 c5 50 c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 85 ff 74 5f 48 8b 17 48 89 f8 48 8b 0a <39> 71 04 74 53 8b 52 10 48 29 d0 48 8b 10 48 8b 12 39 72 04 74 42
RSP: 0018:ffff97b24389f578 EFLAGS: 00010282
RAX: ffff88fd439e8008 RBX: ffff88fd439e8008 RCX: 0000000000000018
RDX: ffffffffc4742400 RSI: 0000000000497031 RDI: ffff88fd439e8008
RBP: ffff88fd439d5c40 R08: 0000000000000020 R09: ffff88fd439d5c68
R10: ffff88fd058a0008 R11: 0000000010000010 R12: 00000000007ef3cb
R13: 0000000000000000 R14: 00000000000927c0 R15: ffff88fd058a0008
FS:  00007fcff5050a40(0000) GS:ffff890beecc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000001c CR3: 0000000106c38000 CR4: 0000000000350ee0  

I am able to fix this issue by downgrading to 460.67-4.

nvidia-460.67-7 was the latest working arch linux driver package that I had installed.

both nvidia-465.24.02-4 and nvidia-465.27-2 are failing
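Both logs in this thread report `error_code(0x0000)`. As a reading aid, the sketch below decodes an x86 page-fault error code using the architectural bit layout (bit 0 present/protection, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). The function name `decode_pf_error_code` is a hypothetical helper, not a kernel or driver API.

```python
def decode_pf_error_code(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code.

    Hypothetical helper for reading '#PF: error_code(...)' lines.
    """
    return {
        # Bit 0: 0 = page not present, 1 = protection violation.
        "cause": "protection violation" if code & 0x1 else "not-present page",
        # Bit 1: 0 = read access, 1 = write access.
        "access": "write" if code & 0x2 else "read",
        # Bit 2: 0 = supervisor (kernel) mode, 1 = user mode.
        "mode": "user" if code & 0x4 else "supervisor",
        # Bit 3: a reserved bit was set in a page-table entry.
        "reserved_bit": bool(code & 0x8),
        # Bit 4: the fault was caused by an instruction fetch.
        "instruction_fetch": bool(code & 0x10),
    }


# The value reported in both oopses in this thread.
print(decode_pf_error_code(0x0000))
```

For `0x0000` this gives a supervisor-mode read of a not-present page, which matches the `#PF:` lines in the logs and is consistent with a read through a NULL or near-NULL pointer.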

I’ve the same issue, Fedora 33, kernel 5.11.16, RTX2070 super, 3 DP displays attached.

The issue appears to affect only certain monitors, and only when connected by DisplayPort. Some people have reported ‘solving’ the problem by switching from DP to HDMI.

For me at least it still fails if using HDMI.

I used the two configurations below but could not reproduce the issue.
Alienware Area-51 R6 + AMD Ryzen Threadripper 1950X 16-Core Processor + Ubuntu 19.04 + kernel 5.11.18 + Driver 465.24.02 + NVIDIA GeForce RTX 2080
Precision T7610 + Genuine Intel(R) CPU @ 2.30GHz + Ubuntu 19.04 + kernel 5.11.18 + Driver 465.24.02 + NVIDIA Quadro M4000
Initially I connected an ASUS XG35V at 3440x1440 to the system and booted up.
Once the system came up, I connected a second 4K display, a Dell UP3214Q at 3840x2160, and ran a few OpenGL applications but could not hit the crash.
I also tried connecting and disconnecting the monitor multiple times on both setups, and replaced the ASUS monitor with a Dell G2410, but have not been able to reproduce the issue so far.
Both displays were connected over DisplayPort.

Please confirm whether the issue triggers immediately for everyone when a secondary 4K monitor is connected.
If not, please share repro steps so that I can explore further and try to reproduce the issue on our end.

Crashes immediately here on connection of BenQ SW271 4K monitor, as described in my original post. Will boot fine with two 1920 monitors, but fails when the 4K is then connected. Also fails with just the 4K monitor connected and no others.

The same thing happened to me after upgrading fedora from 33 to 34 - initially I thought that it was because of the system upgrade, so I reinstalled the system. Unfortunately, it broke again after I installed the newest nvidia driver (initially from rpmfusion, by installing akmod-nvidia). After that, I installed the 465* driver from nvidia.com and when that didn’t help, I installed the 460* driver and I was finally able to boot my PC.
I haven’t tried disconnecting either of my two monitors (an old 4:3 Samsung over HDMI and a Dell 2418D (1440p) over DisplayPort), though. My setup is an NVIDIA 2060 SUPER and i7 9700K, kernel 5.11.17.

I would be happy to provide any more details that can help you fix this issue.

  • ArchLinux
  • MS-7B09 1.0
  • AMD Ryzen Threadripper 1950X
  • NVIDIA GeForce GTX 1080 Ti

Current configuration:
HDMI → VGA 1366x768, DP 1920x1080, DP 2560x1440
Previous configuration:
HDMI → VGA 1366x768, DP 1920x1080, DVI-D 1366x768

Both of them crash the system instantly when starting any X session (sddm, bspwm, twm, etc.).
I didn’t test with individual monitors or ports. 460 works, 465 doesn’t.
EDIT: Still doesn’t work after the Resizable BAR BIOS update

Is this another issue with forced preemption in the kernel? Doesn’t Arch provide another kernel with voluntary or no preemption?
Personally, I never enabled forced preemption as it comes with a risk that shouldn’t be taken without a reason.
Maybe report to your distribution as well.

It has been; however, this happens on every Arch kernel I’ve tried it with: linux, linux-zen, linux-tkg, linux-lts

Do any of them not have forced preemption enabled?
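One way to answer this per kernel is to look at the build configuration. The sketch below is a rough Python helper, assuming the kernel exposes its config at `/proc/config.gz` (this requires `CONFIG_IKCONFIG_PROC`, which may not be enabled on every distribution) and assuming the `CONFIG_PREEMPT_NONE` / `CONFIG_PREEMPT_VOLUNTARY` / `CONFIG_PREEMPT` options used by 5.11-era kernels. The names `preemption_model` and `read_running_config` are hypothetical. The `PREEMPT` tag in the `Oops:` lines posted earlier in the thread also suggests those kernels were built with preemption enabled.

```python
import gzip
import os


def preemption_model(config_text: str) -> str:
    """Classify the preemption model from kernel config text.

    Hypothetical helper; only considers the three classic options.
    """
    # Collect the names of options that are built in ("=y").
    enabled = {
        line.split("=", 1)[0]
        for line in config_text.splitlines()
        if line.endswith("=y")
    }
    if "CONFIG_PREEMPT" in enabled:
        return "full (forced)"
    if "CONFIG_PREEMPT_VOLUNTARY" in enabled:
        return "voluntary"
    if "CONFIG_PREEMPT_NONE" in enabled:
        return "none"
    return "unknown"


def read_running_config(path: str = "/proc/config.gz"):
    """Return the running kernel's config text, or None if unavailable."""
    if not os.path.exists(path):
        return None
    try:
        with gzip.open(path, "rt") as fh:
            return fh.read()
    except OSError:
        return None


config = read_running_config()
if config is not None:
    print("Running kernel preemption model:", preemption_model(config))
else:
    print("/proc/config.gz not available on this system")
```

Running it once under each installed kernel (linux, linux-lts, and so on) would show whether any of them avoids forced preemption.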