Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference

I also guess that kamiox’s new crashes are a different bug, introduced in 460.56:
https://forums.developer.nvidia.com/t/display-detection-always-crashes-hard-locks-arch-linux/169653

The second crash might be a different bug, but my first crash on recent drivers was very similar to those previously reported. It happened while I was running VLC in full-screen mode. Unfortunately, the system crashed completely, so I was unable to get any logs from the machine.

I filed internal bug 3268472 for this new crash.

I have now been running the new drivers for around a week without any crashes. :)

No problems so far since February 25, when I installed 460.56.

I had downgraded CUDA to 10.2.89 before, so I could downgrade the driver to 440.100. Now I will try to upgrade CUDA again to 11.2.0, and see if it will cause any trouble.

Installed 460.56-1 on 02/25 and haven’t had issues since. Installing 460.56-2 on 03/08 and seeing how that will go.

Only half a year to fix a critical bug, and years in the making for proper Wayland support, maybe Linus was wrong after all. Thanks nvidia!

I’m running 460.56 but recompiled my kernel with CONFIG_PREEMPT=n. So far no problems, and I can have the latest CUDA and all the fancy stuff. Thanks @generix.

I have the same with CONFIG_PREEMPT in kernel 5.4.97, and it seems to work so far. Compilation of CUDA 11.2.0 and 11.2.1 crashes; however, 11.1.1 builds and works.
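For anyone who wants to check which preemption setting their kernel was built with before comparing notes, here is a minimal sketch in Python. It only parses the text of a kernel config; the function name `preempt_setting` and the sample config are my own illustration, not something from the posts above. On kernels that expose /proc/config.gz you could feed it the decompressed contents of that file, but whether that file exists depends on how the kernel was built.

```python
import re

def preempt_setting(config_text):
    """Return the CONFIG_PREEMPT value ('y', 'n', ...) or None if it is absent.

    Only the exact CONFIG_PREEMPT symbol is matched, so related symbols
    such as CONFIG_PREEMPT_VOLUNTARY are ignored.
    """
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled symbols are written as a comment in kernel configs.
        if line == "# CONFIG_PREEMPT is not set":
            return "n"
        match = re.fullmatch(r"CONFIG_PREEMPT=(\w+)", line)
        if match:
            return match.group(1)
    return None

# Hypothetical config excerpt, for illustration only.
sample = """\
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
"""
print(preempt_setting(sample))  # n
```

To run it against a live system you could read the file with `gzip.open("/proc/config.gz", "rt")`, assuming the file is present.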

Hi kamiox,
Please confirm whether you are still observing the crash.

I tried to reproduce the issue on multiple setups by running VLC and doing suspend/resume multiple times, but could not reproduce it.

Precision T7600 + Genuine Intel(R) CPU @ 2.60GHz + 5.9.1-arch1-1 + Driver 460.56 + NVIDIA TITAN Xp
Alienware + AMD Ryzen Threadripper 1950X 16-Core Processor + Ubuntu 20.04.2 LTS + Driver 460.56 + RTX 3090

Could you please share detailed repro steps?

Hi @amrits

I’m currently using Nvidia Driver 465.24.02 and Linux 5.11.14-zen1-1-zen. I have not observed any crashes recently; however, I haven’t stressed the system with the same usage as before (by running VLC or Kodi).
I will try to do some tests in the upcoming days and will let you know if I experience any crashes.

@kamiox
Thanks for the update. I will await your test results.

An issue with the latest drivers has hit many members of the Arch Linux community, myself included. The workaround is to downgrade the kernel to 5.11.13 or older and the NVIDIA driver to 460.67. The problem appears to happen with kernel 5.11.15 plus the 465 NVIDIA driver; I hit it with a 3080 FE:

Apr 20 02:35:31 chrome kernel: BUG: kernel NULL pointer dereference, address: 0000000000000170
Apr 20 02:35:31 chrome kernel: #PF: supervisor read access in kernel mode
Apr 20 02:35:31 chrome kernel: #PF: error_code(0x0000) - not-present page

Calling nvidia-smi from the shell hangs the machine, requiring a hard reboot, as does trying to start X. A fair number of people also report that unplugging their secondary monitor fixes the problem, whereas unplugging the primary monitor doesn’t. I have two DP monitors myself, but I downgraded before trying that.
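When comparing reports from different machines, it can help to pull the faulting address out of the journal text and check whether it matches. The following is a rough sketch in Python; `fault_addresses` is a hypothetical helper name, and the only input format it assumes is the "BUG: kernel NULL pointer dereference, address: ..." line quoted above.

```python
import re

# Matches the oops header line as it appears in the journal excerpt above.
BUG_RE = re.compile(
    r"BUG: kernel NULL pointer dereference, address: ([0-9a-fA-F]+)"
)

def fault_addresses(log_text):
    """Return every faulting address found in the log text, as integers."""
    return [int(m.group(1), 16) for m in BUG_RE.finditer(log_text)]

log = (
    "Apr 20 02:35:31 chrome kernel: BUG: kernel NULL pointer dereference, "
    "address: 0000000000000170\n"
    "Apr 20 02:35:31 chrome kernel: #PF: supervisor read access in kernel mode\n"
)
print([hex(a) for a in fault_addresses(log)])  # ['0x170']
```

Two reports showing the same small address are a hint, though not proof, that they hit the same code path.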

For more info:
https://bbs.archlinux.org/viewtopic.php?id=265563

Kernel panic happens during boot.

lspci -k | grep -A 2 -E "(VGA|3D)"

01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] RTX 2080 Ti GAMING X TRIO
        Kernel modules: nouveau, nvidia_drm, nvidia

Grub cmdline options:

Apr 19 23:33:21 redstar kernel: Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=355f88b0-acb5-4e41-b859-707c985eddd8 rw loglevel=3 nvidia-drm.modeset=1 ignore_loglevel

uname -a

Linux redstar 5.11.15-arch1-2 #1 SMP PREEMPT Sat, 17 Apr 2021 00:22:30 +0000 x86_64 GNU/Linux

Bug:

Apr 19 23:33:27 redstar kernel: BUG: kernel NULL pointer dereference, address: 0000000000000170
Apr 19 23:33:27 redstar kernel: #PF: supervisor read access in kernel mode
Apr 19 23:33:27 redstar kernel: #PF: error_code(0x0000) - not-present page
Apr 19 23:33:27 redstar kernel: PGD 0 P4D 0 
Apr 19 23:33:27 redstar kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Apr 19 23:33:27 redstar kernel: CPU: 1 PID: 412 Comm: systemd-udevd Tainted: P           OE     5.11.15-arch1-2 #1
Apr 19 23:33:27 redstar kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C79/MPG Z490 GAMING EDGE WIFI (MS-7C79), BIOS 1.60 02/01/2021
Apr 19 23:33:27 redstar kernel: RIP: 0010:_nv015534rm+0x1b6/0x330 [nvidia]
Apr 19 23:33:27 redstar kernel: Code: 8b 87 68 05 00 00 ba 01 00 00 00 be 02 00 00 00 e8 cf 50 9a c2 41 83 c5 01 41 83 fd 1f 0f 84 0b 01 00 00 48 8b 45 10 44 89 ee <48> 8b b8 70 01 00 00 48 8b 87 d8 04 00 00 e8 a7 50 9a c2 89 c3 48
Apr 19 23:33:27 redstar kernel: RSP: 0018:ffffb1fc013cb780 EFLAGS: 00010293
Apr 19 23:33:27 redstar kernel: RAX: 0000000000000000 RBX: 0000000000002000 RCX: 0000000000000004
Apr 19 23:33:27 redstar kernel: RDX: 0000000000000004 RSI: 0000000000000005 RDI: 0000000000000000
Apr 19 23:33:27 redstar kernel: RBP: ffff90dddc21add0 R08: 0000000000000001 R09: ffff90dddc21acb8
Apr 19 23:33:27 redstar kernel: R10: ffff90dddcb10008 R11: 0000000010100000 R12: 0000000000002400
Apr 19 23:33:27 redstar kernel: R13: 0000000000000005 R14: ffff90ddd92f4010 R15: 0000000000008000
Apr 19 23:33:27 redstar kernel: FS:  00007f76eb7aea40(0000) GS:ffff90e51da40000(0000) knlGS:0000000000000000
Apr 19 23:33:27 redstar kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 19 23:33:27 redstar kernel: CR2: 0000000000000170 CR3: 0000000110afa005 CR4: 00000000007706e0
Apr 19 23:33:27 redstar kernel: PKRU: 55555554
Apr 19 23:33:27 redstar kernel: Call Trace:
Apr 19 23:33:27 redstar kernel:  ? _nv015556rm+0x7fd/0x1020 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? _nv027154rm+0x22c/0x4f0 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? _nv017786rm+0x303/0x5e0 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? _nv017787rm+0x30/0xa0 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? _nv017788rm+0xe1/0x220 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? _nv022828rm+0xed/0x220 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? _nv023064rm+0x30/0x60 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? _nv000704rm+0x16da/0x22b0 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? rm_init_adapter+0xc5/0xe0 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? kthread_create_on_node+0x51/0x70
Apr 19 23:33:27 redstar kernel:  ? nv_open_device+0x122/0x8a0 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? nvidia_dev_get+0x63/0xb0 [nvidia]
Apr 19 23:33:27 redstar kernel:  ? nvkms_open_gpu+0x4e/0x90 [nvidia_modeset]
Apr 19 23:33:27 redstar kernel:  ? _nv000010kms+0x40/0x260 [nvidia_modeset]
Apr 19 23:33:27 redstar kernel:  ? printk+0x68/0x7f
Apr 19 23:33:27 redstar kernel:  ? security_kernfs_init_security+0x2a/0x40
Apr 19 23:33:27 redstar kernel:  ? nv_drm_load+0xac/0x3ae [nvidia_drm]
Apr 19 23:33:27 redstar kernel:  ? nv_drm_master_drop+0x60/0x60 [nvidia_drm]
Apr 19 23:33:27 redstar kernel:  ? drm_dev_register+0xc8/0x1b0 [drm]
Apr 19 23:33:27 redstar kernel:  ? nv_drm_probe_devices+0x184/0x210 [nvidia_drm]
Apr 19 23:33:27 redstar kernel:  ? 0xffffffffc0baf000
Apr 19 23:33:27 redstar kernel:  ? do_one_initcall+0x57/0x220
Apr 19 23:33:27 redstar kernel:  ? do_init_module+0x5c/0x270
Apr 19 23:33:27 redstar kernel:  ? load_module+0x243e/0x2610
Apr 19 23:33:27 redstar kernel:  ? __do_sys_init_module+0x136/0x1b0
Apr 19 23:33:27 redstar kernel:  ? do_syscall_64+0x33/0x40
Apr 19 23:33:27 redstar kernel:  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 19 23:33:27 redstar kernel: Modules linked in: joydev mousedev uvcvideo btusb btrtl btbcm videobuf2_vmalloc btintel videobuf2_memops videobuf2_v4l2 bluetooth snd_usb_audio videobuf2_common videodev snd_usbmidi_lib snd_rawmidi snd_seq_device mc ecdh_generic ecc usbhid crc16 intel_rapl_msr nvidia_drm(POE+) intel_rapl_common nvidia_modeset(POE) snd_sof_pci snd_sof_intel_hda_common uas nvidia(POE) usb_storage snd_sof_intel_hda snd_sof_intel_byt ucsi_ccg iTCO_wdt typec_ucsi snd_sof_intel_ipc intel_pmc_bxt ee1004 iTCO_vendor_support mei_hdcp typec wmi_bmof intel_wmi_thunderbolt mxm_wmi snd_sof snd_sof_xtensa_dsp snd_soc_skl snd_hda_codec_realtek snd_hda_codec_generic snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec_hdmi ledtrig_audio snd_soc_sst_ipc x86_pkg_temp_thermal snd_soc_sst_dsp intel_powerclamp iwlmvm snd_soc_acpi_intel_match coretemp snd_soc_acpi kvm_intel snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence mac80211 kvm snd_hda_codec libarc4 irqbypass
Apr 19 23:33:27 redstar kernel:  snd_hda_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep aesni_intel r8125(OE) soundwire_bus crypto_simd snd_soc_core iwlwifi cryptd r8169 snd_compress glue_helper ac97_bus rapl snd_pcm_dmaengine intel_cstate realtek intel_uncore cfg80211 drm_kms_helper snd_pcm pcspkr i2c_i801 mdio_devres snd_timer i2c_smbus mei_me cec libphy snd mei syscopyarea sysfillrect soundcore sysimgblt rfkill fb_sys_fops i2c_nvidia_gpu intel_pch_thermal video mac_hid wmi acpi_tad acpi_pad vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) drm sg crypto_user fuse agpgart bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq xhci_pci crc32c_intel xhci_pci_renesas
Apr 19 23:33:27 redstar kernel: CR2: 0000000000000170
Apr 19 23:33:27 redstar kernel: ---[ end trace 60456de3156bc3b3 ]---
Apr 19 23:33:27 redstar kernel: RIP: 0010:_nv015534rm+0x1b6/0x330 [nvidia]
Apr 19 23:33:27 redstar kernel: Code: 8b 87 68 05 00 00 ba 01 00 00 00 be 02 00 00 00 e8 cf 50 9a c2 41 83 c5 01 41 83 fd 1f 0f 84 0b 01 00 00 48 8b 45 10 44 89 ee <48> 8b b8 70 01 00 00 48 8b 87 d8 04 00 00 e8 a7 50 9a c2 89 c3 48
Apr 19 23:33:27 redstar kernel: RSP: 0018:ffffb1fc013cb780 EFLAGS: 00010293
Apr 19 23:33:27 redstar kernel: RAX: 0000000000000000 RBX: 0000000000002000 RCX: 0000000000000004
Apr 19 23:33:27 redstar kernel: RDX: 0000000000000004 RSI: 0000000000000005 RDI: 0000000000000000
Apr 19 23:33:27 redstar kernel: RBP: ffff90dddc21add0 R08: 0000000000000001 R09: ffff90dddc21acb8
Apr 19 23:33:27 redstar kernel: R10: ffff90dddcb10008 R11: 0000000010100000 R12: 0000000000002400
Apr 19 23:33:27 redstar kernel: R13: 0000000000000005 R14: ffff90ddd92f4010 R15: 0000000000008000
Apr 19 23:33:27 redstar kernel: FS:  00007f76eb7aea40(0000) GS:ffff90e51da40000(0000) knlGS:0000000000000000
Apr 19 23:33:27 redstar kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 19 23:33:27 redstar kernel: CR2: 0000000000000170 CR3: 0000000110afa005 CR4: 00000000007706e0
Apr 19 23:33:27 redstar kernel: PKRU: 55555554
Apr 19 23:33:27 redstar systemd-udevd[367]: Worker [412] terminated by signal 9 (KILL)
Apr 19 23:33:27 redstar systemd-udevd[367]: 0000:01:00.0: Worker [412] failed
Apr 19 23:33:29 redstar NetworkManager[474]: <info>  [1618846409.2434] manager: NetworkManager state is now CONNECTED_GLOBAL
Apr 19 23:33:50 redstar dbus-daemon[473]: [system] Failed to activate service 'org.freedesktop.resolve1': timed out (service_start_timeout=25000ms)
Apr 19 23:33:55 redstar systemd-timesyncd[470]: Initial synchronization to time server 27.124.125.251:123 (2.arch.pool.ntp.org).
-- Boot 90d2768fdc6d4deebd68db5ea7028005 --
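As a side note on reading the trace above: the bytes marked with <..> in the "Code:" line are where the faulting instruction starts. Below is a small Python sketch that decodes just that one instruction form (a 64-bit register load with a 32-bit displacement). It is not a general x86 disassembler, and the helper names are made up for this example. The input is an excerpt of the "Code:" line from the log, shortened to the bytes around the marker.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def faulting_bytes(code_line):
    """Return the bytes from the <..> marker to the end of a 'Code:' line."""
    hexes = code_line.split("Code:", 1)[1].split()
    start = next(i for i, h in enumerate(hexes) if h.startswith("<"))
    return bytes(int(h.strip("<>"), 16) for h in hexes[start:])

def decode_mov_load(insn):
    """Decode 'REX.W 8B /r' with mod=10 (disp32) and no SIB byte only.

    Returns (destination register, base register, displacement).
    Anything else raises ValueError.
    """
    if len(insn) < 7 or insn[0] != 0x48 or insn[1] != 0x8B:
        raise ValueError("not a plain REX.W MOV r64, r/m64")
    modrm = insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("addressing form not handled by this sketch")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return REG64[reg], REG64[rm], disp

# Excerpt of the "Code:" line from the log above.
code_line = ("Code: 48 8b 45 10 44 89 ee <48> 8b b8 70 01 00 00 "
             "48 8b 87 d8 04 00 00")
dst, base, disp = decode_mov_load(faulting_bytes(code_line))
print(f"mov {dst}, [{base}+{disp:#x}]")  # mov rdi, [rax+0x170]

rax = 0x0  # RAX value from the register dump above
print(hex(rax + disp))  # 0x170
```

The result lines up with the register dump: RAX is 0 and the load reads from RAX+0x170, which is the address reported in CR2. In other words, the driver seems to be reading a field at offset 0x170 of a structure whose pointer is NULL. What that structure is cannot be told from the log alone.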

Setting the kernel command-line option acpi=off in GRUB allows Arch to boot, but then different problems occur:

Apr 19 00:26:10 redstar kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=355f88b0-acb5-4e41-b859-707c985eddd8 rw loglevel=3 nvidia-drm.modeset=1 pci=noacpi
Apr 19 00:26:10 redstar kernel: nvidia: loading out-of-tree module taints kernel.
Apr 19 00:26:10 redstar kernel: nvidia: module license 'NVIDIA' taints kernel.
Apr 19 00:26:10 redstar kernel: Disabling lock debugging due to kernel taint
Apr 19 00:26:10 redstar kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Apr 19 00:26:10 redstar kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 239
Apr 19 00:26:10 redstar kernel:
Apr 19 00:26:10 redstar kernel: nvidia 0000:01:00.0: can't find IRQ for PCI INT A; please try using pci=biosirq
Apr 19 00:26:10 redstar kernel: NVRM: Can't find an IRQ for your NVIDIA card!
Apr 19 00:26:10 redstar kernel: NVRM: Please check your BIOS settings.
Apr 19 00:26:10 redstar kernel: NVRM: [Plug & Play OS] should be set to NO
Apr 19 00:26:10 redstar kernel: NVRM: [Assign IRQ to VGA] should be set to YES
Apr 19 00:26:10 redstar kernel: nvidia: probe of 0000:01:00.0 failed with error -1
Apr 19 00:26:10 redstar kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Apr 19 00:26:10 redstar kernel: NVRM: None of the NVIDIA devices were initialized.
Apr 19 00:26:10 redstar kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 239
Apr 19 00:26:10 redstar kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 239
Apr 19 00:26:10 redstar kernel: NVRM: Can't find an IRQ for your NVIDIA card!

This bug report was generated with acpi=off in cmdline options.
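To make it easier to see exactly which options differed between the two boots, here is a short Python sketch that splits the "Command line:" journal entries into options and compares them. `parse_cmdline` is a hypothetical helper; the two input strings are copied from the logs above. Note that the second log records pci=noacpi rather than acpi=off, so it may be worth double-checking which option was actually in effect for each boot.

```python
def parse_cmdline(journal_line):
    """Split a 'Command line: ...' journal entry into an option dict.

    Options without '=' (for example 'rw') map to True.
    """
    _, _, args = journal_line.partition("Command line: ")
    options = {}
    for token in args.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options

crash_boot = parse_cmdline(
    "Apr 19 23:33:21 redstar kernel: Command line: BOOT_IMAGE=/vmlinuz-linux "
    "root=UUID=355f88b0-acb5-4e41-b859-707c985eddd8 rw loglevel=3 "
    "nvidia-drm.modeset=1 ignore_loglevel"
)
other_boot = parse_cmdline(
    "Apr 19 00:26:10 redstar kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-linux "
    "root=UUID=355f88b0-acb5-4e41-b859-707c985eddd8 rw loglevel=3 "
    "nvidia-drm.modeset=1 pci=noacpi"
)

print(sorted(set(other_boot) - set(crash_boot)))  # ['pci']
print(sorted(set(crash_boot) - set(other_boot)))  # ['ignore_loglevel']
print(other_boot["pci"])  # noacpi
```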

If any more information is needed, let me know.

nvidia-bug-report.log.gz (76.7 KB)

@fearfactory2006 It looks like you have the nouveau module loaded. It has to be blacklisted while you’re using the NVIDIA binary driver.
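One way to double-check this from the lspci -k output is sketched below in Python. As far as I understand the lspci man page, the "Kernel modules:" line lists modules capable of handling the device, while "Kernel driver in use:" names the driver actually bound to it, so the two are worth reading separately. The helper name `lspci_driver_info` is made up for this example, and the input is the lspci excerpt posted above.

```python
def lspci_driver_info(lspci_text):
    """Extract the bound driver and candidate modules from `lspci -k` text."""
    info = {"in_use": None, "modules": []}
    for line in lspci_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Kernel driver in use":
            info["in_use"] = value.strip()
        elif key == "Kernel modules":
            info["modules"] = [m.strip() for m in value.split(",")]
    return info

# Excerpt posted earlier in the thread.
posted = """\
01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] RTX 2080 Ti GAMING X TRIO
        Kernel modules: nouveau, nvidia_drm, nvidia
"""
info = lspci_driver_info(posted)
print(info["in_use"])   # None
print(info["modules"])  # ['nouveau', 'nvidia_drm', 'nvidia']
```

The posted excerpt has no "Kernel driver in use:" line, so this alone does not show which driver is bound. Checking the "Modules linked in:" list in the oops, or the output of lsmod, would be a more direct test of whether nouveau is actually loaded.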

@kamiox
Did you get a chance to run the tests?