Arch linux | hw: rtx 3070 ti | driver 510.54-7 | Display hangs while loading driver | kernel Oops

Hello!

Please consider the following bug report:

  • Hardware: MSI Stealth GS66 12UGS
  • OS: Arch Linux
  • Installed nvidia driver version: 510.54

$ uname -a
Linux afro 5.16.14-arch1-1 #1 SMP PREEMPT Fri, 11 Mar 2022 17:40:36 +0000 x86_64 GNU/Linux

$ pacman -Qi nvidia
Name : nvidia
Version : 510.54-7
Description : NVIDIA drivers for linux
Architecture : x86_64
Licenses : custom
Groups : None
Provides : NVIDIA-MODULE
Depends On : linux nvidia-utils=510.54 libglvnd
Optional Deps : None
Required By : None
Optional For : None
Conflicts With : None
Replaces : None
Installed Size : 27.54 MiB
Packager : Jan Alexander Steffens
Build Date : Fri 11 Mar 2022 06:06:49 PM WET
Install Date : Thu 17 Mar 2022 11:16:07 AM WET
Install Reason : Explicitly installed
Install Script : No
Validated By : Signature

$ lspci | grep VGA

00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
01:00.0 VGA compatible controller: NVIDIA Corporation GA104 [Geforce RTX 3070 Ti Laptop GPU] (rev ff)

USER STORY:

When the nvidia driver version 510.54-7 is activated following the procedure below, the display freezes and the machine stops responding to the keyboard; however, it is still possible to ssh into the machine.
Everything works well with the nvidia driver version 470xx-dkms (470.103.01-1).

STEP 1:
$ optimus-manager --status
Optimus Manager (Client) version 1.4
Current GPU mode : integrated
GPU mode requested for next login : no change
GPU at startup : integrated
Temporary config path: no

STEP 2:
$ optimus-manager --switch nvidia
A GPU switch from integrated to nvidia is pending.
Log out and log back in to apply.
ERROR : cannot get current display manager name : No display-manager.service file found
Switching to mode : nvidia
Please logout all graphical sessions then log back in to apply the change.

STEP 3:
$ prime-switch
[7] INFO: # Xorg pre-start hook
[7] INFO: Previous state was: {'type': 'pending_pre_xorg_start', 'requested_mode': 'nvidia', 'current_mode': 'integrated'}
[7] INFO: Requested mode is: nvidia
modinfo: ERROR: Module acpi_call not found.
[703] INFO: Available modules: ['nouveau', 'bbswitch', 'nvidia', 'nvidia_drm', 'nvidia_modeset', 'nvidia_uvm']
[703] INFO: Unloading modules ['nouveau'] (if loaded)
[706] INFO: Loading module bbswitch
[709] INFO: Setting GPU power to ON via bbswitch
[1985] INFO: Loading module nvidia
[2663] INFO: Loading module nvidia_drm

( ** KERNEL OOPS / DISPLAY FREEZES / NO RESPONSE TO KEYBOARD ** )

The dmesg log attached highlights the following lines:

(…)
[ 1197.825369] BUG: kernel NULL pointer dereference, address: 0000000000000048
[ 1197.825370] #PF: supervisor write access in kernel mode
[ 1197.825371] #PF: error_code(0x0002) - not-present page
[ 1197.825372] PGD 0 P4D 0
[ 1197.825374] Oops: 0002 [#1] PREEMPT SMP NOPTI
(…)
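
For anyone reading the #PF lines above, here is a minimal Python sketch that decodes the x86 page-fault error code into the same wording the kernel prints. The names `PF_BITS` and `decode_pf_error_code` are made up for illustration, and only the five low bits of the error code are handled.

```python
# Low bits of the x86 page-fault error code, as printed after "#PF: error_code".
# Each entry: (mask, meaning if the bit is set, meaning if the bit is clear).
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel (supervisor) mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code: int) -> list[str]:
    """Return a human-readable description of each relevant error-code bit."""
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


# 0x0002 is the value from the oops in this report.
print(decode_pf_error_code(0x0002))
```

For 0x0002 this gives a write access from kernel mode to a not-present page, which matches the two #PF lines and the NULL pointer dereference at address 0x48.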

Also, please find attached:

  • nvidia-bug-report_before_loading_nvidia_driver.log.gz
  • nvidia-bug-report_after_hang.log.gz

nvidia-bug-report.log.gz (1.1 KB)
dmesg_after_hang.log (173.8 KB)
nvidia-bug-report_before_loading_nvidia_driver.log.gz (68.9 KB)

Regards,
FM

Hello @flmagnom and welcome to the NVIDIA developer forums.

Thank you for the detailed report! I will pass it on to our internal Linux team and ask them to check it out and if possible give feedback.

I have also tested with the package nvidia-dkms but the outcome is the same.

Cheers,
FM

We have filed bug 3579627 internally for tracking purposes.
We are trying to reproduce the issue locally, which will help us debug the problem. We will keep you updated and may get back to you if we need any other information.

Hey, have you had a chance to reproduce the issue?

Best regards,
FM

Hi,

I’m seeing the same problem on a desktop machine.
Machine Data:

  • AMD 5950x on MSI X570 Board
  • RTX 3060
  • Arch Linux (Kernel 5.17.1-arch1 / 5.17.2-arch3)
  • NVidia Driver 510.60.02-1
  • The kernel oops occurs ~1-2 s after starting the first X server; the problem does not occur when using Wayland.

Kernel Oops Log:
[ 219.141466] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 219.141469] CPU: 2 PID: 385 Comm: nvidia-modeset/ Tainted: P OE 5.17.2-arch3-1 #1 9f63002e2eccd70c4f49bb6b80a0b1362adb0924
[ 219.141472] Hardware name: Micro-Star International Co., Ltd. MS-7D53/MPG X570S EDGE MAX WIFI (MS-7D53), BIOS 1.00 08/17/2021
[ 219.141473] RIP: 0010:_nv015945rm+0x191/0x2d0 [nvidia]
[ 219.141752] Code: 00 41 8b b5 2c 06 00 00 e8 9c 1f fc ff 48 89 c7 48 c7 c6 c0 f8 25 c2 e8 cd 57 51 00 41 83 bf 6c 16 00 00 02 0f 84 16 01 00 00 <44> 8b 88 68 16 00 00 41 ba 01 00 00 00 44 8d 5b 01 89 5d 18 0f b6
[ 219.141754] RSP: 0018:ffffb38d81a1fbd8 EFLAGS: 00010293
[ 219.141756] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000007
[ 219.141757] RDX: 0000000000000008 RSI: 0000000000272cdd RDI: 0000000000000000
[ 219.141758] RBP: ffff93c15e502ba0 R08: 0000000000000000 R09: ffff93c152cad660
[ 219.141759] R10: 0000000000003514 R11: ffff93c15e502bbc R12: ffff93c152c20008
[ 219.141760] R13: ffff93c152ecc008 R14: ffff93c152cac010 R15: ffff93c152cac008
[ 219.141761] FS: 0000000000000000(0000) GS:ffff93dfdea80000(0000) knlGS:0000000000000000
[ 219.141763] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 219.141764] CR2: 0000000000001668 CR3: 0000000166bde000 CR4: 0000000000750ee0
[ 219.141765] PKRU: 55555554
[ 219.141766] Call Trace:
[ 219.141768]
[ 219.141769] ? _nv015848rm+0x4b/0xf0 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.142030] ? _nv017799rm+0xea3/0x1930 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.142295] ? _nv034289rm+0x174/0x180 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.142544] ? _nv017658rm+0xd9/0x170 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.142798] ? _nv035933rm+0x265/0x2c0 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.143046] ? _nv011414rm+0x4fe/0x620 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.143261] ? _nv034423rm+0x53/0xb0 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.143466] ? _nv010343rm+0x52/0xa0 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.143669] ? _nv010342rm+0x46/0x50 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.143871] ? _nv010342rm+0x2f/0x50 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.144073] ? rm_kernel_rmapi_op+0x141/0x190 [nvidia 80d035a25daf365f2e8b526c1f36d6e6613c41ad]
[ 219.144299] ? nvkms_call_rm+0x4b/0x80 [nvidia_modeset 02a85f75b46dc7efd0259235ffa1dd819b76279e]
[ 219.144315] ? _nv002519kms+0x51/0x60 [nvidia_modeset 02a85f75b46dc7efd0259235ffa1dd819b76279e]
[ 219.144334] ? _nv001550kms+0x30/0x30 [nvidia_modeset 02a85f75b46dc7efd0259235ffa1dd819b76279e]
[ 219.144353] ? _nv001212kms+0x127/0x3a0 [nvidia_modeset 02a85f75b46dc7efd0259235ffa1dd819b76279e]
[ 219.144377] ? _nv001487kms+0xb7/0xc0 [nvidia_modeset 02a85f75b46dc7efd0259235ffa1dd819b76279e]
[ 219.144398] ? _nv001582kms+0x22/0x40 [nvidia_modeset 02a85f75b46dc7efd0259235ffa1dd819b76279e]
[ 219.144416] ? nvkms_kthread_q_callback+0x9c/0x100 [nvidia_modeset 02a85f75b46dc7efd0259235ffa1dd819b76279e]
[ 219.144431] ? _main_loop+0x9e/0x150 [nvidia_modeset 02a85f75b46dc7efd0259235ffa1dd819b76279e]
[ 219.144445] ? nvkms_sema_up+0x10/0x10 [nvidia_modeset 02a85f75b46dc7efd0259235ffa1dd819b76279e]
[ 219.144460] ? kthread+0xd8/0x100
[ 219.144464] ? kthread_complete_and_exit+0x20/0x20
[ 219.144466] ? ret_from_fork+0x22/0x30
[ 219.144470]
[ 219.144470] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc overlay ccm snd_seq_dummy snd_hrtimer snd_seq iwlmvm intel_rapl_msr mac80211 intel_rapl_common nct6683 libarc4 iwlwifi btusb snd_hda_codec_hdmi btrtl vfat uvcvideo btbcm wmi_bmof edac_mce_amd fat iwlmei videobuf2_vmalloc snd_hda_intel snd_usb_audio btintel videobuf2_memops snd_intel_dspcfg btmtk videobuf2_v4l2 snd_usbmidi_lib snd_intel_sdw_acpi videobuf2_common snd_hda_codec snd_rawmidi bluetooth kvm videodev snd_hda_core snd_seq_device irqbypass snd_hwdep mousedev mc cfg80211 joydev ecdh_generic rapl r8169 snd_pcm realtek sp5100_tco snd_timer rfkill pcspkr mdio_devres k10temp i2c_piix4 snd mei libphy soundcore wmi tpm_crb tpm_tis tpm_tis_core mac_hid pinctrl_amd acpi_cpufreq usbip_host usbip_core ipmi_devintf ipmi_msghandler i2c_dev sg crypto_user fuse bpf_preload
[ 219.144515] ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_jabra usbhid dm_crypt cbc encrypted_keys dm_mod trusted asn1_encoder tee tpm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd ccp nvme cryptd xhci_pci rng_core nvme_core xhci_pci_renesas nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) nvidia(POE)
[ 219.144532] CR2: 0000000000001668
[ 219.144533] ---[ end trace 0000000000000000 ]---
[ 219.144535] RIP: 0010:_nv015945rm+0x191/0x2d0 [nvidia]
[ 219.144805] Code: 00 41 8b b5 2c 06 00 00 e8 9c 1f fc ff 48 89 c7 48 c7 c6 c0 f8 25 c2 e8 cd 57 51 00 41 83 bf 6c 16 00 00 02 0f 84 16 01 00 00 <44> 8b 88 68 16 00 00 41 ba 01 00 00 00 44 8d 5b 01 89 5d 18 0f b6
[ 219.144806] RSP: 0018:ffffb38d81a1fbd8 EFLAGS: 00010293
[ 219.144808] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000007
[ 219.144809] RDX: 0000000000000008 RSI: 0000000000272cdd RDI: 0000000000000000
[ 219.144810] RBP: ffff93c15e502ba0 R08: 0000000000000000 R09: ffff93c152cad660
[ 219.144810] R10: 0000000000003514 R11: ffff93c15e502bbc R12: ffff93c152c20008
[ 219.144811] R13: ffff93c152ecc008 R14: ffff93c152cac010 R15: ffff93c152cac008
[ 219.144812] FS: 0000000000000000(0000) GS:ffff93dfdea80000(0000) knlGS:0000000000000000
[ 219.144814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 219.144815] CR2: 0000000000001668 CR3: 0000000166bde000 CR4: 0000000000750ee0
[ 219.144816] PKRU: 55555554

The whole setup works after downgrading to Linux 5.15 and NVIDIA 495.
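
As a cross-check of the register dump above, here is a rough Python sketch that pulls the faulting instruction out of the "Code:" line (the kernel marks the byte at RIP with <..>) and decodes it by hand. It only understands the one encoding seen here (REX prefix, opcode 8B, register base plus 32-bit displacement) and is not a general disassembler; the function names are made up for illustration.

```python
# "Code:" bytes copied from the oops above; <44> marks the byte at RIP.
CODE_LINE = (
    "00 41 8b b5 2c 06 00 00 e8 9c 1f fc ff 48 89 c7 48 c7 c6 c0 f8 25 c2 "
    "e8 cd 57 51 00 41 83 bf 6c 16 00 00 02 0f 84 16 01 00 00 "
    "<44> 8b 88 68 16 00 00 41 ba 01 00 00 00 44 8d 5b 01 89 5d 18 0f b6"
)

REGS = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
        "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15"]


def faulting_bytes(code_line: str) -> bytes:
    """Return the instruction bytes starting at the <..>-marked byte."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_mov_disp32(insn: bytes) -> tuple[int, int]:
    """Decode 'REX 8B /r' with mod=10 (base register + disp32, no SIB byte).

    Returns (base register index, signed displacement).
    """
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex & 0xF0 != 0x40 or opcode != 0x8B:
        raise ValueError("not a REX-prefixed 'mov r, r/m' instruction")
    mod, rm = modrm >> 6, modrm & 0x07
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the base+disp32 form without SIB is handled")
    base = rm | ((rex & 0x01) << 3)  # REX.B extends the base register field
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return base, disp


insn = faulting_bytes(CODE_LINE)
base, disp = decode_mov_disp32(insn)
rax = 0x0  # RAX value from the register dump; only meaningful if base is RAX
print(f"load from [{REGS[base]}+{disp:#x}] -> address {rax + disp:#x}")
```

The decoded load is from [RAX+0x1668] with RAX = 0, so the faulting address is 0x1668, the same value as CR2. That would be consistent with the driver reading a field at offset 0x1668 through a NULL pointer, although only the driver source could confirm it.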

Hey, were you able to reproduce the issue?

Kind regards,
FM

Unfortunately engineering could not reproduce the issue so far. We will share any news as soon as we have some.

Hello,

I suffer from the exact same issue! Has a fix been found? Nvidia is unusable for me!

Hey folks! I have exactly the same issue with all Arch Linux derivative distributions!
My video card is an RTX 3070 Ti (notebook).
Let me share with you a screenshot of the error I got (kernel output):

(screenshot of the kernel output)

Any clue how to address this issue?

Hello @aezoodsma and @s.dominguez1974 and welcome to the NVIDIA developer forums.

Unfortunately, we are having trouble reproducing this issue in our labs for proper debugging. The problem seems to be isolated to a certain set of devices, but it is still being worked on.

@s.dominguez1974 In your case I suspect a different cause. Your log shows “Module verification failed”, which looks like an attempt to use Secure Boot without a signed driver module. You should either try without Secure Boot or authenticate the driver during installation. I don’t know Arch Linux, though, so I am not sure how to achieve that part.
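
To tell the two cases apart in a saved dmesg log, something like the following Python sketch could help. It only searches for message fragments quoted in this thread; the pattern names and the `classify` helper are made up for illustration, and the exact wording in your own log may differ.

```python
import re

# Message fragments quoted in this thread, matched case-insensitively.
PATTERNS = {
    "null_deref": re.compile(r"BUG: kernel NULL pointer dereference", re.I),
    "oops": re.compile(r"\bOops: [0-9a-f]{4} \[#\d+\]", re.I),
    "module_verification": re.compile(r"module verification failed", re.I),
}


def classify(dmesg_text: str) -> set[str]:
    """Return the names of all patterns found anywhere in the log text."""
    return {name for name, pat in PATTERNS.items() if pat.search(dmesg_text)}


# Lines copied from the first report in this thread.
sample = """\
[ 1197.825369] BUG: kernel NULL pointer dereference, address: 0000000000000048
[ 1197.825374] Oops: 0002 [#1] PREEMPT SMP NOPTI
"""
print(sorted(classify(sample)))
```

A log that only matches `module_verification` points toward the Secure Boot and signing question, while `null_deref` together with `oops` matches the crash from the original report.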

Thanks!

We have not been able to reproduce the issue so far, so we are looking for more information and isolation results to take it further.

  1. Is it intermittent, or is it seen on every switch?
  2. Please try the latest kernel and drivers as well.

User flmagnom has outlined repro steps, but we are not able to reproduce the issue with them. If anyone else has other reliable steps, please let us know.

Hello,

Please find an updated attempt with the latest driver and kernel (dmesg attached) :(
This occurs while loading the nvidia kernel module:

$ sudo modprobe nvidia

Linux afro 6.0.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 15 Oct 2022 14:00:49 +0000 x86_64 GNU/Linux

Name : nvidia
Version : 520.56.06-4
Description : NVIDIA drivers for linux
Architecture : x86_64

OS: Arch Linux x86_64
Host: Stealth GS66 12UGS REV:1.0
Kernel: 6.0.2-arch1-1
Uptime: 10 mins
Packages: 1964 (pacman)
Shell: zsh 5.9
Resolution: 1920x1080
Terminal: zellij
CPU: 12th Gen Intel i7-12700H (20) @ 4.600GHz
GPU: Intel Alder Lake-P
GPU: NVIDIA Geforce RTX 3070 Ti Laptop GPU
Memory: 503MiB / 31802MiB

dmesg.log (100.7 KB)