Nvidia driver kernel random call trace

When I open an app (mainly Firefox) using prime-run on my RTX 3050 + Iris Xe Optimus laptop, it works fine, but at seemingly random moments the app hangs for a second. When that happens, dmesg shows a warning call trace from nv_queue several times in a row, like this:

[ 2016.890260] ------------[ cut here ]------------
[ 2016.890266] WARNING: CPU: 1 PID: 267 at include/linux/rwsem.h:80 0xffffffff81262525
[ 2016.890273] Modules linked in: nvidia_uvm(POE) vmnet(OE) vmw_vsock_vmci_transport vmw_vmci vmmon(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
[ 2016.890288] CPU: 1 PID: 267 Comm: nv_queue Tainted: P W OE 6.10.3polq12kerneltkg #1
[ 2016.890293] Hardware name: LENOVO 82K1/LNVNB161216, BIOS H4CN36WW(V2.05) 05/17/2024
[ 2016.890296] RIP: 0010:0xffffffff81262525
[ 2016.890299] Code: 24 f7 c1 01 01 00 00 74 07 48 89 03 31 c0 eb 12 49 8b 3e e8 5d ee c4 00 e8 98 7e ed ff b8 ea ff ff ff 48 83 c4 08 5b 41 5e c3 <0f> 0b e9 e9 fe ff ff cc cc cc cc 55 41 57 41 56 41 55 41 54 53 48
[ 2016.890303] RSP: 0018:ffffc90000ff7bc0 EFLAGS: 00010246
[ 2016.890307] RAX: 2b3694fd0ab49100 RBX: ffffc90000ff7c20 RCX: ffffc90000ff7bd0
[ 2016.890309] RDX: ffffc90000ff7bd8 RSI: 00007f94f8806000 RDI: ffff88810004b700
[ 2016.890311] RBP: 0000000000000000 R08: ffff888152171150 R09: 0000000000000020
[ 2016.890314] R10: ffff888104caaf2c R11: ffff88849fa66300 R12: 00007f94f8806000
[ 2016.890316] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888152171150
[ 2016.890318] FS: 0000000000000000(0000) GS:ffff88849fa40000(0000) knlGS:0000000000000000
[ 2016.890321] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2016.890324] CR2: 00007f4a1fea0000 CR3: 0000000002a09002 CR4: 0000000000f70ef0
[ 2016.890327] PKRU: 55555554
[ 2016.890328] Call Trace:
[ 2016.890332] <TASK>
[ 2016.890335] ? 0xffffffff810db590
[ 2016.890339] ? 0xffffffff81e7cbf1
[ 2016.890340] ? 0xffffffff81262525
[ 2016.890342] ? 0xffffffff81ea509d
[ 2016.890343] ? 0xffffffff81ea5036
[ 2016.890344] ? 0xffffffff82000a56
[ 2016.890346] ? 0xffffffff81262525
[ 2016.890348] 0xffffffff810c5429
[ 2016.890350] 0xffffffff810c5a2c
[ 2016.890352] 0xffffffff8125d143
[ 2016.890354] 0xffffffff8125eec1
[ 2016.890356] 0xffffffff8125f038
[ 2016.890359] 0xffffffffa0e2d90b
[ 2016.890362] 0xffffffffa0d24365
[ 2016.890365] ? 0xffffffffa0e36260
[ 2016.890366] 0xffffffffa0d18683
[ 2016.890370] 0xffffffffa0e33ec9
[ 2016.890372] 0xffffffffa0e36310
[ 2016.890374] 0xffffffff811040db
[ 2016.890375] ? 0xffffffff81104000
[ 2016.890377] 0xffffffff81090b7b
[ 2016.890378] ? 0xffffffff81104000
[ 2016.890380] 0xffffffff81000381
[ 2016.890382] </TASK>
[ 2016.890383] ---[ end trace 0000000000000000 ]---
[ 2016.890410] ------------[ cut here ]------------
[ 2016.890411] WARNING: CPU: 1 PID: 267 at include/linux/rwsem.h:80 0xffffffff81262525
[ 2016.890414] Modules linked in: nvidia_uvm(POE) vmnet(OE) vmw_vsock_vmci_transport vmw_vmci vmmon(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
[ 2016.890424] CPU: 1 PID: 267 Comm: nv_queue Tainted: P W OE 6.10.3polq12kerneltkg #1
[ 2016.890428] Hardware name: LENOVO 82K1/LNVNB161216, BIOS H4CN36WW(V2.05) 05/17/2024
[ 2016.890429] RIP: 0010:0xffffffff81262525
[ 2016.890431] Code: 24 f7 c1 01 01 00 00 74 07 48 89 03 31 c0 eb 12 49 8b 3e e8 5d ee c4 00 e8 98 7e ed ff b8 ea ff ff ff 48 83 c4 08 5b 41 5e c3 <0f> 0b e9 e9 fe ff ff cc cc cc cc 55 41 57 41 56 41 55 41 54 53 48
[ 2016.890434] RSP: 0018:ffffc90000ff7bc0 EFLAGS: 00010246
[ 2016.890436] RAX: 2b3694fd0ab49100 RBX: ffffc90000ff7c20 RCX: ffffc90000ff7bd0
[ 2016.890438] RDX: ffffc90000ff7bd8 RSI: 00007f9502767000 RDI: ffff88810004b700
[ 2016.890441] RBP: 0000000000000000 R08: ffff888173a857e0 R09: 0000000000000000
[ 2016.890443] R10: ffff88815896f1c0 R11: ffffc90000ff7c90 R12: 00007f9502767000
[ 2016.890445] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888173a857e0
[ 2016.890446] FS: 0000000000000000(0000) GS:ffff88849fa40000(0000) knlGS:0000000000000000
[ 2016.890449] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2016.890451] CR2: 00007f4a1fea0000 CR3: 0000000002a09002 CR4: 0000000000f70ef0
[ 2016.890453] PKRU: 55555554
[ 2016.890454] Call Trace:
[ 2016.890455] <TASK>
[ 2016.890456] ? 0xffffffff810db590
[ 2016.890458] ? 0xffffffff81e7cbf1
[ 2016.890459] ? 0xffffffff81262525
[ 2016.890461] ? 0xffffffff81ea509d
[ 2016.890462] ? 0xffffffff81ea5036
[ 2016.890464] ? 0xffffffff82000a56
[ 2016.890465] ? 0xffffffff81262525
[ 2016.890467] 0xffffffff810c5429
[ 2016.890469] 0xffffffff810c5a2c
[ 2016.890471] 0xffffffff8125d143
[ 2016.890473] 0xffffffff8125eec1
[ 2016.890474] 0xffffffff8125f038
[ 2016.890476] 0xffffffffa0e2d90b
[ 2016.890478] 0xffffffffa0d24365
[ 2016.890480] ? 0xffffffffa0e36260
[ 2016.890481] 0xffffffffa0d18683
[ 2016.890484] 0xffffffffa0e33ec9
[ 2016.890485] 0xffffffffa0e36310
[ 2016.890487] 0xffffffff811040db
[ 2016.890489] ? 0xffffffff81104000
[ 2016.890490] 0xffffffff81090b7b
[ 2016.890491] ? 0xffffffff81104000
[ 2016.890493] 0xffffffff81000381
[ 2016.890495] </TASK>
[ 2016.890496] ---[ end trace 0000000000000000 ]---
[ 2016.890505] ------------[ cut here ]------------
[ 2016.890506] WARNING: CPU: 1 PID: 267 at include/linux/rwsem.h:80 0xffffffff81262525
[ 2016.890509] Modules linked in: nvidia_uvm(POE) vmnet(OE) vmw_vsock_vmci_transport vmw_vmci vmmon(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
[ 2016.890518] CPU: 1 PID: 267 Comm: nv_queue Tainted: P W OE 6.10.3polq12kerneltkg #1
[ 2016.890521] Hardware name: LENOVO 82K1/LNVNB161216, BIOS H4CN36WW(V2.05) 05/17/2024
[ 2016.890522] RIP: 0010:0xffffffff81262525
[ 2016.890523] Code: 24 f7 c1 01 01 00 00 74 07 48 89 03 31 c0 eb 12 49 8b 3e e8 5d ee c4 00 e8 98 7e ed ff b8 ea ff ff ff 48 83 c4 08 5b 41 5e c3 <0f> 0b e9 e9 fe ff ff cc cc cc cc 55 41 57 41 56 41 55 41 54 53 48
[ 2016.890526] RSP: 0018:ffffc90000ff7bc0 EFLAGS: 00010246
[ 2016.890528] RAX: 2b3694fd0ab49100 RBX: ffffc90000ff7c20 RCX: ffffc90000ff7bd0
[ 2016.890530] RDX: ffffc90000ff7bd8 RSI: 00007f94f2c08000 RDI: ffff88810004b700
[ 2016.890532] RBP: 0000000000000000 R08: ffff88817d42e3f0 R09: 0000000000000000
[ 2016.890534] R10: ffff88815896f000 R11: ffffc90000ff7c90 R12: 00007f94f2c08000
[ 2016.890535] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88817d42e3f0
[ 2016.890537] FS: 0000000000000000(0000) GS:ffff88849fa40000(0000) knlGS:0000000000000000
[ 2016.890539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2016.890541] CR2: 00007f4a1fea0000 CR3: 0000000002a09002 CR4: 0000000000f70ef0
[ 2016.890542] PKRU: 55555554
[ 2016.890543] Call Trace:
[ 2016.890544] <TASK>
[ 2016.890546] ? 0xffffffff810db590
[ 2016.890547] ? 0xffffffff81e7cbf1
[ 2016.890548] ? 0xffffffff81262525
[ 2016.890550] ? 0xffffffff81ea509d
[ 2016.890551] ? 0xffffffff81ea5036
[ 2016.890552] ? 0xffffffff82000a56
[ 2016.890554] ? 0xffffffff81262525
[ 2016.890555] 0xffffffff810c5429
[ 2016.890557] 0xffffffff810c5a2c
[ 2016.890559] 0xffffffff8125d143
[ 2016.890561] 0xffffffff8125eec1
[ 2016.890563] 0xffffffff8125f038
[ 2016.890565] 0xffffffffa0e2d90b
[ 2016.890566] 0xffffffffa0d24365
[ 2016.890568] ? 0xffffffffa0e36260
[ 2016.890569] 0xffffffffa0d18683
[ 2016.890572] 0xffffffffa0e33ec9
[ 2016.890573] 0xffffffffa0e36310
[ 2016.890575] 0xffffffff811040db
[ 2016.890577] ? 0xffffffff81104000
[ 2016.890578] 0xffffffff81090b7b
[ 2016.890579] ? 0xffffffff81104000
[ 2016.890581] 0xffffffff81000381
[ 2016.890583] </TASK>
[ 2016.890584] ---[ end trace 0000000000000000 ]---
[ 2016.890596] ------------[ cut here ]------------
[ 2016.890597] WARNING: CPU: 1 PID: 267 at include/linux/rwsem.h:80 0xffffffff81262525
[ 2016.890599] Modules linked in: nvidia_uvm(POE) vmnet(OE) vmw_vsock_vmci_transport vmw_vmci vmmon(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
[ 2016.890608] CPU: 1 PID: 267 Comm: nv_queue Tainted: P W OE 6.10.3polq12kerneltkg #1
[ 2016.890610] Hardware name: LENOVO 82K1/LNVNB161216, BIOS H4CN36WW(V2.05) 05/17/2024
[ 2016.890611] RIP: 0010:0xffffffff81262525
[ 2016.890613] Code: 24 f7 c1 01 01 00 00 74 07 48 89 03 31 c0 eb 12 49 8b 3e e8 5d ee c4 00 e8 98 7e ed ff b8 ea ff ff ff 48 83 c4 08 5b 41 5e c3 <0f> 0b e9 e9 fe ff ff cc cc cc cc 55 41 57 41 56 41 55 41 54 53 48
[ 2016.890615] RSP: 0018:ffffc90000ff7bc0 EFLAGS: 00010246
[ 2016.890617] RAX: 2b3694fd0ab49100 RBX: ffffc90000ff7c20 RCX: ffffc90000ff7bd0
[ 2016.890619] RDX: ffffc90000ff7bd8 RSI: 00007f94fee0b000 RDI: ffff88810004b700
[ 2016.890620] RBP: 0000000000000000 R08: ffff88813882a888 R09: 0000000000000000
[ 2016.890622] R10: ffff88815896f180 R11: ffffc90000ff7c90 R12: 00007f94fee0b000
[ 2016.890624] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88813882a888
[ 2016.890625] FS: 0000000000000000(0000) GS:ffff88849fa40000(0000) knlGS:0000000000000000
[ 2016.890628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2016.890629] CR2: 00007f4a1fea0000 CR3: 0000000002a09002 CR4: 0000000000f70ef0
[ 2016.890631] PKRU: 55555554
[ 2016.890632] Call Trace:
[ 2016.890633] <TASK>
[ 2016.890634] ? 0xffffffff810db590
[ 2016.890636] ? 0xffffffff81e7cbf1
[ 2016.890637] ? 0xffffffff81262525
[ 2016.890638] ? 0xffffffff81ea509d
[ 2016.890640] ? 0xffffffff81ea5036
[ 2016.890641] ? 0xffffffff82000a56
[ 2016.890643] ? 0xffffffff81262525
[ 2016.890644] 0xffffffff810c5429
[ 2016.890646] 0xffffffff810c5a2c
[ 2016.890648] 0xffffffff8125d143
[ 2016.890650] 0xffffffff8125eec1
[ 2016.890652] 0xffffffff8125f038
[ 2016.890653] 0xffffffffa0e2d90b
[ 2016.890655] 0xffffffffa0d24365
[ 2016.890656] ? 0xffffffffa0e36260
[ 2016.890658] 0xffffffffa0d18683
[ 2016.890660] 0xffffffffa0e33ec9
[ 2016.890662] 0xffffffffa0e36310
[ 2016.890663] 0xffffffff811040db
[ 2016.890665] ? 0xffffffff81104000
[ 2016.890666] 0xffffffff81090b7b
[ 2016.890667] ? 0xffffffff81104000
[ 2016.890669] 0xffffffff81000381
[ 2016.890671] </TASK>
[ 2016.890672] ---[ end trace 0000000000000000 ]---
I tried driver versions 550 and 555, disabling the GSP firmware, and removing -flto and -O3 from my make.conf (I use Gentoo). Nothing worked.
It is not a serious issue, but it is there. I use linux-tkg, but none of its patches modify rwsem.h, so I don’t think that is the cause.
nvidia-bug-report.log.gz (627.1 KB)

I think it’s a Linux kernel bug and not an NVIDIA driver problem.

In my experience, what you described is indicative of what happens when Firefox is compiled with -O3, which makes it unstable. -O2 should be the safe setting in most use cases, with -O3 perhaps suitable in limited cases such as non-graphical programs, but I couldn’t advise you there.

Firefox automatically compiles itself with -O3. Even on binary distros like Arch you can go to about:buildconfig and see that it is compiled with -O3.

Programs become unstable in the context of the broader system they run on. For example, I typically use high-performance tweaks even for general desktop usage, and in that context programs can stutter, fail to load, crash, display artifacts, or show the kind of weird behavior you sometimes see in games.

The Firefox I’m using now is built with -O2, not -O3. In my experience, when I compiled it with -O3 on a Funtoo system in the past because I wanted it to be faster, Firefox became unstable just as the OP describes.

Based on my experience, I don’t think it’s appropriate for apps to default to -O3, because they become unstable and unreliable in ways that aren’t always obvious. Try running many tabs or multiple instances at the same time, as you might occasionally do with Firefox, or pushing the underlying system’s performance settings harder, and watch Firefox start to suffer.

This is really neat though, thanks for that piece of info:

about:buildconfig

I have this exact error, on mostly the same hardware too.
I only get it when I’m running in hybrid mode or dedicated NVIDIA mode.
If I’m running in AMD mode, I don’t get this error.

I also got a similar problem on an ASUS TUF GAMING A16 (FA607PV), which has an R9 7940HX and an RTX 4060 Laptop GPU.
The dmesg log:

[ 42.046694] ------------[ cut here ]------------
[ 42.046694] WARNING: CPU: 8 PID: 1269 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200
[ 42.046696] Modules linked in: ccm snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device overlay cmac algif_hash algif_skcipher af_alg bnep nvidia_drm(OE) nvidia_modeset(OE) vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_sof_amd_acp63 snd_sof_amd_vangogh amd_atl snd_sof_amd_rembrandt intel_rapl_msr snd_sof_amd_renoir intel_rapl_common joydev mousedev snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_amd_sdw_acpi soundwire_amd soundwire_generic_allocation soundwire_bus snd_soc_core snd_hda_intel snd_intel_dspcfg snd_compress mt7921e snd_intel_sdw_acpi ac97_bus mt7921_common snd_pcm_dmaengine uvcvideo snd_hda_codec mt792x_lib snd_rpl_pci_acp6x snd_acp_pci videobuf2_vmalloc mt76_connac_lib snd_hda_core snd_acp_legacy_common uvc mt76 btusb snd_pci_acp6x videobuf2_memops snd_hwdep kvm_amd videobuf2_v4l2 btrtl snd_pci_acp5x snd_pcm btintel snd_rn_pci_acp3x mac80211 videodev r8169 snd_timer ucsi_acpi btbcm snd_acp_config hid_multitouch
[ 42.046727] realtek kvm typec_ucsi sp5100_tco snd videobuf2_common btmtk libarc4 mdio_devres snd_soc_acpi hid_generic bluetooth mc asus_nb_wmi wmi_bmof cfg80211 rapl nvidia_wmi_ec_backlight typec pcspkr libphy soundcore snd_pci_acp3x k10temp i2c_piix4 i2c_hid_acpi roles i2c_hid amd_pmc nvidia_uvm(OE) mac_hid nvidia(OE) crypto_user loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni amdxcp polyval_generic i2c_algo_bit gf128mul drm_ttm_helper ghash_clmulni_intel ttm sha512_ssse3 drm_exec sha256_ssse3 gpu_sched sha1_ssse3 drm_suballoc_helper aesni_intel nvme drm_buddy crypto_simd drm_display_helper nvme_core cryptd xhci_pci ccp cec xhci_pci_renesas nvme_auth serio_raw atkbd libps2 vivaldi_fmap hid_asus asus_wmi i8042 platform_profile usbhid serio asus_wmi_sensors asus_wireless sparse_keymap rfkill video wmi
[ 42.046766] CPU: 8 PID: 1269 Comm: nv_queue Tainted: G W OE 6.10.3-arch1-2 #1 20bffa7dc84b9a89fd543afbd712f49dca71b693
[ 42.046767] Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Gaming A16 FA607PV_FA607PV/FA607PV, BIOS FA607PV.307 03/27/2024
[ 42.046768] RIP: 0010:follow_pte+0x1de/0x200
[ 42.046769] Code: 14 da 00 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 b2 f5 ff ff 48 8b 35 6b e3 5c 01 48 81 e6 00 00 00 c0 eb 8d <0f> 0b 48 3b 1f 0f 83 50 fe ff ff bd ea ff ff ff eb b6 49 8b 3c 24
[ 42.046770] RSP: 0018:ffffb53881bc7b48 EFLAGS: 00010246
[ 42.046771] RAX: 0000000000000000 RBX: 00007e4869430000 RCX: ffffb53881bc7b88
[ 42.046772] RDX: ffffb53881bc7b80 RSI: 00007e4869430000 RDI: ffffa0571dae0da8
[ 42.046773] RBP: ffffb53881bc7bc8 R08: ffffb53881bc7d20 R09: 0000000000000000
[ 42.046774] R10: 0000000000000200 R11: 0000000000000000 R12: ffffb53881bc7b88
[ 42.046774] R13: ffffb53881bc7b80 R14: ffffa056c006c200 R15: 0000000000000000
[ 42.046775] FS: 0000000000000000(0000) GS:ffffa0661d200000(0000) knlGS:0000000000000000
[ 42.046776] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.046777] CR2: 0000725e8f854bb0 CR3: 00000007f4020000 CR4: 0000000000f50ef0
[ 42.046778] PKRU: 55555554
[ 42.046778] Call Trace:
[ 42.046779] <TASK>
[ 42.046779] ? follow_pte+0x1de/0x200
[ 42.046781] ? __warn.cold+0x8e/0xe8
[ 42.046782] ? follow_pte+0x1de/0x200
[ 42.046784] ? report_bug+0xff/0x140
[ 42.046786] ? handle_bug+0x3c/0x80
[ 42.046787] ? exc_invalid_op+0x17/0x70
[ 42.046788] ? asm_exc_invalid_op+0x1a/0x20
[ 42.046791] ? follow_pte+0x1de/0x200
[ 42.046793] follow_phys+0x49/0x110
[ 42.046796] untrack_pfn+0x55/0x120
[ 42.046797] unmap_single_vma+0xa6/0xe0
[ 42.046800] zap_page_range_single+0x122/0x1d0
[ 42.046804] unmap_mapping_range+0x116/0x140
[ 42.046806] ? __pfx__main_loop+0x10/0x10 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.046855] nv_revoke_gpu_mappings+0x67/0xb0 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.046899] RmHandleIdleSustained+0x3b/0x140 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.046961] ? gpumgrGetGpu+0x69/0xa0 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.047041] rm_execute_work_item+0xda/0x150 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.047111] _main_loop+0x95/0x150 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.047161] kthread+0xcf/0x100
[ 42.047163] ? __pfx_kthread+0x10/0x10
[ 42.047165] ret_from_fork+0x31/0x50
[ 42.047166] ? __pfx_kthread+0x10/0x10
[ 42.047168] ret_from_fork_asm+0x1a/0x30
[ 42.047171] </TASK>
[ 42.047172] ---[ end trace 0000000000000000 ]---

The NVIDIA bug report is here:
nvidia-bug-report.log.gz (923.9 KB)

I’ve tried linux-lts on Arch Linux, and Linux 6.6.45-1-lts seems to work well with the NVIDIA 560 beta driver: I saw no call traces in dmesg.
Maybe the driver doesn’t fully support newer versions of the Linux kernel?
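
For anyone comparing kernels or driver versions, counting the warnings can be easier than scrolling through dmesg. Below is a minimal Python sketch, assuming the WARNING header line has the same shape as in the logs posted in this thread. The function name `count_warnings` and the sample text are my own, purely for illustration.

```python
import re
from collections import Counter

# Matches the header line of a kernel WARNING, in the shape seen in this thread:
# "[ 42.046694] WARNING: CPU: 8 PID: 1269 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200"
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<location>\S+) (?P<symbol>\S+)"
)


def count_warnings(dmesg_text):
    """Count WARNING headers, grouped by (source location, symbol or address)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            # Drop the "+0x1de/0x200" offset so repeated hits group together.
            symbol = match.group("symbol").split("+", 1)[0]
            counts[(match.group("location"), symbol)] += 1
    return counts


# Sample input: two header lines copied from the logs in this thread.
sample = """\
[ 42.046694] WARNING: CPU: 8 PID: 1269 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200
[ 2016.890266] WARNING: CPU: 1 PID: 267 at include/linux/rwsem.h:80 0xffffffff81262525
"""

for (location, symbol), n in count_warnings(sample).most_common():
    print(n, location, symbol)
```

To use it on a real system, save the dmesg output to a text file and pass the file's contents to `count_warnings` instead of `sample`. Running it once per kernel makes it easy to see whether the rwsem.h:80 warning disappears on 6.6 and comes back on 6.10.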

We have seen a similar call trace internally and are currently investigating the issue.
We will update once there is further feedback from the engineering team.

I’m using Ubuntu 24.04 LTS, which comes with kernel 6.8, with driver 550, and everything works. I don’t know if this information is useful, since the original 6.8 kernel series is outdated. If I change to a more recent 6.10 kernel, I get the crashes.