Nvidia driver kernel random call trace

When I open an app (mainly Firefox) with prime-run on my RTX 3050 + Iris Xe Optimus laptop, it works fine, but at seemingly random moments the app hangs for a second. When that happens, I see a warning call trace involving nv_queue a few times in dmesg, like this:

[ 2016.890260] ------------[ cut here ]------------
[ 2016.890266] WARNING: CPU: 1 PID: 267 at include/linux/rwsem.h:80 0xffffffff81262525
[ 2016.890273] Modules linked in: nvidia_uvm(POE) vmnet(OE) vmw_vsock_vmci_transport vmw_vmci vmmon(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
[ 2016.890288] CPU: 1 PID: 267 Comm: nv_queue Tainted: P W OE 6.10.3polq12kerneltkg #1
[ 2016.890293] Hardware name: LENOVO 82K1/LNVNB161216, BIOS H4CN36WW(V2.05) 05/17/2024
[ 2016.890296] RIP: 0010:0xffffffff81262525
[ 2016.890299] Code: 24 f7 c1 01 01 00 00 74 07 48 89 03 31 c0 eb 12 49 8b 3e e8 5d ee c4 00 e8 98 7e ed ff b8 ea ff ff ff 48 83 c4 08 5b 41 5e c3 <0f> 0b e9 e9 fe ff ff cc cc cc cc 55 41 57 41 56 41 55 41 54 53 48
[ 2016.890303] RSP: 0018:ffffc90000ff7bc0 EFLAGS: 00010246
[ 2016.890307] RAX: 2b3694fd0ab49100 RBX: ffffc90000ff7c20 RCX: ffffc90000ff7bd0
[ 2016.890309] RDX: ffffc90000ff7bd8 RSI: 00007f94f8806000 RDI: ffff88810004b700
[ 2016.890311] RBP: 0000000000000000 R08: ffff888152171150 R09: 0000000000000020
[ 2016.890314] R10: ffff888104caaf2c R11: ffff88849fa66300 R12: 00007f94f8806000
[ 2016.890316] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888152171150
[ 2016.890318] FS: 0000000000000000(0000) GS:ffff88849fa40000(0000) knlGS:0000000000000000
[ 2016.890321] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2016.890324] CR2: 00007f4a1fea0000 CR3: 0000000002a09002 CR4: 0000000000f70ef0
[ 2016.890327] PKRU: 55555554
[ 2016.890328] Call Trace:
[ 2016.890332] <TASK>
[ 2016.890335] ? 0xffffffff810db590
[ 2016.890339] ? 0xffffffff81e7cbf1
[ 2016.890340] ? 0xffffffff81262525
[ 2016.890342] ? 0xffffffff81ea509d
[ 2016.890343] ? 0xffffffff81ea5036
[ 2016.890344] ? 0xffffffff82000a56
[ 2016.890346] ? 0xffffffff81262525
[ 2016.890348] 0xffffffff810c5429
[ 2016.890350] 0xffffffff810c5a2c
[ 2016.890352] 0xffffffff8125d143
[ 2016.890354] 0xffffffff8125eec1
[ 2016.890356] 0xffffffff8125f038
[ 2016.890359] 0xffffffffa0e2d90b
[ 2016.890362] 0xffffffffa0d24365
[ 2016.890365] ? 0xffffffffa0e36260
[ 2016.890366] 0xffffffffa0d18683
[ 2016.890370] 0xffffffffa0e33ec9
[ 2016.890372] 0xffffffffa0e36310
[ 2016.890374] 0xffffffff811040db
[ 2016.890375] ? 0xffffffff81104000
[ 2016.890377] 0xffffffff81090b7b
[ 2016.890378] ? 0xffffffff81104000
[ 2016.890380] 0xffffffff81000381
[ 2016.890382] </TASK>
[ 2016.890383] ---[ end trace 0000000000000000 ]---
[ 2016.890410] ------------[ cut here ]------------
[ 2016.890411] WARNING: CPU: 1 PID: 267 at include/linux/rwsem.h:80 0xffffffff81262525
[ 2016.890414] Modules linked in: nvidia_uvm(POE) vmnet(OE) vmw_vsock_vmci_transport vmw_vmci vmmon(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
[ 2016.890424] CPU: 1 PID: 267 Comm: nv_queue Tainted: P W OE 6.10.3polq12kerneltkg #1
[ 2016.890428] Hardware name: LENOVO 82K1/LNVNB161216, BIOS H4CN36WW(V2.05) 05/17/2024
[ 2016.890429] RIP: 0010:0xffffffff81262525
[ 2016.890431] Code: 24 f7 c1 01 01 00 00 74 07 48 89 03 31 c0 eb 12 49 8b 3e e8 5d ee c4 00 e8 98 7e ed ff b8 ea ff ff ff 48 83 c4 08 5b 41 5e c3 <0f> 0b e9 e9 fe ff ff cc cc cc cc 55 41 57 41 56 41 55 41 54 53 48
[ 2016.890434] RSP: 0018:ffffc90000ff7bc0 EFLAGS: 00010246
[ 2016.890436] RAX: 2b3694fd0ab49100 RBX: ffffc90000ff7c20 RCX: ffffc90000ff7bd0
[ 2016.890438] RDX: ffffc90000ff7bd8 RSI: 00007f9502767000 RDI: ffff88810004b700
[ 2016.890441] RBP: 0000000000000000 R08: ffff888173a857e0 R09: 0000000000000000
[ 2016.890443] R10: ffff88815896f1c0 R11: ffffc90000ff7c90 R12: 00007f9502767000
[ 2016.890445] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888173a857e0
[ 2016.890446] FS: 0000000000000000(0000) GS:ffff88849fa40000(0000) knlGS:0000000000000000
[ 2016.890449] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2016.890451] CR2: 00007f4a1fea0000 CR3: 0000000002a09002 CR4: 0000000000f70ef0
[ 2016.890453] PKRU: 55555554
[ 2016.890454] Call Trace:
[ 2016.890455] <TASK>
[ 2016.890456] ? 0xffffffff810db590
[ 2016.890458] ? 0xffffffff81e7cbf1
[ 2016.890459] ? 0xffffffff81262525
[ 2016.890461] ? 0xffffffff81ea509d
[ 2016.890462] ? 0xffffffff81ea5036
[ 2016.890464] ? 0xffffffff82000a56
[ 2016.890465] ? 0xffffffff81262525
[ 2016.890467] 0xffffffff810c5429
[ 2016.890469] 0xffffffff810c5a2c
[ 2016.890471] 0xffffffff8125d143
[ 2016.890473] 0xffffffff8125eec1
[ 2016.890474] 0xffffffff8125f038
[ 2016.890476] 0xffffffffa0e2d90b
[ 2016.890478] 0xffffffffa0d24365
[ 2016.890480] ? 0xffffffffa0e36260
[ 2016.890481] 0xffffffffa0d18683
[ 2016.890484] 0xffffffffa0e33ec9
[ 2016.890485] 0xffffffffa0e36310
[ 2016.890487] 0xffffffff811040db
[ 2016.890489] ? 0xffffffff81104000
[ 2016.890490] 0xffffffff81090b7b
[ 2016.890491] ? 0xffffffff81104000
[ 2016.890493] 0xffffffff81000381
[ 2016.890495] </TASK>
[ 2016.890496] ---[ end trace 0000000000000000 ]---
[ 2016.890505] ------------[ cut here ]------------
[ 2016.890506] WARNING: CPU: 1 PID: 267 at include/linux/rwsem.h:80 0xffffffff81262525
[ 2016.890509] Modules linked in: nvidia_uvm(POE) vmnet(OE) vmw_vsock_vmci_transport vmw_vmci vmmon(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
[ 2016.890518] CPU: 1 PID: 267 Comm: nv_queue Tainted: P W OE 6.10.3polq12kerneltkg #1
[ 2016.890521] Hardware name: LENOVO 82K1/LNVNB161216, BIOS H4CN36WW(V2.05) 05/17/2024
[ 2016.890522] RIP: 0010:0xffffffff81262525
[ 2016.890523] Code: 24 f7 c1 01 01 00 00 74 07 48 89 03 31 c0 eb 12 49 8b 3e e8 5d ee c4 00 e8 98 7e ed ff b8 ea ff ff ff 48 83 c4 08 5b 41 5e c3 <0f> 0b e9 e9 fe ff ff cc cc cc cc 55 41 57 41 56 41 55 41 54 53 48
[ 2016.890526] RSP: 0018:ffffc90000ff7bc0 EFLAGS: 00010246
[ 2016.890528] RAX: 2b3694fd0ab49100 RBX: ffffc90000ff7c20 RCX: ffffc90000ff7bd0
[ 2016.890530] RDX: ffffc90000ff7bd8 RSI: 00007f94f2c08000 RDI: ffff88810004b700
[ 2016.890532] RBP: 0000000000000000 R08: ffff88817d42e3f0 R09: 0000000000000000
[ 2016.890534] R10: ffff88815896f000 R11: ffffc90000ff7c90 R12: 00007f94f2c08000
[ 2016.890535] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88817d42e3f0
[ 2016.890537] FS: 0000000000000000(0000) GS:ffff88849fa40000(0000) knlGS:0000000000000000
[ 2016.890539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2016.890541] CR2: 00007f4a1fea0000 CR3: 0000000002a09002 CR4: 0000000000f70ef0
[ 2016.890542] PKRU: 55555554
[ 2016.890543] Call Trace:
[ 2016.890544] <TASK>
[ 2016.890546] ? 0xffffffff810db590
[ 2016.890547] ? 0xffffffff81e7cbf1
[ 2016.890548] ? 0xffffffff81262525
[ 2016.890550] ? 0xffffffff81ea509d
[ 2016.890551] ? 0xffffffff81ea5036
[ 2016.890552] ? 0xffffffff82000a56
[ 2016.890554] ? 0xffffffff81262525
[ 2016.890555] 0xffffffff810c5429
[ 2016.890557] 0xffffffff810c5a2c
[ 2016.890559] 0xffffffff8125d143
[ 2016.890561] 0xffffffff8125eec1
[ 2016.890563] 0xffffffff8125f038
[ 2016.890565] 0xffffffffa0e2d90b
[ 2016.890566] 0xffffffffa0d24365
[ 2016.890568] ? 0xffffffffa0e36260
[ 2016.890569] 0xffffffffa0d18683
[ 2016.890572] 0xffffffffa0e33ec9
[ 2016.890573] 0xffffffffa0e36310
[ 2016.890575] 0xffffffff811040db
[ 2016.890577] ? 0xffffffff81104000
[ 2016.890578] 0xffffffff81090b7b
[ 2016.890579] ? 0xffffffff81104000
[ 2016.890581] 0xffffffff81000381
[ 2016.890583] </TASK>
[ 2016.890584] ---[ end trace 0000000000000000 ]---
[ 2016.890596] ------------[ cut here ]------------
[ 2016.890597] WARNING: CPU: 1 PID: 267 at include/linux/rwsem.h:80 0xffffffff81262525
[ 2016.890599] Modules linked in: nvidia_uvm(POE) vmnet(OE) vmw_vsock_vmci_transport vmw_vmci vmmon(OE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE)
[ 2016.890608] CPU: 1 PID: 267 Comm: nv_queue Tainted: P W OE 6.10.3polq12kerneltkg #1
[ 2016.890610] Hardware name: LENOVO 82K1/LNVNB161216, BIOS H4CN36WW(V2.05) 05/17/2024
[ 2016.890611] RIP: 0010:0xffffffff81262525
[ 2016.890613] Code: 24 f7 c1 01 01 00 00 74 07 48 89 03 31 c0 eb 12 49 8b 3e e8 5d ee c4 00 e8 98 7e ed ff b8 ea ff ff ff 48 83 c4 08 5b 41 5e c3 <0f> 0b e9 e9 fe ff ff cc cc cc cc 55 41 57 41 56 41 55 41 54 53 48
[ 2016.890615] RSP: 0018:ffffc90000ff7bc0 EFLAGS: 00010246
[ 2016.890617] RAX: 2b3694fd0ab49100 RBX: ffffc90000ff7c20 RCX: ffffc90000ff7bd0
[ 2016.890619] RDX: ffffc90000ff7bd8 RSI: 00007f94fee0b000 RDI: ffff88810004b700
[ 2016.890620] RBP: 0000000000000000 R08: ffff88813882a888 R09: 0000000000000000
[ 2016.890622] R10: ffff88815896f180 R11: ffffc90000ff7c90 R12: 00007f94fee0b000
[ 2016.890624] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88813882a888
[ 2016.890625] FS: 0000000000000000(0000) GS:ffff88849fa40000(0000) knlGS:0000000000000000
[ 2016.890628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2016.890629] CR2: 00007f4a1fea0000 CR3: 0000000002a09002 CR4: 0000000000f70ef0
[ 2016.890631] PKRU: 55555554
[ 2016.890632] Call Trace:
[ 2016.890633] <TASK>
[ 2016.890634] ? 0xffffffff810db590
[ 2016.890636] ? 0xffffffff81e7cbf1
[ 2016.890637] ? 0xffffffff81262525
[ 2016.890638] ? 0xffffffff81ea509d
[ 2016.890640] ? 0xffffffff81ea5036
[ 2016.890641] ? 0xffffffff82000a56
[ 2016.890643] ? 0xffffffff81262525
[ 2016.890644] 0xffffffff810c5429
[ 2016.890646] 0xffffffff810c5a2c
[ 2016.890648] 0xffffffff8125d143
[ 2016.890650] 0xffffffff8125eec1
[ 2016.890652] 0xffffffff8125f038
[ 2016.890653] 0xffffffffa0e2d90b
[ 2016.890655] 0xffffffffa0d24365
[ 2016.890656] ? 0xffffffffa0e36260
[ 2016.890658] 0xffffffffa0d18683
[ 2016.890660] 0xffffffffa0e33ec9
[ 2016.890662] 0xffffffffa0e36310
[ 2016.890663] 0xffffffff811040db
[ 2016.890665] ? 0xffffffff81104000
[ 2016.890666] 0xffffffff81090b7b
[ 2016.890667] ? 0xffffffff81104000
[ 2016.890669] 0xffffffff81000381
[ 2016.890671] </TASK>
[ 2016.890672] ---[ end trace 0000000000000000 ]---

I tried the 550 and 555 drivers, disabling the GSP firmware, and removing -flto and -O3 from my make.conf (I use Gentoo). Nothing worked.
It is not a serious issue, but it is there. I use linux-tkg, but none of its patches modify rwsem.h, so I don’t think that is the cause.
nvidia-bug-report.log.gz (627.1 KB)

I think it’s a Linux kernel bug and not an NVIDIA driver problem.

What you described matches what I have seen when compiling Firefox with -O3, which makes Firefox unstable in my experience. -O2 should be the safe setting in most use cases, with -O3 perhaps in limited cases such as non-graphical programs, but I couldn’t advise you in that case.

Firefox automatically compiles itself with -O3. Even on binary distros like Arch you can go to about:buildconfig and see that it is compiled with -O3.

Programs become unstable in the context of the broader system they operate within. For example, I typically use high-performance tweaks even for general desktop usage, and programs can become unstable simply because of that context: they stutter, fail to load, crash, display artifacts, or even show weird behavior, as in gaming.

The Firefox I’m using now is built with -O2, not -O3. In the past, when I compiled it with -O3 on a Funtoo system because I wanted it to be faster, Firefox became unstable just as the OP is describing.

Based on my experience, I don’t think it’s appropriate for apps to default to -O3, because they become unstable and unreliable in ways that aren’t always obvious. Try running many tabs at the same time or multiple instances, as you might occasionally do with Firefox, or cranking up the underlying system’s performance settings, and watch Firefox start to suffer.

this is really neat though, thanks for that piece of info…

about:buildconfig

I have this exact error.
mostly the same hardware too.
I only get this error when I’m running in hybrid mode or dedicated Nvidia mode.
If I’m running in AMD mode I don’t get this error.

I also got a similar problem on an ASUS TUF GAMING A16 (FA607PV), which has an R9 7940HX and an RTX 4060 Laptop GPU.
The dmesg log:

[ 42.046694] ------------[ cut here ]------------
[ 42.046694] WARNING: CPU: 8 PID: 1269 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200
[ 42.046696] Modules linked in: ccm snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device overlay cmac algif_hash algif_skcipher af_alg bnep nvidia_drm(OE) nvidia_modeset(OE) vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_sof_amd_acp63 snd_sof_amd_vangogh amd_atl snd_sof_amd_rembrandt intel_rapl_msr snd_sof_amd_renoir intel_rapl_common joydev mousedev snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_amd_sdw_acpi soundwire_amd soundwire_generic_allocation soundwire_bus snd_soc_core snd_hda_intel snd_intel_dspcfg snd_compress mt7921e snd_intel_sdw_acpi ac97_bus mt7921_common snd_pcm_dmaengine uvcvideo snd_hda_codec mt792x_lib snd_rpl_pci_acp6x snd_acp_pci videobuf2_vmalloc mt76_connac_lib snd_hda_core snd_acp_legacy_common uvc mt76 btusb snd_pci_acp6x videobuf2_memops snd_hwdep kvm_amd videobuf2_v4l2 btrtl snd_pci_acp5x snd_pcm btintel snd_rn_pci_acp3x mac80211 videodev r8169 snd_timer ucsi_acpi btbcm snd_acp_config hid_multitouch
[ 42.046727] realtek kvm typec_ucsi sp5100_tco snd videobuf2_common btmtk libarc4 mdio_devres snd_soc_acpi hid_generic bluetooth mc asus_nb_wmi wmi_bmof cfg80211 rapl nvidia_wmi_ec_backlight typec pcspkr libphy soundcore snd_pci_acp3x k10temp i2c_piix4 i2c_hid_acpi roles i2c_hid amd_pmc nvidia_uvm(OE) mac_hid nvidia(OE) crypto_user loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni amdxcp polyval_generic i2c_algo_bit gf128mul drm_ttm_helper ghash_clmulni_intel ttm sha512_ssse3 drm_exec sha256_ssse3 gpu_sched sha1_ssse3 drm_suballoc_helper aesni_intel nvme drm_buddy crypto_simd drm_display_helper nvme_core cryptd xhci_pci ccp cec xhci_pci_renesas nvme_auth serio_raw atkbd libps2 vivaldi_fmap hid_asus asus_wmi i8042 platform_profile usbhid serio asus_wmi_sensors asus_wireless sparse_keymap rfkill video wmi
[ 42.046766] CPU: 8 PID: 1269 Comm: nv_queue Tainted: G W OE 6.10.3-arch1-2 #1 20bffa7dc84b9a89fd543afbd712f49dca71b693
[ 42.046767] Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Gaming A16 FA607PV_FA607PV/FA607PV, BIOS FA607PV.307 03/27/2024
[ 42.046768] RIP: 0010:follow_pte+0x1de/0x200
[ 42.046769] Code: 14 da 00 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 b2 f5 ff ff 48 8b 35 6b e3 5c 01 48 81 e6 00 00 00 c0 eb 8d <0f> 0b 48 3b 1f 0f 83 50 fe ff ff bd ea ff ff ff eb b6 49 8b 3c 24
[ 42.046770] RSP: 0018:ffffb53881bc7b48 EFLAGS: 00010246
[ 42.046771] RAX: 0000000000000000 RBX: 00007e4869430000 RCX: ffffb53881bc7b88
[ 42.046772] RDX: ffffb53881bc7b80 RSI: 00007e4869430000 RDI: ffffa0571dae0da8
[ 42.046773] RBP: ffffb53881bc7bc8 R08: ffffb53881bc7d20 R09: 0000000000000000
[ 42.046774] R10: 0000000000000200 R11: 0000000000000000 R12: ffffb53881bc7b88
[ 42.046774] R13: ffffb53881bc7b80 R14: ffffa056c006c200 R15: 0000000000000000
[ 42.046775] FS: 0000000000000000(0000) GS:ffffa0661d200000(0000) knlGS:0000000000000000
[ 42.046776] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.046777] CR2: 0000725e8f854bb0 CR3: 00000007f4020000 CR4: 0000000000f50ef0
[ 42.046778] PKRU: 55555554
[ 42.046778] Call Trace:
[ 42.046779] <TASK>
[ 42.046779] ? follow_pte+0x1de/0x200
[ 42.046781] ? __warn.cold+0x8e/0xe8
[ 42.046782] ? follow_pte+0x1de/0x200
[ 42.046784] ? report_bug+0xff/0x140
[ 42.046786] ? handle_bug+0x3c/0x80
[ 42.046787] ? exc_invalid_op+0x17/0x70
[ 42.046788] ? asm_exc_invalid_op+0x1a/0x20
[ 42.046791] ? follow_pte+0x1de/0x200
[ 42.046793] follow_phys+0x49/0x110
[ 42.046796] untrack_pfn+0x55/0x120
[ 42.046797] unmap_single_vma+0xa6/0xe0
[ 42.046800] zap_page_range_single+0x122/0x1d0
[ 42.046804] unmap_mapping_range+0x116/0x140
[ 42.046806] ? __pfx__main_loop+0x10/0x10 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.046855] nv_revoke_gpu_mappings+0x67/0xb0 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.046899] RmHandleIdleSustained+0x3b/0x140 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.046961] ? gpumgrGetGpu+0x69/0xa0 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.047041] rm_execute_work_item+0xda/0x150 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.047111] _main_loop+0x95/0x150 [nvidia ba802dc980f2ebec319fb090d3412a969b6d9416]
[ 42.047161] kthread+0xcf/0x100
[ 42.047163] ? __pfx_kthread+0x10/0x10
[ 42.047165] ret_from_fork+0x31/0x50
[ 42.047166] ? __pfx_kthread+0x10/0x10
[ 42.047168] ret_from_fork_asm+0x1a/0x30
[ 42.047171] </TASK>
[ 42.047172] ---[ end trace 0000000000000000 ]---

The NVIDIA bug report is attached here:
nvidia-bug-report.log.gz (923.9 KB)

I’ve tried linux-lts on Arch Linux, and Linux 6.6.45-1-lts seems to work well with the NVIDIA 560 beta driver: I saw no call traces in dmesg.
Maybe the driver doesn’t fully support newer versions of the Linux kernel?

We have seen a similar call trace internally and are currently investigating the issue.
We shall update once there is further feedback from the engineering team.

I’m using Ubuntu 24.04 LTS, which comes with kernel 6.8, together with driver 550, and everything works. I don’t know if this information is useful, since the original 6.8 kernel series is outdated. If I change to a more recent 6.10 kernel, I get the crashes.

I recently updated to Ubuntu Oracular (not a supported release yet), and I still need to use the 6.8.0 kernel from Ubuntu 24.04 LTS for this problem not to happen. Ubuntu Oracular, when released next month, will come with kernel 6.11.

I’m also having this same issue with a recent 6.10 kernel on Arch Linux:

[  377.061048] CPU: 6 PID: 4449 Comm: nv_queue Tainted: P        W  OE      6.10.10-arch1-1 #1 e28ee6293423e91d57555c4cc06eb839714254b7
[  377.061050] Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Dash F15 FX517ZR_FX517ZR/FX517ZR, BIOS FX517ZR.317 05/03/2023
[  377.061051] RIP: 0010:follow_pte+0x1de/0x200
[  377.061053] Code: cc cc cc 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 b2 f5 ff ff 48 8b 35 6b dd 5c 01 48 81 e6 00 00 00 c0 eb 8d <0f> 0b 48 3b 1f 0f 83 50 fe ff ff bd ea ff ff ff eb b6 49 8b 3c 24
[  377.061054] RSP: 0018:ffffb6670138fb48 EFLAGS: 00010246
[  377.061056] RAX: 0000000000000000 RBX: 00007feefdb2e000 RCX: ffffb6670138fb88
[  377.061056] RDX: ffffb6670138fb80 RSI: 00007feefdb2e000 RDI: ffff9b1779da9ee8
[  377.061057] RBP: ffffb6670138fbc8 R08: ffffb6670138fd20 R09: 0000000000000000
[  377.061058] R10: 0000000000000001 R11: 0000000000000003 R12: ffffb6670138fb88
[  377.061058] R13: ffffb6670138fb80 R14: ffff9b16c1572680 R15: 0000000000000000
[  377.061059] FS:  0000000000000000(0000) GS:ffff9b1a70100000(0000) knlGS:0000000000000000
[  377.061060] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  377.061060] CR2: 00007feeb3d8e000 CR3: 00000004aec20000 CR4: 0000000000f50ef0
[  377.061061] PKRU: 55555554
[  377.061062] Call Trace:
[  377.061063]  <TASK>
[  377.061064]  ? follow_pte+0x1de/0x200
[  377.061065]  ? __warn.cold+0x8e/0xe8
[  377.061067]  ? follow_pte+0x1de/0x200
[  377.061070]  ? report_bug+0xff/0x140
[  377.061072]  ? handle_bug+0x3c/0x80
[  377.061074]  ? exc_invalid_op+0x17/0x70
[  377.061075]  ? asm_exc_invalid_op+0x1a/0x20
[  377.061077]  ? follow_pte+0x1de/0x200
[  377.061079]  follow_phys+0x49/0x110
[  377.061081]  untrack_pfn+0x55/0x120
[  377.061082]  unmap_single_vma+0xa6/0xe0
[  377.061085]  zap_page_range_single+0x122/0x1d0
[  377.061087]  unmap_mapping_range+0x116/0x140
[  377.061089]  ? __pfx__main_loop+0x10/0x10 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[  377.061187]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[  377.061239]  RmHandleIdleSustained+0x3b/0x140 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[  377.061343]  ? gpumgrGetGpu+0x69/0xa0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[  377.061453]  rm_execute_work_item+0xda/0x150 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[  377.061559]  _main_loop+0x95/0x150 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[  377.061618]  kthread+0xcf/0x100
[  377.061620]  ? __pfx_kthread+0x10/0x10
[  377.061621]  ret_from_fork+0x31/0x50
[  377.061623]  ? __pfx_kthread+0x10/0x10
[  377.061624]  ret_from_fork_asm+0x1a/0x30
[  377.061627]  </TASK>
[  377.061627] ---[ end trace 0000000000000000 ]---
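
For anyone trying to read these traces: as far as I understand, frames prefixed with "?" are addresses found on the stack that the unwinder could not verify, so the reliable call chain is the unprefixed frames. Below is a rough Python sketch that pulls those out of symbolized dmesg lines like the ones above. The function name `reliable_frames` and the regex are my own, not from any tool, and it assumes the default `[ seconds.micros]` timestamp prefix.

```python
import re

# Matches symbolized frames such as
#   "[  377.061079]  follow_phys+0x49/0x110"
#   "[  377.061187]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia <hash>]"
# Group 1 is the optional "? " marker, group 2 the symbol, group 3 the module.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?"
    r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)[^\]]*\])?"
)


def reliable_frames(lines):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for line in lines:
        match = FRAME.match(line)
        if match and not match.group(1):
            # Frames without a "[module]" suffix belong to the core kernel.
            frames.append((match.group(2), match.group(3) or "kernel"))
    return frames
```

Run over the trace above, this keeps the chain from follow_phys through nv_revoke_gpu_mappings down to kthread and drops the "?" entries. It does not handle the unsymbolized raw-address frames from the first post.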

NVIDIA drivers have been decreasing in quality lately… that’s just sad.

This warning also happens repeatedly throughout the day, sometimes many times within the same second. This is from my laptop right after boot, having only launched i3wm + Firefox:

[nwildner@arrakis ~]$ sudo dmesg | grep  nv_revoke_gpu_mappings 
[sudo] password for nwildner: 
[   56.648146]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.648755]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.649347]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.649908]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.650469]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.651030]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.651590]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.652147]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.652715]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.653276]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.653833]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.654391]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
[   56.654968]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia b1241af8faba3de1395102ed1cb7464de6059a3f]
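
To put a number on "many times within the same second", something like the following Python sketch could tally matching dmesg lines per whole second of uptime. This is only an illustration: `hits_per_second` is a name I made up, and it assumes the default dmesg timestamp format shown in the output above.

```python
import re
from collections import Counter

# Matches the "[   56.648146]" prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*(\d+)\.(\d+)\]")


def hits_per_second(lines, needle="nv_revoke_gpu_mappings"):
    """Count lines containing `needle`, keyed by whole seconds since boot."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and needle in line:
            counts[int(match.group(1))] += 1
    return counts
```

It takes any iterable of lines, so it could be fed `sys.stdin` when piping `sudo dmesg` into a small script. For the grep output above, all 13 matches would land in second 56.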

Any update on this? Still seeing these traces.
System Info:

  Host: fedora Kernel: 6.11.7-300.fc41.x86_64 arch: x86_64 bits: 64
  Console: pty pts/1 Distro: Fedora Linux 41 (Workstation Edition)
Machine:
  Type: Laptop System: Razer product: Blade 15 (2022) - RZ09-0421 v: 8.04
    serial: BY2222M73501760
  Mobo: Razer model: CH580 v: 4 serial: N/A UEFI: Razer v: 2.06
    date: 11/01/2023
Graphics:
  Device-1: Intel Alder Lake-P GT2 [Iris Xe Graphics] driver: i915 v: kernel
  Device-2: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] driver: nvidia
    v: 565.57.01
  Device-3: IMC Networks Integrated RGB Camera driver: uvcvideo type: USB
  Display: unspecified server: X.Org v: 24.1.4 with: Xwayland v: 24.1.4
    driver: X: loaded: intel dri: iris gpu: i915 resolution: 2560x1440~240Hz
  API: OpenGL v: 4.6 vendor: intel mesa v: 24.2.6 renderer: Mesa Intel
    Graphics (ADL GT2)
  API: EGL Message: EGL data requires eglinfo. Check --recommends.

Traces:

[   41.983366] ------------[ cut here ]------------
[   41.983370] WARNING: CPU: 5 PID: 1884 at include/linux/rwsem.h:80 follow_pte+0x1f0/0x220
[   41.983378] Modules linked in: uinput snd_seq_dummy snd_hrtimer rfcomm nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nvidia_drm(PO) nvidia_modeset(PO) nvidia_uvm(PO) ip_set uhid nf_tables sunrpc nvidia(PO) qrtr bnep snd_ctl_led snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_sof_probes snd_soc_intel_hda_dsp_common binfmt_misc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence squashfs snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda iwlmvm snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_avs snd_soc_hda_codec intel_uncore_frequency snd_hda_ext_core intel_uncore_frequency_common
[   41.983423]  mac80211 snd_soc_core x86_pkg_temp_thermal intel_powerclamp coretemp snd_compress snd_hda_codec_hdmi ac97_bus snd_pcm_dmaengine kvm_intel snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi libarc4 snd_hda_codec kvm btusb snd_hda_core uvcvideo snd_hwdep uvc btrtl videobuf2_vmalloc iwlwifi snd_seq videobuf2_memops btintel videobuf2_v4l2 rapl btbcm spi_nor snd_seq_device videobuf2_common btmtk iTCO_wdt processor_thermal_device_pci intel_cstate intel_pmc_bxt mei_hdcp mei_pxp mtd spd5118 iTCO_vendor_support intel_rapl_msr videodev snd_pcm processor_thermal_device bluetooth thunderbolt cfg80211 intel_uncore snd_timer processor_thermal_wt_hint mc wmi_bmof mei_me processor_thermal_rfim pcspkr i2c_i801 processor_thermal_rapl spi_intel_pci snd spi_intel i2c_smbus mei intel_rapl_common soundcore rfkill idma64 processor_thermal_wt_req igen6_edac processor_thermal_power_floor processor_thermal_mbox intel_pmc_core joydev int3403_thermal int340x_thermal_zone intel_vsec intel_hid pmt_telemetry int3400_thermal
[   41.983473]  sparse_keymap pmt_class acpi_thermal_rel acpi_pad acpi_tad loop nfnetlink zram dm_crypt xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit nvme crc32c_intel drm_buddy polyval_clmulni ttm polyval_generic sdhci_pci drm_display_helper nvme_core cqhci ghash_clmulni_intel sdhci hid_multitouch sha512_ssse3 nvidia_wmi_ec_backlight mmc_core sha256_ssse3 sha1_ssse3 cec nvme_auth i2c_hid_acpi i2c_hid video wmi pinctrl_tigerlake ip6_tables ip_tables fuse
[   41.983502] CPU: 5 UID: 0 PID: 1884 Comm: nv_queue Tainted: P           O       6.11.7-300.fc41.x86_64 #1
[   41.983507] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[   41.983508] Hardware name: Razer Blade 15 (2022) - RZ09-0421/CH580, BIOS 2.06 11/01/2023
[   41.983509] RIP: 0010:follow_pte+0x1f0/0x220
[   41.983513] Code: cc cc cc 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 e0 f0 ff ff 48 8b 35 39 32 90 01 48 81 e6 00 00 00 c0 eb 89 <0f> 0b 48 3b 1f 0f 83 42 fe ff ff bd ea ff ff ff eb b2 49 8b 3c 24
[   41.983515] RSP: 0018:ffffbad3c9663b68 EFLAGS: 00010246
[   41.983517] RAX: 0000000000000000 RBX: 00007f79e809a000 RCX: ffffbad3c9663bb0
[   41.983519] RDX: ffffbad3c9663ba8 RSI: 00007f79e809a000 RDI: ffff9e11b2bd1ad0
[   41.983520] RBP: ffffbad3c9663bf0 R08: ffffbad3c9663d48 R09: 0000000000000000
[   41.983521] R10: ffff9e1221b55f2c R11: ffff9e11a99c8008 R12: ffffbad3c9663bb0
[   41.983522] R13: ffffbad3c9663ba8 R14: ffff9e11800af380 R15: 0000000000000000
[   41.983524] FS:  0000000000000000(0000) GS:ffff9e151d480000(0000) knlGS:0000000000000000
[   41.983525] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.983526] CR2: 00007f63b9f09c00 CR3: 000000001f42a000 CR4: 0000000000f50ef0
[   41.983528] PKRU: 55555554
[   41.983529] Call Trace:
[   41.983531]  <TASK>
[   41.983533]  ? follow_pte+0x1f0/0x220
[   41.983536]  ? __warn.cold+0x8e/0xe8
[   41.983539]  ? follow_pte+0x1f0/0x220
[   41.983545]  ? report_bug+0xff/0x140
[   41.983549]  ? handle_bug+0x58/0x90
[   41.983551]  ? exc_invalid_op+0x17/0x70
[   41.983552]  ? asm_exc_invalid_op+0x1a/0x20
[   41.983555]  ? follow_pte+0x1f0/0x220
[   41.983558]  follow_phys+0x49/0x110
[   41.983561]  untrack_pfn+0x55/0x120
[   41.983564]  unmap_single_vma+0xa6/0xe0
[   41.983568]  zap_page_range_single+0x122/0x1d0
[   41.983573]  unmap_mapping_range+0x116/0x140
[   41.983579]  nv_revoke_gpu_mappings+0x67/0xb0 [nvidia]
[   41.984017]  _nv000741rm+0x35/0xf6 [nvidia]
[   41.984446]  rm_execute_work_item+0x13e/0x1f0 [nvidia]
[   41.984870]  os_execute_work_item+0x5e/0x80 [nvidia]
[   41.985129]  _main_loop+0x8f/0x150 [nvidia]
[   41.985410]  ? __pfx__main_loop+0x10/0x10 [nvidia]
[   41.985670]  kthread+0xcf/0x100
[   41.985674]  ? __pfx_kthread+0x10/0x10
[   41.985676]  ret_from_fork+0x31/0x50
[   41.985679]  ? __pfx_kthread+0x10/0x10
[   41.985680]  ret_from_fork_asm+0x1a/0x30
[   41.985685]  </TASK>
[   41.985686] ---[ end trace 0000000000000000 ]---
[   41.985709] ------------[ cut here ]------------
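
Side note on the "Tainted:" field, since it differs between the reports here (P W OE, G W OE, P O). A small Python sketch that decodes only the letters seen in this thread is below. The P and O meanings are printed by the 6.11 kernel itself in the trace above; the G, W and E meanings are from my reading of the kernel's tainted-kernels documentation, and the table is deliberately incomplete. `decode_taint` is a hypothetical helper name.

```python
# Taint letters that appear in this thread only; the kernel defines more.
TAINT_FLAGS = {
    "G": "all loaded modules have a GPL-compatible license",
    "P": "a proprietary module was loaded",
    "W": "the kernel issued a warning earlier",
    "O": "an out-of-tree module was loaded",
    "E": "an unsigned module was loaded",
}


def decode_taint(flags):
    """Translate a taint string such as 'P W OE' into descriptions."""
    return [
        TAINT_FLAGS.get(ch, f"unknown flag {ch!r}")
        for ch in flags
        if not ch.isspace()
    ]
```

So the P and O flags in these reports simply reflect that the NVIDIA module is loaded, and W means an earlier warning had already fired by the time the trace was printed.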

Should be resolved/obsoleted by API changes in kernel 6.12.

See:

-and-

The stack traces in this thread occurred during suspend, with kernels 6.10.x and 6.11.x.

Known issue as mentioned above: Nvidia driver kernel random call trace - #14 by Tekstryder

I can confirm this issue is indeed resolved with kernel 6.12.

  • Arch Linux | Kernel 6.12.1
  • nVidia 565.57.01
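
To summarize the kernel versions reported in this thread, here is a small Python sketch that checks a release string against the series where people saw the warning (6.10.x and 6.11.x) versus the ones reported clean (6.6, 6.8, 6.12). The set below is an assumption taken only from posts in this thread, not an official list of affected kernels, and the function names are made up for illustration.

```python
import re

# Assumption: based solely on reports in this thread (6.10.x and 6.11.x).
REPORTED_AFFECTED = {(6, 10), (6, 11)}


def major_minor(release):
    """Extract (major, minor) from a release string like '6.10.3-arch1-2'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def reported_affected(release):
    """True if this thread contains reports of the warning on that series."""
    return major_minor(release) in REPORTED_AFFECTED
```

The input could come from `platform.release()` or `uname -r`. A False result only means nobody in this thread reported the trace on that series.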

@polq1200, please mark this thread as solved if you can no longer reproduce.

Or perhaps @amrits can close this.