Use-after-free on GTX 1650 dGPU with 545.29.06 on Fedora 39 + Wayland

… to gather more info for NVIDIA, since this is such a serious problem (especially now that the system crashes/reboots so often), I left journalctl --boot --lines=all --follow running, and I can now confirm that

BUG: KFENCE: use-after-free read in nv_dma_release_sgt+0x29/0x70 [nvidia]

is sometimes happening much more frequently than I thought, and nothing specific the user does seems to cause it. It shows up during normal use: dragging a window, playing a YouTube video, opening a new app, or doing nothing at all! Importantly, I discovered that it quite often coincides with the external monitor blacking out for a few seconds. (I even bought a new monitor and HDMI cable this week in the hope that we, and especially NVIDIA, can get past this!!)
These few-second blackouts are something we have been living with for a long time, but we only started paying more attention to them once the crashes/reboots started recently. Those are obviously not acceptable for any use case and are very costly.
I hope NVIDIA can assign resources to this, because it is not a good look to let this stuff fester on such top-of-the-range hardware, which is rendered useless by the shoddy drivers, probably mainly because testing is not taken seriously (despite many helpful users on these forums, for example, reporting the issues).

… hopefully this comment is not premature, but with
560.35.03 / 6.10.7-200.fc40.x86_64
the crashing/rebooting seems to be much less frequent, even though plenty of
BUG: KFENCE: use-after-free read in nv_dma_release_sgt+0x29/0x70 [nvidia]
reports and external monitor blackouts still occur.

After updating from 555 to 560.35.03 (6.10.9-200.fc40.x86_64), the situation has gotten worse for me. I now get KFENCE errors with a lockup 2-3 times per day.

It still happens regularly for me with 560.35.03, luckily with no lockups. I still haven’t been able to pinpoint a specific action that triggers it, but in my case at least, it seems gnome-shell is always the culprit:

...
set 20 08:42:26 fedoracosta kernel: CPU: 3 PID: 31578 Comm: gnome-shell Tainted: P    B   W  OE      6.10.10-200.fc40.x86_64 #1
set 20 08:42:26 fedoracosta kernel: Hardware name: Acer Nitro AN515-44/Stonic_RNS, BIOS V1.04 02/04/2021
set 20 08:42:26 fedoracosta kernel: ==================================================================
❯ journalctl --since=yesterday | grep -c use-after-free
55

❯ journalctl --since=yesterday | grep -c "gnome-shell Tainted"
55
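
To double-check that gnome-shell really is the only process named in these reports, the report headers can be tallied by their Comm: field. This is only a rough sketch: count_by_comm is a made-up helper name, it assumes the process name contains no spaces, and it counts every kernel report header in its input (any oops or warning prints a Comm: line), not just the KFENCE ones.

```shell
# Hypothetical helper (not part of any tool): reads journal text on stdin
# and prints how many report headers name each process, most frequent first.
# Assumes the process name after "Comm: " contains no spaces.
count_by_comm() {
    grep -o 'Comm: [^ ]*' | sort | uniq -c | sort -rn
}
```

Usage would be something like journalctl -k --since=yesterday | count_by_comm. If gnome-shell is the only name that comes back, that matches the two equal counts above; any other name would suggest the problem is not specific to gnome-shell.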

I cannot change the title of this thread, but just for the record, it still happens on 565.57.01 on Fedora 41 with kernel 6.12:

❯ journalctl --since=yesterday | grep -c use-after-free
13

❯ journalctl --since=yesterday | grep -c "gnome-shell Tainted"
13

❯ modinfo -F version nvidia
565.57.01

❯ uname -r
6.12.0-65.fc42.x86_64