… may be related to Use-after-free, but no suspend/resume involved here.
Nvidia: Driver Version: 555.58.02, CUDA Version: 12.5, GPU: NVIDIA GeForce RTX 4090
Fedora 40 (kernel: 6.10.5-200.fc40.x86_64 and most recent previous kernel releases).
Getting BUG: KFENCE: use-after-free read in nv_dma_release_sgt+0x29/0x70 [nvidia]
and, more recently, the system crashes/reboots abruptly.
We lose work every time this happens, but everything else seems OK on our machines lately, thanks NVIDIA!
It would be great, NVIDIA, if you could quickly make 560.31.02 available via some repo, since I have seen it mentioned that you may have already successfully addressed this issue!
Happy to provide more info if it can help … but be specific (include commands) as I am definitely not an expert in this area.
… thanks @robin.dusky - I already decided to drop the nvidia drivers as it is just not worth the loss of time, the costs or trouble for our immediate use-cases (not games or AI).
Since I have an external monitor attached (I would have thought the hdmi in is on the nvidia card) and
I suppose I am now using nouveau. Chrome takes half a day to start up, but apart from that it is much better than the system constantly rebooting itself.
Hi @eleonorac2
Thanks for reporting the issue to us. Could you please share an NVIDIA bug report captured in the repro state, along with reliable repro steps to duplicate the issue?
Kernel: 6.10.6-200.fc40.x86_64
Driver Version: 560.35.03 (just installed, since it just became available, thanks!)
Aug 29 13:17:13 fedora kernel: BUG: KFENCE: use-after-free read in nv_dma_release_sgt+0x29/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel: Use-after-free read at 0x00000000525ee5ac (in kfence-#9):
Aug 29 13:17:13 fedora kernel: nv_dma_release_sgt+0x29/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv043722rm+0x67/0xd0 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv038493rm+0xc7/0x430 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv038458rm+0x6b/0x130 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv004292rm+0xd/0x20 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv006122rm+0x1b/0xb0 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv018080rm+0x59c/0x680 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv048181rm+0xb3/0xe0 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv049956rm+0xb0/0x180 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv049955rm+0x50b/0x660 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv048073rm+0xdd/0x190 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv048074rm+0x41/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv000587rm+0x4a/0x60 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv000745rm+0x20f/0xe00 [nvidia]
Aug 29 13:17:13 fedora kernel: rm_ioctl+0x7f/0x400 [nvidia]
Aug 29 13:17:13 fedora kernel: nvidia_unlocked_ioctl+0x53b/0x8d0 [nvidia]
Aug 29 13:17:13 fedora kernel: __x64_sys_ioctl+0x94/0xd0
Aug 29 13:17:13 fedora kernel: do_syscall_64+0x82/0x160
Aug 29 13:17:13 fedora kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 29 13:17:13 fedora kernel:
Aug 29 13:17:13 fedora kernel: kfence-#9: 0x0000000025c59c30-0x000000005a96edca, size=384, cache=kmalloc-512
Aug 29 13:17:13 fedora kernel: allocated by task 3668 on cpu 30 at 1668.609729s:
Aug 29 13:17:13 fedora kernel: nv_drm_gem_prime_import_sg_table+0x2d/0xb0 [nvidia_drm]
Aug 29 13:17:13 fedora kernel: drm_gem_prime_import_dev+0x93/0x180
Aug 29 13:17:13 fedora kernel: drm_gem_prime_fd_to_handle+0xe7/0x220
Aug 29 13:17:13 fedora kernel: drm_ioctl_kernel+0xb0/0x100
Aug 29 13:17:13 fedora kernel: drm_ioctl+0x28b/0x540
Aug 29 13:17:13 fedora kernel: __x64_sys_ioctl+0x94/0xd0
Aug 29 13:17:13 fedora kernel: do_syscall_64+0x82/0x160
Aug 29 13:17:13 fedora kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 29 13:17:13 fedora kernel:
Aug 29 13:17:13 fedora kernel: freed by task 3668 on cpu 15 at 1668.704835s:
Aug 29 13:17:13 fedora kernel: nv_dma_release_sgt+0x49/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv043722rm+0x67/0xd0 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv038493rm+0xc7/0x430 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv038458rm+0x6b/0x130 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv004292rm+0xd/0x20 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv006122rm+0x1b/0xb0 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv018080rm+0x59c/0x680 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv048181rm+0xb3/0xe0 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv049956rm+0xb0/0x180 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv049955rm+0x50b/0x660 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv048073rm+0xdd/0x190 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv048074rm+0x41/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv000587rm+0x4a/0x60 [nvidia]
Aug 29 13:17:13 fedora kernel: _nv000745rm+0x20f/0xe00 [nvidia]
Aug 29 13:17:13 fedora kernel: rm_ioctl+0x7f/0x400 [nvidia]
Aug 29 13:17:13 fedora kernel: nvidia_unlocked_ioctl+0x53b/0x8d0 [nvidia]
Aug 29 13:17:13 fedora kernel: __x64_sys_ioctl+0x94/0xd0
Aug 29 13:17:13 fedora kernel: do_syscall_64+0x82/0x160
Aug 29 13:17:13 fedora kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 29 13:17:13 fedora kernel:
Aug 29 13:17:13 fedora kernel: CPU: 15 PID: 3668 Comm: gnome-shell Tainted: P B W OE 6.10.6-200.fc40.x86_64 #1
Please give specific steps/commands if I can provide more info. Also, hopefully others that know what they are doing/have experience of helping NVIDIA with bugs/reports can chime in!
Hi eleonorac2,
My laptop with a 3070 Ti used to reboot randomly. I’ve disabled PCIe Link State Power Management. While I’m still getting the same messages as you, there have been no reboots for a few days.
I’ve used the following guidance for my laptop:
Edit the /etc/UPower/UPower.conf file. Add the following line to the [PCIeLinkStatePowerManagement] section:
Enabled=false
Then, restart the UPower service to apply the changes:
sudo systemctl restart upower
Forum thread with the details about this error on Windows:
@Yury_RX16 - that is very thoughtful, thank you!! A search for "PC" in my /etc/UPower/UPower.conf gives "E486: Pattern not found: PC", but I can experiment with adding a [PCIeLinkStatePowerManagement] section and setting Enabled=false, as this may also (very naively optimistic, probably) help with the external monitor blackouts. I first want to give it a day or two to see how things are with 560.35.03 / 6.10.7-200.fc40.x86_64, as we have not yet crashed today.
… OK, I will report back once I make that edit (and let the system run for a bit under usual conditions) as I am indeed also on Fedora 40, and will probably move to 41 as soon as possible next month(?).
Use-after-free bugs are still appearing, but crash/reboot frequency is down since 6.10.6-200.fc40.x86_64/560.35.03 (now on 6.10.9-200.fc40.x86_64), and the external monitor blackouts are still happening.
Tried the suggested
[PCIeLinkStatePowerManagement]
Enabled=false
for a couple of days, which does not seem to have affected any of the above but does not seem to have done any harm elsewhere, either. I will probably keep UPower.conf as is for now.
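One thing that might be worth checking alongside this: the kernel reports its own PCIe ASPM policy in /sys/module/pcie_aspm/parameters/policy, with the active entry shown in square brackets. I do not know whether the UPower section above has any effect on it, so this is only a minimal Python sketch for reading that file. The helper names are made up for illustration, and the file is only present when the kernel was built with ASPM support.

```python
from pathlib import Path

# Kernel-provided file; missing if the kernel has no PCIe ASPM support.
ASPM_POLICY = Path("/sys/module/pcie_aspm/parameters/policy")


def active_policy(text):
    """Return the bracketed entry from e.g. '[default] performance powersave'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_active_policy(path=ASPM_POLICY):
    """Return the active ASPM policy name, or None if the file cannot be read."""
    try:
        return active_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("active ASPM policy:", read_active_policy())
```

Running it before and after the UPower.conf change would show whether the kernel-side policy moved at all.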
I had no idea keyboard lighting was possibly related to nvidia!
When I have problems with the keyboard not lighting up and the function key that controls it not doing anything (you probably know this), I find that the GNOME quick-menu (I think that is the name: the thing that pops up when you click the speaker/battery/network etc. symbols) has a keyboard lighting control. That works, and somehow kicks the function key back into action.
ah disregard, my keyboard is not controlled by nvidia_wmi_ec_backlight, but I did notice the keylights turning off.
hopefully NVIDIA can drill down on this kfence error
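For anyone comparing traces in this thread, here is a minimal Python sketch (not an NVIDIA tool; the function and file names are made up for illustration) that condenses each KFENCE report in a saved kernel log to the access site plus the first frame of the allocation and free stacks. It assumes the journal format shown in the Aug 29 trace above, and other log formats may need different patterns.

```python
import re
import sys

# Headline, e.g. "BUG: KFENCE: use-after-free read in nv_dma_release_sgt+0x29/0x70 [nvidia]"
HEADLINE = re.compile(
    r"BUG: KFENCE: (?P<kind>.+?) in (?P<where>\S+)(?: \[(?P<module>\w+)\])?\s*$"
)
# "allocated by task 3668 on cpu 30 at 1668.609729s:" or "freed by task ..."
EVENT = re.compile(
    r"(?P<event>allocated|freed) by task (?P<task>\d+) on cpu (?P<cpu>\d+)"
    r" at (?P<time>[\d.]+)s:"
)
# A stack frame such as "nv_dma_release_sgt+0x49/0x70 [nvidia]"
FRAME = re.compile(
    r"(?P<sym>[\w.]+\+0x[0-9a-f]+/0x[0-9a-f]+)(?: \[(?P<module>\w+)\])?\s*$"
)


def summarize(lines):
    """Return one dict per KFENCE report found in the given kernel log lines."""
    reports = []
    pending = None  # "allocated"/"freed" while waiting for that stack's first frame
    for line in lines:
        m = HEADLINE.search(line)
        if m:
            reports.append({"kind": m["kind"], "access": m["where"]})
            pending = None
            continue
        if not reports:
            continue  # ignore everything before the first report
        m = EVENT.search(line)
        if m:
            pending = m["event"]
            reports[-1][pending + "_task"] = int(m["task"])
            reports[-1][pending + "_at"] = float(m["time"])
            continue
        m = FRAME.search(line)
        if m and pending:
            # Keep only the top frame of the allocation/free stack.
            reports[-1][pending + "_in"] = m["sym"]
            pending = None
    return reports


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects a path to a saved kernel log as the only argument.
    with open(sys.argv[1], errors="replace") as log:
        for report in summarize(log):
            print(report)
```

To try it, save the kernel log first (for example `journalctl -k -b --no-pager > kernel.log`) and pass that file as the argument. On the trace above it should show the object allocated in nv_drm_gem_prime_import_sg_table, freed at nv_dma_release_sgt+0x49, and read again at nv_dma_release_sgt+0x29, all by task 3668.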