BUG: KFENCE: use-after-free read in nv_dma_release_sgt+0x29/0x70 [nvidia]

… may be related to Use-after-free, but no suspend/resume involved here.

NVIDIA: Driver Version 555.58.02, CUDA Version 12.5, NVIDIA GeForce RTX 4090
Fedora 40 (kernel: 6.10.5-200.fc40.x86_64 and most recent previous kernel releases).

I am getting BUG: KFENCE: use-after-free read in nv_dma_release_sgt+0x29/0x70 [nvidia]
and, more recently, the system then crashes/reboots abruptly.

We lose work every time this happens, but everything else seems OK on our machines lately, thanks NVIDIA!

It would be great, NVIDIA, if you could quickly make 560.31.02 available via some repo, since I have seen it mentioned that you may have already successfully addressed this issue!

Happy to provide more info if it can help … but be specific (include commands) as I am definitely not an expert in this area.

… same “BUG”/crashes/reboot with 6.10.6-200.fc40.x86_64.

Hi, I also have the same issue with driver nvidia v: 560.35.03 and Linux kernel 6.10.6. This happens in Wayland but not in X11.

… thanks @robin.dusky - I already decided to drop the nvidia drivers as it is just not worth the loss of time, the costs or trouble for our immediate use-cases (not games or AI).
Since I have an external monitor attached (I would have thought the HDMI port is on the NVIDIA card) and the module list shows:

nouveau              3923968  5
drm_gpuvm              45056  1 nouveau
mxm_wmi                12288  1 nouveau
i2c_algo_bit           20480  2 amdgpu,nouveau
drm_ttm_helper         12288  2 amdgpu,nouveau
ttm                   114688  3 amdgpu,drm_ttm_helper,nouveau
drm_exec               12288  3 drm_gpuvm,amdgpu,nouveau
gpu_sched              65536  2 amdgpu,nouveau
drm_display_helper    278528  2 amdgpu,nouveau
video                  81920  4 asus_wmi,amdgpu,asus_nb_wmi,nouveau
wmi                    32768  5 video,asus_wmi,wmi_bmof,mxm_wmi,nouveau

I suppose I am now using nouveau. Chrome takes half a day to start up, but apart from that it is much better than the system constantly rebooting itself.

Hi @eleonorac2
Thanks for reporting the issue to us. Could you please share an NVIDIA bug report captured in the repro state, along with reliable steps to reproduce the issue?

Kernel: 6.10.6-200.fc40.x86_64
Driver Version: 560.35.03 (just installed, since it just became available, thanks!)

Aug 29 13:17:13 fedora kernel: BUG: KFENCE: use-after-free read in nv_dma_release_sgt+0x29/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel: Use-after-free read at 0x00000000525ee5ac (in kfence-#9):
Aug 29 13:17:13 fedora kernel:  nv_dma_release_sgt+0x29/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv043722rm+0x67/0xd0 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv038493rm+0xc7/0x430 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv038458rm+0x6b/0x130 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv004292rm+0xd/0x20 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv006122rm+0x1b/0xb0 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv018080rm+0x59c/0x680 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv048181rm+0xb3/0xe0 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv049956rm+0xb0/0x180 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv049955rm+0x50b/0x660 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv048073rm+0xdd/0x190 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv048074rm+0x41/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv000587rm+0x4a/0x60 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv000745rm+0x20f/0xe00 [nvidia]
Aug 29 13:17:13 fedora kernel:  rm_ioctl+0x7f/0x400 [nvidia]
Aug 29 13:17:13 fedora kernel:  nvidia_unlocked_ioctl+0x53b/0x8d0 [nvidia]
Aug 29 13:17:13 fedora kernel:  __x64_sys_ioctl+0x94/0xd0
Aug 29 13:17:13 fedora kernel:  do_syscall_64+0x82/0x160
Aug 29 13:17:13 fedora kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 29 13:17:13 fedora kernel: 
Aug 29 13:17:13 fedora kernel: kfence-#9: 0x0000000025c59c30-0x000000005a96edca, size=384, cache=kmalloc-512
Aug 29 13:17:13 fedora kernel: allocated by task 3668 on cpu 30 at 1668.609729s:
Aug 29 13:17:13 fedora kernel:  nv_drm_gem_prime_import_sg_table+0x2d/0xb0 [nvidia_drm]
Aug 29 13:17:13 fedora kernel:  drm_gem_prime_import_dev+0x93/0x180
Aug 29 13:17:13 fedora kernel:  drm_gem_prime_fd_to_handle+0xe7/0x220
Aug 29 13:17:13 fedora kernel:  drm_ioctl_kernel+0xb0/0x100
Aug 29 13:17:13 fedora kernel:  drm_ioctl+0x28b/0x540
Aug 29 13:17:13 fedora kernel:  __x64_sys_ioctl+0x94/0xd0
Aug 29 13:17:13 fedora kernel:  do_syscall_64+0x82/0x160
Aug 29 13:17:13 fedora kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 29 13:17:13 fedora kernel: 
Aug 29 13:17:13 fedora kernel: freed by task 3668 on cpu 15 at 1668.704835s:
Aug 29 13:17:13 fedora kernel:  nv_dma_release_sgt+0x49/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv043722rm+0x67/0xd0 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv038493rm+0xc7/0x430 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv038458rm+0x6b/0x130 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv004292rm+0xd/0x20 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv006122rm+0x1b/0xb0 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv018080rm+0x59c/0x680 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv048181rm+0xb3/0xe0 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv049956rm+0xb0/0x180 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv049955rm+0x50b/0x660 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv048073rm+0xdd/0x190 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv048074rm+0x41/0x70 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv000587rm+0x4a/0x60 [nvidia]
Aug 29 13:17:13 fedora kernel:  _nv000745rm+0x20f/0xe00 [nvidia]
Aug 29 13:17:13 fedora kernel:  rm_ioctl+0x7f/0x400 [nvidia]
Aug 29 13:17:13 fedora kernel:  nvidia_unlocked_ioctl+0x53b/0x8d0 [nvidia]
Aug 29 13:17:13 fedora kernel:  __x64_sys_ioctl+0x94/0xd0
Aug 29 13:17:13 fedora kernel:  do_syscall_64+0x82/0x160
Aug 29 13:17:13 fedora kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 29 13:17:13 fedora kernel: 
Aug 29 13:17:13 fedora kernel: CPU: 15 PID: 3668 Comm: gnome-shell Tainted: P    B   W  OE      6.10.6-200.fc40.x86_64 #1

Please give specific steps/commands if I can provide more info. Also, hopefully others that know what they are doing/have experience of helping NVIDIA with bugs/reports can chime in!
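
For anyone who wants to attach the same kind of information, here is a minimal sketch of collecting it from a shell. It assumes a systemd journal and that the driver's nvidia-bug-report.sh script is installed; kfence_reports is a hypothetical helper name used only for illustration, not an NVIDIA or kernel tool. It prints each KFENCE report from its "BUG: KFENCE:" line through the "CPU: ... PID:" line, so any later lines of a report (such as "Hardware name") are not included.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every KFENCE report found in kernel log text on stdin.
# A report is taken to start at "BUG: KFENCE:" and to end at the "CPU: <n> PID:" line.
kfence_reports() {
    awk '
        /BUG: KFENCE:/                  { in_report = 1 }
        in_report                       { print }
        in_report && /CPU: [0-9]+ PID:/ { in_report = 0; print "----" }
    '
}

# Typical use (left as comments so that sourcing this file has no side effects):
#   journalctl -k -b --no-pager | kfence_reports > kfence.txt
#   journalctl -k -b -1 --no-pager | kfence_reports   # previous boot, if the journal is persistent
#   sudo nvidia-bug-report.sh   # writes nvidia-bug-report.log.gz to the current directory
```

Running nvidia-bug-report.sh while the problem is visible (the "repro state") is likely to be the most useful; the journal extract is only a convenience for pasting into a post.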

Hi eleonorac2,
My laptop with a 3070 Ti used to reboot randomly. I’ve disabled PCIe Link State Power Management. While I’m still getting the same messages as you, there have been no reboots for a few days.

I’ve used the following guidance for my laptop:

Edit the /etc/UPower/UPower.conf file. Add the following line to the [PCIeLinkStatePowerManagement] section:
Enabled=false

Then, restart the UPower service to apply the changes:
sudo systemctl restart upower

Forum thread with the details about this error on Windows:

@Yury_RX16 - that is very thoughtful, thank you!!
Searching my /etc/UPower/UPower.conf gives E486: Pattern not found: PC, but I can experiment with adding a [PCIeLinkStatePowerManagement] section and setting Enabled=false, as this may also (very naively optimistic, probably) help with the external monitor blackouts. I first want to give it a day or 2 to see how things are with 560.35.03 / 6.10.7-200.fc40.x86_64, as we have not yet crashed today.

Yes, I’ve just added this text at the end of UPower.conf:
[PCIeLinkStatePowerManagement]
Enabled=false

I run Fedora 40, and if your system is different, there can be another recommended way to disable PCIeLinkStatePowerManagement.

“but I first want to give it a day or 2 to see how things are with …”
Agreed, moving step by step.

… OK, I will report back once I make that edit (and let the system run for a bit under usual conditions) as I am indeed also on Fedora 40, and will probably move to 41 as soon as possible next month(?).

Use-after-free bugs are still appearing, but crash/reboot frequency is down since 6.10.6-200.fc40.x86_64/560.35.03 (now 6.10.9-200.fc40.x86_64), and external monitor blackouts are still happening.
Tried the suggested

[PCIeLinkStatePowerManagement]
Enabled=false

for a couple of days, which does not seem to have affected any of the above but does not seem to have done any harm elsewhere, either. I will probably keep UPower.conf as is for now.
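
Whether UPower acts on a [PCIeLinkStatePowerManagement] section is not established anywhere in this thread. As a side check, the kernel reports its own PCIe link power management (ASPM) policy through sysfs, and the sketch below reads it. It assumes a kernel built with ASPM support; aspm_active_policy is a hypothetical helper name. The kernel marks the active policy with square brackets, and the helper prints only that entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active entry from an ASPM policy line such as
#   "[default] performance powersave powersupersave"
# where the square brackets mark the policy currently in effect.
aspm_active_policy() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Typical use (left as comments so that sourcing this file has no side effects):
#   aspm_active_policy < /sys/module/pcie_aspm/parameters/policy
#   cat /proc/cmdline   # shows whether a parameter such as pcie_aspm=off was passed at boot
```

If the reported policy is the same before and after the UPower.conf edit, that would suggest the edit is not what changes link power management on that machine.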

Seeing the same errors, albeit with no reboots, but I am noticing my keyboard is no longer lighting up (my laptop uses nvidia_wmi_ec_backlight).

System:
  Host: fedora Kernel: 6.10.9-200.fc40.x86_64 arch: x86_64 bits: 64
  Console: pty pts/2 Distro: Fedora Linux 40 (Workstation Edition)
Machine:
  Type: Laptop System: Razer product: Blade 15 (2022) - RZ09-0421 v: 8.04
    serial: BY2222M73501760
  Mobo: Razer model: CH580 v: 4 serial: N/A UEFI: Razer v: 2.06
    date: 11/01/2023

Kernel Logs:

[14751.216226] ==================================================================
[14751.216231] BUG: KFENCE: use-after-free read in nv_dma_release_sgt+0x29/0x70 [nvidia]

[14751.216628] Use-after-free read at 0x0000000004a4c21d (in kfence-#9):
[14751.216631]  nv_dma_release_sgt+0x29/0x70 [nvidia]
[14751.216972]  _nv043722rm+0x67/0xd0 [nvidia]
[14751.217391]  _nv038493rm+0xc7/0x430 [nvidia]
[14751.218092]  _nv038458rm+0x6b/0x130 [nvidia]
[14751.218479]  _nv004292rm+0xd/0x20 [nvidia]
[14751.218987]  _nv006122rm+0x1b/0xb0 [nvidia]
[14751.219486]  _nv018080rm+0x59c/0x680 [nvidia]
[14751.219989]  _nv048181rm+0xb3/0xe0 [nvidia]
[14751.220386]  _nv049956rm+0xb0/0x180 [nvidia]
[14751.220891]  _nv049955rm+0x50b/0x660 [nvidia]
[14751.221394]  _nv048073rm+0xdd/0x190 [nvidia]
[14751.221803]  _nv048074rm+0x41/0x70 [nvidia]
[14751.222193]  _nv000587rm+0x4a/0x60 [nvidia]
[14751.222609]  _nv000745rm+0x20f/0xe00 [nvidia]
[14751.223046]  rm_ioctl+0x7f/0x400 [nvidia]
[14751.223463]  nvidia_unlocked_ioctl+0x53b/0x8d0 [nvidia]
[14751.223818]  __x64_sys_ioctl+0x94/0xd0
[14751.223823]  do_syscall_64+0x82/0x160
[14751.223826]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[14751.223830] kfence-#9: 0x000000001d17d1b8-0x00000000328f1d2e, size=384, cache=kmalloc-rnd-15-512

[14751.223833] allocated by task 3116 on cpu 12 at 14751.177449s:
[14751.223839]  nv_drm_gem_prime_import_sg_table+0x2d/0xb0 [nvidia_drm]
[14751.223849]  drm_gem_prime_import_dev+0x93/0x180
[14751.223853]  drm_gem_prime_fd_to_handle+0xe7/0x220
[14751.223856]  drm_ioctl_kernel+0xb0/0x100
[14751.223859]  drm_ioctl+0x28b/0x540
[14751.223861]  __x64_sys_ioctl+0x94/0xd0
[14751.223864]  do_syscall_64+0x82/0x160
[14751.223866]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[14751.223869] freed by task 3116 on cpu 15 at 14751.216215s:
[14751.223873]  nv_dma_release_sgt+0x49/0x70 [nvidia]
[14751.224222]  _nv043722rm+0x67/0xd0 [nvidia]
[14751.224637]  _nv038493rm+0xc7/0x430 [nvidia]
[14751.225332]  _nv038458rm+0x6b/0x130 [nvidia]
[14751.225739]  _nv004292rm+0xd/0x20 [nvidia]
[14751.226251]  _nv006122rm+0x1b/0xb0 [nvidia]
[14751.226757]  _nv018080rm+0x59c/0x680 [nvidia]
[14751.227261]  _nv048181rm+0xb3/0xe0 [nvidia]
[14751.227663]  _nv049956rm+0xb0/0x180 [nvidia]
[14751.228168]  _nv049955rm+0x50b/0x660 [nvidia]
[14751.228675]  _nv048073rm+0xdd/0x190 [nvidia]
[14751.229084]  _nv048074rm+0x41/0x70 [nvidia]
[14751.229480]  _nv000587rm+0x4a/0x60 [nvidia]
[14751.229879]  _nv000745rm+0x20f/0xe00 [nvidia]
[14751.230306]  rm_ioctl+0x7f/0x400 [nvidia]
[14751.230722]  nvidia_unlocked_ioctl+0x53b/0x8d0 [nvidia]
[14751.231072]  __x64_sys_ioctl+0x94/0xd0
[14751.231076]  do_syscall_64+0x82/0x160
[14751.231079]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[14751.231083] CPU: 15 PID: 3116 Comm: gnome-shell Tainted: P    B   W  O       6.10.9-200.fc40.x86_64 #1
[14751.231086] Hardware name: Razer Blade 15 (2022) - RZ09-0421/CH580, BIOS 2.06 11/01/2023
[14751.231088] ==================================================================

nvidia-bug-report.log.gz (1.8 MB)
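
A small sketch for reading the reports posted in this thread: each one names the task that allocated the object, the task that freed it, and a timestamp for both. The hypothetical helper below (kfence_lifetimes is not an existing tool) extracts those fields and prints how long the object lived. It assumes the "allocated by task ... at ...s:" and "freed by task ... at ...s:" line formats shown in the logs here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each KFENCE report on stdin, print the allocating task,
# the freeing task, and the time between allocation and free.
kfence_lifetimes() {
    awk '
        # Return the field that follows the given word on the current line.
        function field_after(word,    i) {
            for (i = 1; i < NF; i++) if ($i == word) return $(i + 1)
            return ""
        }
        # e.g. "allocated by task 3668 on cpu 30 at 1668.609729s:"
        /allocated by task [0-9]+ on cpu [0-9]+ at / {
            alloc_task = field_after("task")
            alloc_at   = field_after("at") + 0   # numeric prefix of "1668.609729s:"
        }
        # e.g. "freed by task 3668 on cpu 15 at 1668.704835s:"
        /freed by task [0-9]+ on cpu [0-9]+ at / {
            printf "alloc task %s, free task %s, lifetime %.6f s\n",
                   alloc_task, field_after("task"), field_after("at") - alloc_at
        }
    '
}

# Typical use:
#   journalctl -k -b --no-pager | kfence_lifetimes
```

On the two reports in this thread it gives lifetimes of 0.095106 s and 0.038766 s, with the same gnome-shell task allocating and freeing the object each time. The faulting read (nv_dma_release_sgt+0x29) also arrives through the same call chain as the free (nv_dma_release_sgt+0x49). That would be consistent with the release path running twice for one imported buffer, but this is only a reading of the log, not something confirmed here.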

I had no idea keyboard lighting was possibly related to nvidia!
When my keyboard is not lighting up and the function key that controls it does nothing, I find (you probably know this) that the GNOME quick menu (I think that is the name: the thing that pops up when you click the speaker/battery/network etc. symbols) has a keyboard lighting control. That works, and somehow kicks the function key back into action.

Ah, disregard: my keyboard is not controlled by nvidia_wmi_ec_backlight, but I did notice the keylights turning off.
Hopefully NVIDIA can drill down on this KFENCE error.