Kernel panic with 5.10.52-gentoo and nvidia-drivers 470.57.02 when closing hardware-accelerated apps

  • GPU: EVGA NVIDIA GTX 1060
  • OS: Gentoo
  • nvidia-drivers: 470.57.02
  • kernel: 5.10.52-gentoo
  • Motherboard: ASUS PRIME B350-PLUS
  • CPU: Ryzen 1700

Closing a Chrome/Chromium-based app triggers a kernel general protection fault (full log on dpaste: nvidia_recursive_fault.log) and then locks up the system with “Fixing recursive fault”:

[Mon Aug  9 14:04:55 2021] CPU: 6 PID: 16198 Comm: chrome Tainted: P        W  O      5.10.52-gentoo #1
[Mon Aug  9 14:04:55 2021] Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 0902 09/08/2017
[Mon Aug  9 14:04:55 2021] RIP: 0010:_nv035844rm+0xb0/0xe0 [nvidia]
[Mon Aug  9 14:04:55 2021] Code: 89 c2 48 89 ef 48 8d b1 48 01 00 00 4c 89 e9 e8 a6 5b ff ff 66 0f 1f 44 00 00 48 89 ef e8 08 5c ff ff 84 c0 74 8a 48 8b 75 00 <48> 39 5e 08 75 ea 4c 39 26 75 e5 49 8b 44 24 20 48 8d b8 48 01 00
[Mon Aug  9 14:04:55 2021] RSP: 0018:ffffa4db83d1bbf0 EFLAGS: 00010202
[Mon Aug  9 14:04:55 2021] RAX: 0000000000000001 RBX: ffff8ef1e11d1c30 RCX: ffff8ef0016ea978
[Mon Aug  9 14:04:55 2021] RDX: 6b6b6b6b6b6b6b6b RSI: 6b6b6b6b00000000 RDI: ffff8eef8acc2d28
[Mon Aug  9 14:04:55 2021] RBP: ffff8eef8acc2d28 R08: 0000000000000020 R09: ffff8eef8acc2d30
[Mon Aug  9 14:04:55 2021] R10: 0000000000000000 R11: 0000000000000017 R12: ffff8ef0f08c6c00
[Mon Aug  9 14:04:55 2021] R13: 6b6b6b6b00000000 R14: ffff8eef8acc2da0 R15: ffff8ef1e11d1c30
[Mon Aug  9 14:04:55 2021] FS:  00007ffb75a06240(0000) GS:ffff8ef5bef80000(0000) knlGS:0000000000000000
[Mon Aug  9 14:04:55 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Aug  9 14:04:55 2021] CR2: 000007e1121a2000 CR3: 000000051d60a000 CR4: 00000000003506e0
[Mon Aug  9 14:04:55 2021] Call Trace:
[Mon Aug  9 14:04:55 2021]  ? _nv014655rm+0x2ee/0x770 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? _nv037695rm+0xb3/0x150 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? _nv037694rm+0x297/0x4e0 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? _nv037689rm+0x60/0x70 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? _nv037690rm+0x7b/0xb0 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? _nv036056rm+0x40/0xe0 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? _nv000699rm+0x68/0x80 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? rm_cleanup_file_private+0xea/0x160 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? xfs_bmapi_read+0xe5/0x2c0
[Mon Aug  9 14:04:55 2021]  ? nvidia_dev_put+0xa2f/0xbf0 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? nvidia_frontend_close+0x2b/0x50 [nvidia]
[Mon Aug  9 14:04:55 2021]  ? __fput+0x8e/0x230
[Mon Aug  9 14:04:55 2021]  ? task_work_run+0x5c/0x90
[Mon Aug  9 14:04:55 2021]  ? do_exit+0x350/0x9e0
[Mon Aug  9 14:04:55 2021]  ? do_group_exit+0x33/0xa0
[Mon Aug  9 14:04:55 2021]  ? __x64_sys_exit_group+0x14/0x20
[Mon Aug  9 14:04:55 2021]  ? do_syscall_64+0x33/0x40
[Mon Aug  9 14:04:55 2021]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
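One detail in the register dump stands out: RDX, RSI and R13 are full of 0x6b bytes, the value SLUB writes into freed objects when poisoning is enabled (POISON_FREE in the kernel's include/linux/poison.h). The bytes at the faulting RIP (`48 39 5e 08`) decode to `cmp %rbx,0x8(%rsi)`, so the fault appears to come from dereferencing the poisoned, non-canonical value in RSI. That is consistent with the driver reading memory that had already been freed, but it is an interpretation of the oops, not a confirmed diagnosis. A minimal Python sketch for spotting such registers, assuming the dump has been pasted into a string with the dmesg timestamps removed (the function name and the `min_bytes` threshold are illustrative choices):

```python
import re

# Byte SLUB writes into freed objects when poisoning is on (POISON_FREE).
POISON_FREE = 0x6B

# Register lines copied from the oops above, timestamps stripped.
OOPS = """\
RAX: 0000000000000001 RBX: ffff8ef1e11d1c30 RCX: ffff8ef0016ea978
RDX: 6b6b6b6b6b6b6b6b RSI: 6b6b6b6b00000000 RDI: ffff8eef8acc2d28
R13: 6b6b6b6b00000000 R14: ffff8eef8acc2da0 R15: ffff8ef1e11d1c30
"""

# General-purpose register name followed by a 64-bit hex value.
REG_RE = re.compile(r"\b(R[0-9A-Z]{1,2}):\s*([0-9a-f]{16})\b")


def poisoned_registers(text: str, min_bytes: int = 4) -> dict[str, str]:
    """Return registers whose value contains at least `min_bytes` 0x6b bytes."""
    hits = {}
    for name, value in REG_RE.findall(text):
        if bytes.fromhex(value).count(POISON_FREE) >= min_bytes:
            hits[name] = value
    return hits


print(poisoned_registers(OOPS))
# {'RDX': '6b6b6b6b6b6b6b6b', 'RSI': '6b6b6b6b00000000', 'R13': '6b6b6b6b00000000'}
```

Counting whole bytes rather than matching the full 16-digit pattern also catches partially poisoned values such as RSI, where only the upper half still holds the fill.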

nvidia-bug-report.log.gz (226.5 KB)
The bug report was generated after downgrading to 460.91.03.

Are you enabling slub_debug=P or similar on your kernel’s command line? Or, alternatively, CONFIG_SLUB_DEBUG_ON in the kernel config?

As far as I’m aware, this combination started causing issues with driver versions >=465.31 (or thereabouts), and I was able to reproduce it. For now, all I can suggest is to stop using that option if you have it enabled.
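Both settings asked about here can be checked on a running system: the boot command line is in /proc/cmdline, and the kernel configuration is in the source tree's .config (or /proc/config.gz when CONFIG_IKCONFIG_PROC is enabled). The Python sketch below is a heuristic that takes those contents as strings. Its slub_debug= parsing covers only the simple `slub_debug`, `slub_debug=<flags>` and `slub_debug=<flags>,<cache>` forms, and the 465.31 threshold is the approximate one from the reply, not a confirmed cut-off. The function names and sample inputs are hypothetical.

```python
import re


def slub_poisoning_enabled(cmdline: str, kconfig: str) -> bool:
    """Heuristic: is SLUB free-poisoning likely active for this boot?"""
    # A slub_debug= argument on the command line overrides the build-time default.
    for arg in cmdline.split():
        if arg == "slub_debug":
            return True  # bare "slub_debug" enables the default debug set, incl. poisoning
        if arg.startswith("slub_debug="):
            flags = arg.split("=", 1)[1].split(",", 1)[0]
            if flags == "-":
                return False  # "slub_debug=-" switches debugging off
            return flags == "" or "P" in flags
    # Otherwise CONFIG_SLUB_DEBUG_ON=y enables debugging (poisoning included) for all caches.
    return re.search(r"^CONFIG_SLUB_DEBUG_ON=y$", kconfig, re.MULTILINE) is not None


def driver_at_least(version: str, threshold: str = "465.31") -> bool:
    """Compare dotted driver versions numerically; the threshold is approximate."""
    def key(v: str) -> tuple[int, ...]:
        return tuple(int(part) for part in v.split("."))
    return key(version) >= key(threshold)


# Hypothetical inputs; on a real system, read /proc/cmdline and the kernel .config.
cmdline = "BOOT_IMAGE=/vmlinuz-5.10.52-gentoo ro quiet"
kconfig = "CONFIG_SLUB_DEBUG=y\nCONFIG_SLUB_DEBUG_ON=y\n"
print(slub_poisoning_enabled(cmdline, kconfig) and driver_at_least("470.57.02"))  # True
```

Versions are compared as integer tuples rather than strings, so "470.57.02" sorts after "465.31" regardless of digit count or leading zeros.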


I changed CONFIG_SLUB_DEBUG_ON to n, rebuilt the kernel and reinstalled the latest NVIDIA driver.
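The same change can be made without menuconfig. In a kernel .config, a disabled bool option is recorded as the comment `# CONFIG_<NAME> is not set`, which is what the hypothetical helper below writes; the kernel tree's own `scripts/config --disable SLUB_DEBUG_ON` does the equivalent. A minimal Python sketch, assuming the .config text is already loaded into a string; running `make olddefconfig` afterwards re-resolves dependent options before the rebuild. CONFIG_SLUB_DEBUG itself can stay at y: it only compiles the debugging support in, so slub_debug= on the command line can still switch it on later.

```python
import re


def disable_option(kconfig: str, name: str) -> str:
    """Mark CONFIG_<name> as disabled the way Kconfig writes it in .config."""
    # Match only the exact option, e.g. CONFIG_SLUB_DEBUG_ON=y, not CONFIG_SLUB_DEBUG=y.
    pattern = re.compile(rf"^CONFIG_{re.escape(name)}=.*$", re.MULTILINE)
    return pattern.sub(f"# CONFIG_{name} is not set", kconfig)


# Hypothetical .config excerpt.
before = "CONFIG_SLUB_DEBUG=y\nCONFIG_SLUB_DEBUG_ON=y\n"
print(disable_option(before, "SLUB_DEBUG_ON"), end="")
# CONFIG_SLUB_DEBUG=y
# # CONFIG_SLUB_DEBUG_ON is not set
```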

SLUB debugging is supposed to guard against corruption of the free store, which is useful when running virtual machines.
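What slub_debug=P adds can be pictured with a toy model: each object is filled with 0x6b when it is freed, and the fill is checked again before the object is handed out, so a write through a dangling pointer is caught at the next allocation, while a read through one returns 0x6b bytes (as in RDX and RSI in the oops). The Python class below is an illustration only, not SLUB's implementation; it omits red zones, the 0xa5 end-of-poison byte and per-CPU freelists, and zero-filling on allocation is a choice of the toy. `ToySlab` and its methods are hypothetical names.

```python
POISON_FREE = 0x6B  # same byte value SLUB writes into freed objects


class ToySlab:
    """Toy fixed-size object pool illustrating free-poisoning (not kernel code)."""

    def __init__(self, obj_size: int, count: int):
        self.obj_size = obj_size
        # Every object starts out free, so it starts out poisoned.
        self.objects = [bytearray([POISON_FREE]) * obj_size for _ in range(count)]
        self.free = list(range(count))

    def alloc(self) -> int:
        idx = self.free.pop()
        obj = self.objects[idx]
        # A write through a stale pointer after free would have disturbed the poison.
        if any(b != POISON_FREE for b in obj):
            raise RuntimeError(f"object {idx}: poison overwritten after free")
        obj[:] = bytes(self.obj_size)  # toy choice: hand out zeroed memory
        return idx

    def release(self, idx: int) -> None:
        # Poison on free: any later use-after-free read sees 0x6b bytes.
        self.objects[idx][:] = bytes([POISON_FREE]) * self.obj_size
        self.free.append(idx)


slab = ToySlab(obj_size=16, count=1)
a = slab.alloc()
slab.release(a)
slab.objects[a][8] = 0x00  # simulated write through a dangling pointer
try:
    slab.alloc()
except RuntimeError as err:
    print(err)  # object 0: poison overwritten after free
```

Without poisoning, the stale object would simply keep its old contents, so a use-after-free read could return plausible-looking data instead of faulting immediately.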

Do you know if or when it would be possible to use it again?