Kernel panic in RHEL 8 after closing VS Code - call trace shows NVIDIA driver v470.57.02

I have a new copy of RHEL 8 that reliably kernel panics when closing Visual Studio Code. Here’s the call stack from the kernel panic:

[  238.303350] general protection fault: 0000 [#1] SMP PTI
[  238.303354] CPU: 30 PID: 6266 Comm: code Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-305.19.1.el8_4.x86_64 #1
[  238.303355] Hardware name: Dell Inc. Precision 7920 Tower/0RN4PJ, BIOS 2.12.0 01/15/2021
[  238.303582] RIP: 0010:_nv035844rm+0xb0/0xe0 [nvidia]
[  238.303584] Code: 89 c2 48 89 ef 48 8d b1 48 01 00 00 4c 89 e9 e8 a6 5b ff ff 66 0f 1f 44 00 00 48 89 ef e8 08 5c ff ff 84 c0 74 8a 48 8b 75 00 <48> 39 5e 08 75 ea 4c 39 26 75 e5 49 8b 44 24 20 48 8d b8 48 01 00
[  238.303585] RSP: 0018:ffffb9b1488dbba8 EFLAGS: 00010202
[  238.303587] RAX: 0000000000000001 RBX: ffff92e121dd6830 RCX: ffff92e904f2c978
[  238.303587] RDX: 6b6b6b6b6b6b6b6b RSI: 6b6b6b6b00000000 RDI: ffff92e91c222d20
[  238.303588] RBP: ffff92e91c222d20 R08: 0000000000000020 R09: ffff92e91c222d28
[  238.303589] R10: 0000000000000000 R11: fffff365a0dbd048 R12: ffff92e154873630
[  238.303589] R13: 6b6b6b6b00000000 R14: ffff92e91c222d98 R15: ffff92e121dd6830
[  238.303590] FS:  0000000000000000(0000) GS:ffff92e160080000(0000) knlGS:0000000000000000
[  238.303591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  238.303592] CR2: 00007fc877a259e0 CR3: 00000002c4e10003 CR4: 00000000007706e0
[  238.303592] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  238.303593] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  238.303593] PKRU: 55555554
[  238.303594] Call Trace:
[  238.303781]  ? _nv014655rm+0x2ee/0x770 [nvidia]
[  238.303924]  ? _nv037695rm+0xb3/0x150 [nvidia]
[  238.304063]  ? _nv037694rm+0x297/0x4e0 [nvidia]
[  238.304203]  ? _nv037689rm+0x60/0x70 [nvidia]
[  238.304342]  ? _nv037690rm+0x7b/0xb0 [nvidia]
[  238.304446]  ? _nv036056rm+0x40/0xe0 [nvidia]
[  238.304573]  ? _nv000699rm+0x68/0x80 [nvidia]
[  238.304696]  ? rm_cleanup_file_private+0xea/0x160 [nvidia]
[  238.304784]  ? nvidia_close+0x149/0x2c0 [nvidia]
[  238.304873]  ? nvidia_frontend_close+0x2a/0x40 [nvidia]
[  238.304877]  ? __fput+0xbe/0x250
[  238.304881]  ? task_work_run+0x8a/0xb0
[  238.304883]  ? do_exit+0x38a/0xac0
[  238.304886]  ? syscall_trace_enter+0x1d3/0x2c0
[  238.304888]  ? do_group_exit+0x3a/0xa0
[  238.304889]  ? __x64_sys_exit_group+0x14/0x20
[  238.304890]  ? do_syscall_64+0x5b/0x1a0
[  238.304893]  ? entry_SYSCALL_64_after_hwframe+0x65/0xca

Relevant details:

  • Operating System: RHEL 8
  • Driver: 470.57.02
  • CUDA Version: 11.4
  • Display Device: Quadro P1000

How to reproduce:

  1. Install RHEL 8.
  2. Install CUDA with the 470.57.02 driver via the RHEL 8 RPM network installer.
  3. Install latest copy of VSCode via Microsoft’s RPMs.
  4. Open VSCode.
  5. Close VSCode.
  6. Kernel panic + reboot.

Is this actually RHEL 8 or a leech clone?

True RHEL 8 – here’s the output from /etc/os-release:

NAME="Red Hat Enterprise Linux"
VERSION="8.4 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.4"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.4 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.4:GA"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

Bumping this thread to note that we hit this issue too. It’s caused by SLUB debugging being set to poisoning by default on RHEL 8.
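
The register dump in the original post fits this explanation: RDX holds 6b6b6b6b6b6b6b6b, and RSI and R13 hold 6b6b6b6b00000000. 0x6b is the byte the kernel writes over freed slab objects when poisoning is enabled (POISON_FREE in include/linux/poison.h), so the driver looks to be reading memory that was already freed. Below is a minimal Python sketch for spotting such registers in an oops. The function name and the four-byte threshold are my own assumptions, not part of any kernel tooling.

```python
import re

POISON_FREE = "6b"  # byte written over freed slab objects (include/linux/poison.h)

# Matches pairs such as "RAX: 0000000000000001"; R8-R15 are printed as "R08", "R10", ...
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b")


def poisoned_registers(oops_text, min_bytes=4):
    """Return {register: value} for registers holding a run of at least
    min_bytes consecutive 0x6b bytes (min_bytes is an arbitrary threshold)."""
    hits = {}
    for name, value in REG_RE.findall(oops_text):
        # Split the 64-bit hex value into its eight bytes.
        byte_list = [value[i:i + 2] for i in range(0, 16, 2)]
        run = best = 0
        for b in byte_list:
            run = run + 1 if b == POISON_FREE else 0
            best = max(best, run)
        if best >= min_bytes:
            hits[name] = value
    return hits


# Register lines copied from the oops in the original post (timestamps removed).
sample = """\
RAX: 0000000000000001 RBX: ffff92e121dd6830 RCX: ffff92e904f2c978
RDX: 6b6b6b6b6b6b6b6b RSI: 6b6b6b6b00000000 RDI: ffff92e91c222d20
RBP: ffff92e91c222d20 R08: 0000000000000020 R09: ffff92e91c222d28
R10: 0000000000000000 R11: fffff365a0dbd048 R12: ffff92e154873630
R13: 6b6b6b6b00000000 R14: ffff92e91c222d98 R15: ffff92e121dd6830
"""

print(sorted(poisoned_registers(sample)))  # ['R13', 'RDX', 'RSI']
```

A hit like this only suggests a use-after-free. It does not show which allocation was freed or by whom.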

Setting the kernel command-line parameter slub_debug to - (that is, slub_debug=-) disables this debugging, and I’ve been unable to reproduce the panic since doing so. It appears NVIDIA drivers >= 465 have this issue with SLUB poisoning.

  1. Within /etc/default/grub, change:

slub_debug=P --(CHANGE TO)--> slub_debug=-

  2. grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

  3. dracut -f

  4. Reboot
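
After the reboot, it is worth confirming that the running kernel actually picked up the new setting, since a regenerated grub.cfg does not always end up being the one that is booted. The following is a rough Python sketch, assuming the booted command line is readable from /proc/cmdline. The helper names are hypothetical and the sample command line is made up for illustration.

```python
from pathlib import Path


def slub_debug_values(cmdline):
    """Return the value of every slub_debug parameter in a kernel command line.

    A bare "slub_debug" (no "=") is reported as an empty string.
    """
    values = []
    for token in cmdline.split():
        name, _, value = token.partition("=")
        if name == "slub_debug":
            values.append(value)
    return values


def running_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text().strip()


# Hypothetical command lines for illustration, not taken from the affected machine.
before = "BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-305.19.1.el8_4.x86_64 ro slub_debug=P rhgb quiet"
after = before.replace("slub_debug=P", "slub_debug=-")

print(slub_debug_values(before))  # ['P']
print(slub_debug_values(after))   # ['-']
```

On the rebooted machine, slub_debug_values(running_kernel_cmdline()) should return ['-'] if the change took effect. If it still shows P, the boot entry in use was probably not updated.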