NVIDIA kernel module error at boot.
System:
- Gentoo Linux (Clang profile), running as a VM
- Compiler: Clang 19
- Kernel: 6.12.16 (see the Gentoo Wiki for the NVIDIA kernel module setup)
- NVIDIA open kernel modules 570.124.06
- GPU model: H100 PCIe
- Hypervisor: VMware ESXi v8.0.2
Problems:
- The NVIDIA modules build successfully, but the system crashes when I run nvidia-smi.
- nvidia-persistenced is killed immediately after it starts, while the modprobe process sits at 100% CPU and cannot be killed, even with SIGKILL.
- dmesg output below:
[ 9.271034] nvidia: loading out-of-tree module taints kernel.
[ 9.352730] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 9.353009] #PF: supervisor instruction fetch in kernel mode
[ 9.353199] #PF: error_code(0x0010) - not-present page
[ 9.353380] PGD 0 P4D 0
[ 9.353564] Oops: Oops: 0010 [#1] PREEMPT SMP NOPTI
[ 9.353745] CPU: 4 UID: 0 PID: 1294 Comm: (udev-worker) Tainted: G O 6.12.16-gentoo #9
[ 9.353927] Tainted: [O]=OOT_MODULE
[ 9.354099] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform , BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
[ 9.354282] RIP: 0010:0x8
[ 9.354464] Code: Unable to access opcode bytes at 0xffffffffffffffde.
# the log ends with this line:
[ 9.370731] note: (udev-worker)[1294] exited with irqs disabled
Is this error caused by the hypervisor? Or
is the GPU stuck in a ‘suspend’ state and unable to resume? Or
is it a compiler version mismatch between the kernel and the modules? Or is it something else I am missing?
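For the compiler question, here is a rough sketch of how the two sides could be compared. It assumes a POSIX shell and that modinfo is installed. The helper name kernel_compiler is made up for illustration, and the banner layout in the comment is the usual /proc/version format, not output copied from my machine. Note that vermagic only shows the kernel release and build options the module expects, not the compiler itself.

```shell
#!/bin/sh
# Sketch: print the compiler that built the running kernel and the
# vermagic string of the installed nvidia module.

# Extract the compiler field from a kernel banner (defaults to /proc/version).
# Assumed banner form:
#   Linux version <release> (<user>@<host>) (<compiler>, <linker>) ...
kernel_compiler() {
    banner=${1:-$(cat /proc/version 2>/dev/null)}
    printf '%s\n' "$banner" |
        sed -n 's/^Linux version [^ ]* ([^)]*) (\([^,]*\),.*/\1/p'
}

echo "kernel compiler: $(kernel_compiler)"

# vermagic records the kernel release and build options the module was built for.
if command -v modinfo >/dev/null 2>&1; then
    modinfo -F vermagic nvidia 2>/dev/null || echo "nvidia module not found"
fi
```

If the kernel banner reports a different Clang version than the one used to build the modules, or the vermagic release does not match `uname -r`, that would point toward a build mismatch rather than the hypervisor.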
Please help me figure this out.
Thanks