UBSAN: array-index-out-of-bounds complaints in newer kernels

odysei · November 3, 2023, 2:33pm

After upgrading from (K)Ubuntu 23.04 to (K)Ubuntu 23.10, I started getting a lot of “UBSAN: array-index-out-of-bounds” complaints from the kernel. See the attached logs for driver versions 535.113 and 535.129.
kern-535.129.log (46.4 KB)
kern-535.113.log (78.0 KB)

henrik4 · November 10, 2023, 2:09pm

I get this too, also with the 545 driver.

imbezol · November 24, 2023, 6:41pm

I’m also getting this.

Ubuntu 23.10, Linux 6.5.0-10-generic, Nvidia driver 535.129.03-0ubuntu0.23.10.1, Quadro P400.

ubsan_dmesg.txt (167.2 KB)
lshw.txt (37.6 KB)

Relevant: The Undefined Behavior Sanitizer - UBSAN — The Linux Kernel documentation

luisalvaradox · November 30, 2023, 12:20am

Exactly the same thing on Ubuntu 23.10 with the 6.5.0-13-generic kernel and the Nvidia 545.29.06 driver:

[ 14.267145] ================================================================================
[ 14.267148] UBSAN: array-index-out-of-bounds in /var/lib/dkms/nvidia/545.29.06/build/nvidia-uvm/uvm_pmm_gpu.c:2364:28
[ 14.267149] index 0 is out of range for type ‘uvm_gpu_chunk_t []’
[ 14.267150] CPU: 6 PID: 2641 Comm: gst-plugin-scan Tainted: P OE 6.5.0-13-generic #13-Ubuntu
[ 14.267152] Hardware name: ASUS System Product Name/ROG MAXIMUS Z790 HERO, BIOS 1501 10/06/2023
[ 14.267153] Call Trace:
[ 14.267154]
[ 14.267156] dump_stack_lvl+0x48/0x70
[ 14.267163] dump_stack+0x10/0x20
[ 14.267164] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 14.267167] split_gpu_chunk+0x13f/0x410 [nvidia_uvm]
[ 14.267200] uvm_pmm_gpu_alloc+0x2da/0x6d0 [nvidia_uvm]
[ 14.267224] phys_mem_allocate+0xac/0x230 [nvidia_uvm]
[ 14.267253] allocate_directory+0xb4/0x130 [nvidia_uvm]
[ 14.267279] ? allocate_directory+0xb4/0x130 [nvidia_uvm]
[ 14.267303] uvm_page_tree_init+0x12c/0x2e0 [nvidia_uvm]
[ 14.267329] uvm_gpu_retain_by_uuid+0x1a2b/0x2bb0 [nvidia_uvm]
[ 14.267351] uvm_va_space_register_gpu+0x47/0x740 [nvidia_uvm]
[ 14.267372] uvm_api_register_gpu+0x5a/0x90 [nvidia_uvm]
[ 14.267393] uvm_ioctl+0x1a26/0x1cd0 [nvidia_uvm]
[ 14.267411] ? ext4_inode_block_valid+0x1d/0x30
[ 14.267414] ? __ext4_ext_check+0x1ff/0x500
[ 14.267416] ? unlock_new_inode+0x55/0x70
[ 14.267417] ? __ext4_iget+0x9d1/0x1130
[ 14.267419] ? __d_add+0x118/0x1e0
[ 14.267420] ? _raw_spin_lock_irqsave+0xe/0x20
[ 14.267422] ? thread_context_non_interrupt_add+0x13a/0x2c0 [nvidia_uvm]
[ 14.267448] uvm_unlocked_ioctl_entry.part.0+0x7b/0xf0 [nvidia_uvm]
[ 14.267468] uvm_unlocked_ioctl_entry+0x6b/0x90 [nvidia_uvm]
[ 14.267487] __x64_sys_ioctl+0xa0/0xf0
[ 14.267488] do_syscall_64+0x59/0x90
[ 14.267490] ? exit_to_user_mode_prepare+0x30/0xb0
[ 14.267493] ? syscall_exit_to_user_mode+0x37/0x60
[ 14.267494] ? do_syscall_64+0x68/0x90
[ 14.267495] ? irqentry_exit+0x43/0x50
[ 14.267496] ? exc_page_fault+0x94/0x1b0
[ 14.267498] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 14.267500] RIP: 0033:0x7f83209238ef
[ 14.267525] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 14.267526] RSP: 002b:00007ffcadf061c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 14.267527] RAX: ffffffffffffffda RBX: 00007f831fd12840 RCX: 00007f83209238ef
[ 14.267528] RDX: 00007ffcadf06260 RSI: 0000000000000025 RDI: 000000000000000e
[ 14.267528] RBP: 00007ffcadf062c0 R08: 00007f831fd128d0 R09: 0000000000000000
[ 14.267529] R10: 00007f83208143c0 R11: 0000000000000246 R12: 000055f14f125f76
[ 14.267529] R13: 00007f831fd128d0 R14: 00007ffcadf06260 R15: 000000000000000e
[ 14.267530]
[ 14.267531] ================================================================================

This block repeats many times in slightly different forms, each one pointing to a different line of the NVIDIA source file named in its header.

amrits · December 1, 2023, 9:26pm

We have filed internal bug 4348950 to track this.
The issue has already been root-caused, and the fix will be available in future branch release drivers.

aplattner · December 1, 2023, 10:15pm

I should note that this warning is harmless. It’s caused by some arrays in UVM being declared with the wrong size. You can see it in the code here: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/kernel-open/nvidia-uvm/uvm_pmm_gpu.c#L224

These are C “flexible array members” and should just be declared with no size.

sibi.nandhu · December 6, 2023, 8:29am

I am on Ubuntu 23.04 Server, running a Windows VM under QEMU/KVM with GPU passthrough of an NVIDIA A4000. The entire host kernel crashes at random times while I play demanding games in the VM. I captured the log with kdump, and this is what I got: Question #708640 “QEMU/KVM crashes with GPU Passthrough at rando...” : Questions : Ubuntu

Is the error I'm seeing similar to the one discussed in this thread? Is UBSAN what is crashing my host kernel?

This happens only when I am playing games in the Windows VM. I gave the VM 16 cores and 16 GB of RAM. Resource usage stays within those limits, yet the crash still takes down the entire host kernel.

user28546 · December 12, 2023, 6:39am

Also seeing this.

Linux data 6.5.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:59:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Driver Version: 545.29.06

luisalvaradox · December 28, 2023, 5:57pm

Thank you so much @aplattner
