Obvious Memory Access Error in 460.x driver (Patch provided)

cooldavid · March 20, 2021, 4:17am

Issue:
Mapping the I/O (physical) address 0xF8610000 causes the system to reboot due to:
BUG: unable to handle kernel paging request at 00000000f8610000

Capturing the error message requires setting up netconsole and “sysctl kernel.panic_on_oops=0”.
There is no way to run bug-report.sh once the kernel hits the oops…

But the fix is quite straightforward:
By looking up “RIP: 0010:os_lookup_user_io_memory+0x3e1” from the oops message with GDB and nvidia.ko,
I found the location of the issue: nvidia/os-mlock.c:59

I believe the original intention was to check whether the physical/bus addresses being mapped are contiguous.
But (*pte_array)[i] or (*pte_array)[i-1] dereferences a physical address as if it were a kernel-space virtual address…
Although the type of pte_array is “NvU64 **”, its elements are assigned with:
“pte_array[i] = (NvU64 *)(pfn << PAGE_SHIFT);”

Each element of pte_array is actually a PHYSICAL/BUS address (a 64-bit integer) cast to NvU64 * (a pointer to a 64-bit integer), which SHOULD NOT be dereferenced directly…
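
For readers without the attachment, here is a minimal user-space sketch of what a contiguity check over such an array could look like. It is not the driver's code and not the attached patch. The names sketch_is_contiguous and SKETCH_PAGE_SIZE are made up for illustration, and the 4096-byte page size is an assumption. The point is only that the entries are compared as integers and never dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this illustration; kernel code would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: returns true when every entry is exactly one page
 * above the previous entry. Each entry is a physical/bus address stored in
 * a pointer-typed slot, so it is converted back to an integer and compared.
 * No entry is ever dereferenced.
 */
bool sketch_is_contiguous(uint64_t *const *pte_array, size_t page_count)
{
    for (size_t i = 1; i < page_count; i++) {
        uintptr_t prev = (uintptr_t)pte_array[i - 1];
        uintptr_t curr = (uintptr_t)pte_array[i];

        if (curr != prev + SKETCH_PAGE_SIZE)
            return false;
    }
    return true;
}
```

With entries 0xF8610000, 0xF8611000 and 0xF8612000 this returns true without touching the memory behind those values, which is the property the dereferencing check lacks.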

nvidia-460-fix-invalid-memory-access.patch (468 Bytes)

cooldavid · March 20, 2021, 5:32am

To put it more simply:
(*pte_array)[i] means: pte_array[0][i]
and
(*pte_array)[i - 1] means: pte_array[0][i - 1]

Each iteration of the loop assigns pte_array[i],
so comparing pte_array[0][i] and pte_array[0][i - 1] is an obvious error.
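
The indexing equivalence can be checked in plain user-space C. The sketch below uses made-up helper names and ordinary arrays standing in for real memory. It is only meant to show that (*table)[i] and table[0][i] read the same element, and that both differ from the entry table[i] itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads element i of the row that the FIRST table entry points to. */
uint64_t sketch_deref_form(uint64_t **table, size_t i)
{
    return (*table)[i];
}

/* The same access written with explicit indices. */
uint64_t sketch_index_form(uint64_t **table, size_t i)
{
    return table[0][i];
}

/* Reads entry i of the table itself as an integer; the entry is not dereferenced. */
uintptr_t sketch_entry_value(uint64_t **table, size_t i)
{
    return (uintptr_t)table[i];
}
```

In this sketch the first two helpers are safe only because table[0] points at real memory. When table[0] holds a physical address such as 0xF8610000 cast to a pointer, both forms try to read from that address as if it were a virtual one, which matches the faulting address in the oops.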

aplattner · March 22, 2021, 10:09pm

Thanks for reporting this. We’re tracking it in internal bug number 3280454. While the bug tracker isn’t public, you can use this number to refer to this issue in future correspondence.

cooldavid · April 17, 2021, 3:43am

I’ve noticed that some new versions of the Linux driver were released a few days ago, but the issue is still there. Is there an estimated time for when this issue will be addressed?

Thanks.

Latest Production Branch Version: 460.73.01
Latest New Feature Branch Version: 465.24.02
