Xavier NX running the 4.9.140 kernel has memory issues

I updated my Xavier NX to 35.3.1, which also moved the kernel from 4.9 to 5.10. After the update, the same application uses more CPU on 35.3.1. From the top output I concluded that the kernel is the root cause, so I replaced the DTB, Image, and modules with the ones used in R32.4.4.
But this kernel reports errors while booting. The kernel dmesg log is attached below.

kern_log (153.0 KB)
I found some issues in the log:
1- The reserved memory value 18446744073709059484K is abnormal:
[ 0.000000] Memory: 7212324K/7473856K available (15292K kernel code, 2942K rwdata, 6616K rodata, 8640K init, 609K bss, 18446744073709059484K reserved, 753664K cma-reserved)
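
The abnormal value looks like a negative number printed as an unsigned 64-bit integer. The sketch below checks that arithmetic using only the figures from the "Memory:" line above. It assumes the kernel derives "reserved" as total minus available minus cma-reserved; that formula is an assumption here, not something the log states, and the variable names are illustrative.

```python
# Values in KiB, copied from the "Memory:" line of the boot log.
available_k = 7212324
total_k = 7473856
cma_reserved_k = 753664
printed_reserved_k = 18446744073709059484

U64 = 1 << 64  # size of the unsigned 64-bit number space


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return value - U64 if value >= (1 << 63) else value


# The printed value, read as a signed number.
signed_reserved_k = as_signed64(printed_reserved_k)
print(signed_reserved_k)  # -492132

# Assumption: reserved = total - available - cma-reserved, wrapped to 64 bits.
recomputed_k = (total_k - available_k - cma_reserved_k) % U64
print(recomputed_k == printed_reserved_k)  # True
```

Under that assumption, the printed figure is simply -492132K wrapped around: the cma-reserved size is larger than the memory that was not handed to the allocator. That would be consistent with the CMA warning later in the log, but the log alone does not confirm the root cause.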

2- A kernel WARNING is raised in cma_init_reserved_areas during boot:

[    0.663567] ------------[ cut here ]------------
[    0.663594] WARNING: CPU: 0 PID: 1 at mm/cma.c:113 cma_init_reserved_areas+0x9c/0x1c0
[    0.663610] Modules linked in:

[    0.663636] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.140-tegra #2
[    0.663648] Hardware name: Unknown NVIDIA Jetson Xavier NX Developer Kit/NVIDIA Jetson Xavier NX Developer Kit, BIOS r35.0-716592d 01/04/2023
[    0.663664] task: ffffffc718060000 task.stack: ffffffc718068000
[    0.663677] PC is at cma_init_reserved_areas+0x9c/0x1c0
[    0.663688] LR is at cma_init_reserved_areas+0x88/0x1c0
[    0.663700] pc : [<ffffff8bb8211814>] lr : [<ffffff8bb8211800>] pstate: 60c00045
[    0.663713] sp : ffffffc71806bd60
[    0.663723] x29: ffffffc71806bd60 x28: ffffff8bb841d940 
[    0.663739] x27: 00000000000ce000 x26: ffffff8bb81e0f60 
[    0.663754] x25: ffffff8bb8d43000 x24: ffffff8bb8d86128 
[    0.663769] x23: 0000000000000000 x22: 0000000000000000 
[    0.663784] x21: 0000000000000150 x20: ffffff8bb8c9b4b2 
[    0.663798] x19: ffffff8bb8d86000 x18: 0000000000000000 
[    0.663813] x17: 000000000000000e x16: 0000000000000007 
[    0.663827] x15: 0000000000000012 x14: 0000000000000000 
[    0.663842] x13: 00000000013c890f x12: 0000000000000000 
[    0.663857] x11: 0000000000000004 x10: 0000000000000003 
[    0.663871] x9 : ffffff8bb8c90dd8 x8 : ffffffbf1cd8aa00 
[    0.663886] x7 : 0000000000600000 x6 : 0000000000000018 
[    0.663900] x5 : ffffff8bb8d7fc90 x4 : ffffff8bb8d7fc18 
[    0.663914] x3 : 0000000000000005 x2 : 0000000000000005 
[    0.663929] x1 : 0000000100000000 x0 : 0000000000000001 

[    0.663954] ---[ end trace 4ad7898d8620677c ]---
[    0.663965] Call trace:
[    0.663978] [<ffffff8bb8211814>] cma_init_reserved_areas+0x9c/0x1c0
[    0.663994] [<ffffff8bb6c8433c>] do_one_initcall+0x44/0x130
[    0.664010] [<ffffff8bb81f0d24>] kernel_init_freeable+0x1a0/0x244
[    0.664025] [<ffffff8bb7b5e9c8>] kernel_init+0x18/0x108
[    0.664037] [<ffffff8bb6c840a0>] ret_from_fork+0x10/0x30

You cannot use the prior device tree. There are some extreme differences between a 4.x kernel and a 5.x kernel. That device tree, to some extent, could be considered to be arguments to pass to the drivers. Those drivers are an entirely different generation. There is no reason to believe memory problems could be fixed by using the wrong device tree.

My previous description was not correct. On the r35.3.1 system I replaced the kernel image, device tree, and modules with the ones from r32.4.4, so my device tree matches the kernel. My complete kernel log is attached below. I suspect the memory anomaly is caused by overlay parameters being written into the device tree during the cboot phase.
kern_log (153.0 KB)

The log excerpt below shows that I am running the kernel image and device tree that correspond to version 4.9:

[    0.000000] Linux version 4.9.140-tegra (victor@victor-OptiPlex-9020)

[    0.312906] DTS File Name: /home/scm/jenkins/workspace/nx441_gluon_dailybuild/nx441_dev/kernel/kernel/kernel-4.9/arch/arm64/boot/dts/../../../../../../hardware/nvidia/platform/t19x/jakku/kernel-dts/tegra194-p3668-all-p3509-0000.dts
[    0.313023] DTB Build time: May 16 2023 14:21:23


We do not guarantee that the bootloader in r35 will work with kernels from r32. If you think there is a performance issue in r35, please just re-flash the device with r32.

Just to add context…

Part of how the kernel operates upon boot is that it takes arguments. The device tree and the kernel command line are most of that. However, the hardware setup state and other software during boot will also create an environment which is inherited by the kernel. @DaveYYY is 100% correct to suggest the boot content may not be valid when mixing and matching R32.x with R35.x. There is quite a bit of difference since older boot did not use UEFI, and newer boot does; this is complicated even more by the 4.x kernel being an entire major release different than the 5.x kernel.
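
As a rough illustration of that mix, the version strings already present in the posted log can be pulled out and compared. This is only a sketch: the three lines are copied from the log earlier in the thread, the helper name `summarize` is hypothetical, and reading "BIOS r35.0" as the firmware release is an assumption.

```python
import re

# Lines copied verbatim from the kernel log posted in this thread.
log_lines = [
    "[    0.000000] Linux version 4.9.140-tegra (victor@victor-OptiPlex-9020)",
    "[    0.312906] DTS File Name: /home/scm/jenkins/workspace/"
    "nx441_gluon_dailybuild/nx441_dev/kernel/kernel/kernel-4.9/arch/arm64/"
    "boot/dts/../../../../../../hardware/nvidia/platform/t19x/jakku/"
    "kernel-dts/tegra194-p3668-all-p3509-0000.dts",
    "[    0.663648] Hardware name: Unknown NVIDIA Jetson Xavier NX Developer Kit/"
    "NVIDIA Jetson Xavier NX Developer Kit, BIOS r35.0-716592d 01/04/2023",
]


def summarize(lines):
    """Collect kernel, device tree, and firmware version hints from log lines."""
    info = {}
    for line in lines:
        m = re.search(r"Linux version (\d+\.\d+)\.\d+", line)
        if m:
            info["kernel"] = m.group(1)
        m = re.search(r"DTS File Name: .*?/kernel-(\d+\.\d+)/", line)
        if m:
            info["dts_kernel"] = m.group(1)
        m = re.search(r"BIOS (r\d+\.\d+)", line)
        if m:
            info["firmware"] = m.group(1)  # assumed to be the boot firmware release
    return info


summary = summarize(log_lines)
print(summary)
# {'kernel': '4.9', 'dts_kernel': '4.9', 'firmware': 'r35.0'}
```

The kernel and the device tree source agree on 4.9, while the firmware string reports r35.0. So the device tree matching the kernel does not rule out a mismatch between the boot stage and the kernel it hands off to.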

I’m not really convinced that there was a memory issue to start with. I strongly suggest going back to stock and then providing debug information to determine where the memory changes are which were concerning.
