NX Boot failed

Hi all,
We are using an NX production module on our own carrier board.
Sometimes it boots successfully, but sometimes it hangs with the following log:
boot_fail.log (33.0 KB)

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.9.201-tegra (git@CI_Server) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05) ) #1 SMP PREEMPT Mon1
[    0.000000] Boot CPU: AArch64 Processor [4e0f0040]
[    0.000000] OF: fdt:memory scan node memory, reg size 48,
[    0.000000] OF: fdt: - 80000000 ,  2c000000
[    0.000000] OF: fdt: - ac200000 ,  44800000
[    0.000000] OF: fdt: - 100000000 ,  180000000
[    0.000000] earlycon: tegra_comb_uart0 at MMIO32 0x000000000c168000 (options '')
[    0.000000] bootconsole [tegra_comb_uart0] enabled
[    0.000000] Found tegra_fbmem: 00800000@a06a0000
[    0.000000] Found lut_mem: 00002008@a069a000
WARNING: pll_d2 has no dyn ramp
WARNING: at platform/drivers/pg/pg-gpu-t194.c:185
WARNING: at platform/drivers/pg/pg-gpu-t194.c:185
[    3.264186] cgroup: cgroup2: unknown option "nsdelegate"
WARNING: at platform/drivers/pg/pg-gpu-t194.c:185
WARNING: at platform/drivers/pg/pg-gpu-t194.c:185
[    4.637153] random: crng init done
[    4.637296] random: 7 urandom warning(s) missed due to ratelimiting
[    4.793541] esw_tca6408_drv: loading out-of-tree module taints kernel.
[    4.817400] using random self ethernet address
[    4.817561] using random host ethernet address
[    5.283898] lt6911uxc 0-002b: probing lt6911uxc v4l2 sensor at addr 0x2b
[    5.284814] lt6911uxc 0-002b: mclk absent,assuming sensor driven externally
[    5.289526] lt6911uxc 0-002b: detected lt6911uxc sensor
[    5.290435] lt6911uxc 1-002b: probing lt6911uxc v4l2 sensor at addr 0x2b
[    5.291128] lt6911uxc 1-002b: mclk absent,assuming sensor driven externally
[    5.292293] lt6911uxc 1-002b: couldn't create debugfs
[    5.296242] lt6911uxc 1-002b: detected lt6911uxc sensor
[    5.296763] lt6911uxc 8-002b: probing lt6911uxc v4l2 sensor at addr 0x2b
[    5.296999] lt6911uxc 8-002b: mclk absent,assuming sensor driven externally
[    5.303768] lt6911uxc 8-002b: couldn't create debugfs
[    5.329829] lt6911uxc 8-002b: detected lt6911uxc sensor
[    5.468199] using random self ethernet address
[    5.468367] using random host ethernet address
[    5.477973] Bridge firewalling registered
[    6.748280] Unable to handle kernel NULL pointer dereference at virtual address 0000011f
[    6.748533] Mem abort info:
[    6.749685]   ESR = 0x96000005
[    6.749811]   Exception class = DABT (current EL), IL = 32 bits
[    6.749982]   SET = 0, FnV = 0
[    6.750072]   EA = 0, S1PTW = 0
[    6.750161] Data abort info:
[    6.750250]   ISV = 0, ISS = 0x00000005
[    6.750357]   CM = 0, WnR = 0
[    6.750443] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffc1dfd14000
[    6.751174] [000000000000011f] *pgd=0000000000000000, *pud=0000000000000000
[    6.751464] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    6.751636] Modules linked in: cp210x zram esw_bd(O) esw_okr(O) br_netfilter esw_lt6911gpio5_drv(O) overlay esw_lt6911uxc_drv(O) at24 esw_tca6408_drv(O) spidev userspace_alert nvgpu bluedroid_pm ip_tas
[    6.752389] CPU: 2 PID: 3541 Comm: systemd-udevd Tainted: G           O    4.9.201-tegra #1
[    6.752392] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[    6.752397] task: ffffffc1db120000 task.stack: ffffffc1dc86c000
[    6.752412] PC is at kernfs_find_ns+0x60/0x110
[    6.752417] LR is at kernfs_find_ns+0x54/0x110
[    6.752421] pc : [<ffffff80082f0be0>] lr : [<ffffff80082f0bd4>] pstate: 80400145
[    6.752423] sp : ffffffc1dc86fb70
[    6.752431] x29: ffffffc1dc86fb70 x28: ffffffc1db120000 
[    6.752438] x27: ffffff8008f72000 x26: 0000000000000000 
[    6.752445] x25: ffffffc1dc86fd48 x24: 0000000000004000 
[    6.752452] x23: ffffffc1cae20638 x22: 0000000000000000 
[    6.752459] x21: 0000000024644567 x20: 00000000000000e7 
[    6.752465] x19: 00000000000000ff x18: 0000000000000000 
[    6.752472] x17: 0000007fb3c965c0 x16: ffffff8008262ae8 
[    6.752479] x15: 0000000000000010 x14: 0000000000000015 
[    6.752485] x13: 0000000000000000 x12: 0000000000000030 
[    6.752492] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f 
[    6.752510] x9 : feff716475687163 x8 : ffffffffffffffff 
[    6.752520] x7 : fefefefefefefefe x6 : 0000000000008080 
[    6.752527] x5 : 0000000000000000 x4 : ffffffffffffffff 
[    6.752534] x3 : 0000000000000007 x2 : ffffffc1cae2063e 
[    6.752540] x1 : 000000007ffffffe x0 : 0000000024acbac5 
[    6.752542] 
[    6.752546] Process systemd-udevd (pid: 3541, stack limit = 0xffffffc1dc86c000)
[    6.752549] Call trace:
[    6.752556] [<ffffff80082f0be0>] kernfs_find_ns+0x60/0x110
[    6.752564] [<ffffff80082f0d50>] kernfs_iop_lookup+0x58/0xd0
[    6.752579] [<ffffff80082686e8>] lookup_slow+0x98/0x160
[    6.752585] [<ffffff800826bee8>] walk_component+0x1b8/0x2f8
[    6.752590] [<ffffff800826c664>] path_lookupat+0x9c/0x148
[    6.752596] [<ffffff800826e7d8>] filename_lookup+0x88/0x150
[    6.752601] [<ffffff800826e9b8>] user_path_at_empty+0x58/0x70
[    6.752606] [<ffffff8008262b48>] SyS_readlinkat+0x60/0x170
[    6.752613] [<ffffff800808395c>] __sys_trace_return+0x0/0x4
[    6.752619] ---[ end trace 522ac2d007d32a81 ]---
[    6.810632] usb 1-3.2.4: new low-speed USB device number 17 using tegra-xusb
[    6.840383] usb 1-3.2.4: New USB device found, idVendor=04f2, idProduct=1516
[    6.840390] usb 1-3.2.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    6.840394] usb 1-3.2.4: Product: USB Keyboard
[    6.840397] usb 1-3.2.4: Manufacturer: Chicony 
[    8.731900] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[    8.735682] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   20.634866] FAN rising trip_level:4 cur_temp:66800 trip_temps[5]:140000
[   34.138675] vdd-sdmmc1-sw: disabling
[   34.141472] vdd-1v8-sd: disabling
[   34.144169] vdd-1v8-cvb: disabling
[   34.146264] vdd-epb-1v0: disabling
[   34.148327] avdd-cam-2v8: disabling
[   34.150368] vdd-fan: disabling
[   34.152435] vdd_sys_en: disabling

Do you have any ideas?

Thanks

Can you check whether the hang always happens after the lt6911uxc driver gets called?

Hi Wayne,
We captured two more log files:
boot_error_01.log (97.1 KB)
boot_error_02.log (76.4 KB)
We hope these logs help.
Thanks.

No idea. Maybe you can remove all the devices from your board, then add them back one by one and see which one causes the panic.

We ported the lt6911uxc driver from AGX to NX and have tested it more than 200 times. Everything was fine when the lt6911uxc driver was disabled. However, once we enable the lt6911uxc driver, our boards sometimes fail to boot, as shown in the logs our colleague YHuang0915 posted. Any suggestion will be appreciated.

You can check whether the system always crashes after "couldn't create debugfs" from the lt6911uxc driver.

If that is true, contact the vendor and report this to them.

We found the root cause and fixed it.
The function camera_common_initialize() in camera_common.c has a local variable debugfs_name:
char debugfs_name[10];
So debugfs_name can hold at most 9 characters plus the terminating NUL. However, our lt6911uxc driver passes a dev_name parameter to camera_common_initialize(), and this dev_name ("lt6911uxc") is already 9 characters long. In addition, camera_common_initialize() appends 2 more characters with sprintf(debugfs_name, "%s_%c", dev_name, s_data->csi_port + 'a'), so dev_name plus "_a" (or "_b", "_c", and so on) is copied into debugfs_name. The resulting string "lt6911uxc_a" is 11 characters, or 12 bytes with the terminating NUL, which is more than the 10 bytes available. So sprintf() writes past the end of the array.
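The byte count above can be checked outside the kernel. This is a minimal user-space sketch, not the driver code: the helper name debugfs_name_bytes_needed() is hypothetical, and only the format string "%s_%c" and the port-letter arithmetic are taken from the sprintf() call quoted above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of camera_common.c).
 * Returns how many bytes, including the terminating NUL, the string
 * "<dev_name>_<port letter>" needs. Calling snprintf() with a NULL
 * buffer and size 0 writes nothing and only reports the length.
 */
static int debugfs_name_bytes_needed(const char *dev_name, int csi_port)
{
    int len = snprintf(NULL, 0, "%s_%c", dev_name, csi_port + 'a');

    if (len < 0)
        return -1;      /* formatting error */
    return len + 1;     /* +1 for the terminating NUL */
}
```

For dev_name "lt6911uxc" and CSI port 0 this returns 12, which does not fit in char debugfs_name[10]. A 7-character name such as "abcdefg" returns 10 and is the longest that still fits.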


Actually, you have defined the length of dev_name in struct tegracam_device in file tegracam_core.h.


So, I guess you need to change "char debugfs_name[10]" to "char debugfs_name[64]" in function camera_common_initialize() in file camera_common.c.
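As a rough illustration of that change, here is a user-space sketch that pairs the larger buffer with a bounded write. It is not the actual NVIDIA patch. The helper build_debugfs_name() and the macro DEBUGFS_NAME_LEN are hypothetical, and using snprintf() instead of sprintf() is an extra safeguard I am assuming, not something confirmed for the real fix.

```c
#include <stdio.h>

/* Assumption: mirrors the proposed "char debugfs_name[64]". */
#define DEBUGFS_NAME_LEN 64

/*
 * Hypothetical helper, not the code in camera_common.c.
 * Builds "<dev_name>_<port letter>" into buf without ever writing more
 * than len bytes. Returns 0 on success, -1 if the name did not fit
 * (buf then holds a truncated, still NUL-terminated string).
 */
static int build_debugfs_name(char *buf, size_t len,
                              const char *dev_name, int csi_port)
{
    int n = snprintf(buf, len, "%s_%c", dev_name, csi_port + 'a');

    if (n < 0 || (size_t)n >= len)
        return -1;      /* error or truncation */
    return 0;
}
```

With a DEBUGFS_NAME_LEN-byte buffer, "lt6911uxc" on port 0 yields "lt6911uxc_a" and returns 0. With the original 10-byte buffer the same call returns -1 instead of overflowing, so the caller can report the problem rather than corrupt memory.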

Note:
tegracam_core.c and camera_common.c are located in …/linux/Linux_for_Tegra/source/public/kernel/nvidia/drivers/media/platform/tegra/camera/;
tegracam_core.h is located in …/linux/Linux_for_Tegra/source/public/kernel/nvidia/include/media/.

@fanjiaheng
Thanks for your input. We will modify it in a future release.

Thanks