Hi, we are developing our own carrier board, and during reboot stress testing we occasionally hit a boot hang. Our carrier board is based on the Xavier Devkit design (p3509-0000). To check whether the issue depends on our SW changes, we built a test image from the sources downloaded by SDK Manager (JetPack 4.6 rev3) and ran the reboot stress test on three Jetson Nano eMMC SOMs (P3448-0002) mounted on our carrier boards at the same time. The test image is BSP only, with no Jetson SDK components installed (e.g. CUDA, containers). We used the following commands to create and flash the eMMC image, completed the initial setup wizard after flashing, and then installed our tool to run the reboot stress test. The steps are as follows:
- sudo ./apply_binaries.sh
- sudo BOARDID=3448 BOARDSKU=0002 FAB=300 FUSELEVEL=fuselevel_production ./nvmassflashgen.sh jetson-nano-emmc mmcblk0p1
- cd bootloader/mfi_jetson-nano-emmc/ && sudo ./nvmflash.sh --showlogs
- plug in an HDMI monitor, boot the device, and complete the setup wizard
- remove "quiet" from /boot/extlinux/extlinux.conf to get more console log output
- set up our reboot stress tool
- a. create reboot_test.sh
#!/bin/bash
# Read the current count from a line of the form "reboot_times = N"
times=$(grep "reboot_times" /etc/reboot_times.txt | awk '{print $3}')
case "$1" in
start)
    ((times+=1))
    sleep 20
    # Store the new count, then reboot again
    echo "reboot_times = $times" | sudo tee /etc/reboot_times.txt
    systemctl reboot
    ;;
stop)
    echo "Stopping reboot_test"
    ;;
*)
    echo "Usage: /etc/init.d/reboot_test.sh {start|stop}"
    exit 1
    ;;
esac
exit 0
- b. copy reboot_test.sh to /etc/init.d/
- c. register and enable the script, then reboot:
sudo update-rc.d reboot_test.sh defaults
sudo update-rc.d reboot_test.sh enable
sync
sleep 5
sudo systemctl reboot
- start reboot stress test
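For reference, the counter handling in reboot_test.sh can be written as a small standalone function. This is only a minimal sketch: the function name `next_count` is hypothetical, and it assumes the counter file holds a single line of the form "reboot_times = N". Unlike our script, it falls back to 0 when the file is missing or does not contain a number.

```shell
#!/bin/bash
# Hypothetical helper (not part of our tool): print the next reboot count
# for a counter file that holds one line like "reboot_times = 126".
next_count() {
    local file times
    file=$1
    # Third field of the first matching line; empty if the file is missing.
    times=$(awk '/reboot_times/ {print $3; exit}' "$file" 2>/dev/null) || times=""
    # Fall back to 0 when nothing numeric was read.
    case "$times" in
        ''|*[!0-9]*) times=0 ;;
    esac
    echo $((times + 1))
}
```

Calling `next_count /etc/reboot_times.txt` prints the value that the next boot would write back to the file.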
We found that one device (SOM#21) froze during boot on the 127th reboot, while the other SOMs ran for over 1000 reboots. After we powered SOM#21 off and started the reboot test again, it also ran for over 1000 reboots, so the issue appears to occur randomly. The end of the log from the hung boot is shown below, and we have also uploaded the full console log:
[ 3.599483] [drm] Initialized
[ 3.626027] brd: module loaded
[ 3.629840] tegradc tegradc.0: fb registered
[ 3.637263] loop: module loaded
[ 3.640560] tegra_profiler: version: 1.145, samples/io: 49/28
[ 3.640964] tegradc tegradc.0: DC initialized, skipping tegra_dc_program_mode.
[ 3.641010] tegradc tegradc.0: hdmi: tmds rate:148351K prod-setting:prod_c_hdmi_75m_150m
[ 3.641511] tegradc tegradc.0: hdmi: get RGB quant from REG programmed by BL.
[ 3.641520] tegradc tegradc.0: hdmi: get YCC quant from REG programmed by BL.
[ 3.667646] extcon-disp-state extcon:disp-state: cable 47 state 1
[ 3.667650] Extcon AUX1(HDMI) enable
[ 3.668661] tegradc tegradc.1: disp1 connected to head1->/host1x/sor
[ 3.668737] tegradc tegradc.1: No lt-data, using default setting
[ 3.668763] tegradc tegradc.1: No hpd-gpio in DT
[ 3.668795] tegradc tegradc.1: DT parsed successfully
[ 3.668853] tegradc tegradc.1: Display dc.ffffff800d540000 registered with id=1
[ 3.670293] tegradc tegradc.1: dpd enable lookup fail:-19
[ 3.674675] tegradc tegradc.1: probed
[ 3.696525] tegradc tegradc.0: nominal-pclk:148351648 parent:148350781 div:1.0 pclk:148350781 146868084~161703244
[ 3.734482] tegradc tegradc.1: fb registered
[ 3.734574] tegra_profiler: auth: init
[ 3.735161] THERMAL EST: found 2 subdevs
[ 3.735165] THERMAL EST num_resources: 0
[ 3.735169] [THERMAL EST subdev 0]
[ 3.735173] [THERMAL EST subdev 1]
[ 3.735497] thermal thermal_zone5: Registering thermal zone thermal_zone5 for type thermal-fan-est
[ 3.735499] THERMAL EST: thz register success.
[ 3.735610] THERMAL EST: end of probe, return err: 0
[ 3.736312] sd: No Scsi addr parsed to reserve index
[ 3.736338] hisi_sas: driver version v1.6
[ 3.745682] libphy: Fixed MDIO Bus: probed
[ 3.746247] tun: Universal TUN/TAP
l4t32.6.1.log (10.2 MB)
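To see where a hung boot stops, the last kernel message in a captured console log can be pulled out with a small helper. This is a minimal sketch: the function name `last_kernel_line` is hypothetical, and it assumes the log is a plain text file whose kernel lines start with a bracketed timestamp, as in the excerpt above.

```shell
#!/bin/bash
# Hypothetical helper: print the last line of a console log that starts
# with a kernel timestamp such as "[    3.746247] ...".
last_kernel_line() {
    grep -E '^\[ *[0-9]+\.[0-9]+\]' "$1" | tail -n 1
}
```

Running it on the log of a hung boot and on the log of a good boot gives the last timestamped message of each, which can then be compared.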
Do you have any idea how to determine whether this issue depends on the SOM, the carrier board, the SW, or the thermal policy? Or do you have any suggestions for debugging it?