Can't boot from EMMC when doing reboot stress test with Jetson Linux 35.1

Hi,

We flashed the Jetson Linux 35.1 image from SDK Manager onto the Xavier NX devkit + P3668-0001 NX SOM (eMMC SKU). We used the NVP Model clock configuration tool to set the power mode to “20W-6 Core” and ran a reboot stress test to check system stability. After more than 200 reboots, the kernel command line changed on its own from

[ 0.000000] Kernel command line: root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0

to

[ 0.000000] Kernel command line: root=/dev/initrd rw rootwait console=ttyTCU0,115200n8 fbcon=map:0 net.ifnames=0 video=tegrafb no_console_suspend=1 earlycon=tegra_comb_uart,mmio32,0x0c168000 sdhci_tegra.en_boot_part_access=1

[ 6.287818] Root device found: initrd
[ 6.289439] Mount initrd as rootfs and enter recovery mode
Finding OTA work dir on external storage devices
Checking whether device /dev/mmcblk?p1 exist
Looking for OTA work directory on the device(s): /dev/mmcblk0p1
Checking whether device /dev/sd?1 exist

As a result, the kernel can’t switch from the initrd to the actual rootfs. It always mounts the initrd as rootfs and enters recovery mode, even after we power-cycle the device. Please help fix this issue, thanks.
jetpack502-boot fail.log (15.8 MB)
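
For anyone comparing the two command lines quoted above, here is a minimal Python sketch that lists which parameters were removed, added, or changed. The helper names `parse_cmdline` and `diff_cmdlines` are made up for illustration; the two strings are copied from the log excerpts in this post.

```python
def parse_cmdline(cmdline):
    """Map each kernel parameter name to the list of values it was given."""
    params = {}
    for token in cmdline.split():
        # "console=tty0" -> ("console", "tty0"); bare flags like "rw" get "".
        key, _, value = token.partition("=")
        params.setdefault(key, []).append(value)
    return params


def diff_cmdlines(before, after):
    """Return the parameter names that were removed, added, or changed."""
    old, new = parse_cmdline(before), parse_cmdline(after)
    return {
        "removed": {k for k in old if k not in new},
        "added": {k for k in new if k not in old},
        "changed": {k for k in old if k in new and old[k] != new[k]},
    }


# Command lines copied from the two log excerpts above.
NORMAL = ("root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 "
          "console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0")
RECOVERY = ("root=/dev/initrd rw rootwait console=ttyTCU0,115200n8 fbcon=map:0 "
            "net.ifnames=0 video=tegrafb no_console_suspend=1 "
            "earlycon=tegra_comb_uart,mmio32,0x0c168000 "
            "sdhci_tegra.en_boot_part_access=1")

if __name__ == "__main__":
    for kind, keys in diff_cmdlines(NORMAL, RECOVERY).items():
        print(kind, sorted(keys))
```

The `root` entry shows up under `changed`, which is the switch from `/dev/mmcblk0p1` to `/dev/initrd` described above.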

Hi,

Thanks for reporting this issue. Could you trim your log to keep only the 1~2 reboot iterations around the point where the issue happened? There is no need to share a 15 MB log file, and such a large log is hard for us to check.
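
One possible way to trim a long stress-test log is sketched below in Python. It assumes that the string `Kernel command line:` appears exactly once per boot and uses it as the boundary between iterations; `split_boots` and `keep_last_boots` are hypothetical helper names. With this marker, bootloader output printed before the kernel starts is attributed to the previous iteration, so pick an earlier marker if your log has a reliable one.

```python
BOOT_MARKER = "Kernel command line:"  # assumption: printed once per boot


def split_boots(lines, marker=BOOT_MARKER):
    """Group log lines into boots; a new group starts at each marker line."""
    boots = [[]]
    for line in lines:
        if marker in line and boots[-1]:
            boots.append([])
        boots[-1].append(line)
    return boots


def keep_last_boots(lines, count, marker=BOOT_MARKER):
    """Return only the lines that belong to the last `count` groups."""
    if count <= 0:
        raise ValueError("count must be positive")
    boots = split_boots(lines, marker)
    return [line for boot in boots[-count:] for line in boot]


if __name__ == "__main__":
    # Tiny made-up sample; a real log would be read from a file instead.
    sample = [
        "bootloader output",
        "[ 0.000000] Kernel command line: root=/dev/mmcblk0p1",
        "boot 1 message",
        "[ 0.000000] Kernel command line: root=/dev/initrd",
        "boot 2 message",
    ]
    print("\n".join(keep_last_boots(sample, 1)))
```

To apply it to a file, read the lines with something like `open(path, errors="replace").read().splitlines()` and write the returned list back out.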

Hi,
I kept the results of the last 6 test iterations; please check the logs. Thanks
jetpack502-boot fail-2.log (389.5 KB)

Hi,

It seems the issue happens after “L4TLauncher: Attempting Recovery Boot” appears in the UEFI log. Could you test this again on your side and confirm whether that is the case?

We will also check this internally at the same time.

Thanks.
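
A rough way to check this correlation in a captured serial log is sketched below. It only relies on the two strings quoted in this thread (`L4TLauncher: Attempting Recovery Boot` and `Kernel command line:`); the function name `classify_boots` is made up for illustration, and the sample lines are shortened versions of the ones posted above.

```python
RECOVERY_MARKER = "L4TLauncher: Attempting Recovery Boot"
CMDLINE_MARKER = "Kernel command line:"


def classify_boots(lines):
    """Return one (saw_recovery_marker, root_device) tuple per kernel boot."""
    results = []
    saw_recovery = False
    for line in lines:
        if RECOVERY_MARKER in line:
            saw_recovery = True
        elif CMDLINE_MARKER in line:
            # Pull the value of the root= parameter, if present.
            root = next((tok.split("=", 1)[1] for tok in line.split()
                         if tok.startswith("root=")), None)
            results.append((saw_recovery, root))
            saw_recovery = False  # reset for the next boot
    return results


if __name__ == "__main__":
    sample = [
        "[ 0.000000] Kernel command line: root=/dev/mmcblk0p1 rw rootwait",
        "L4TLauncher: Attempting Recovery Boot",
        "[ 0.000000] Kernel command line: root=/dev/initrd rw rootwait",
    ]
    for number, (recovery, root) in enumerate(classify_boots(sample), start=1):
        print(f"boot {number}: recovery_marker={recovery} root={root}")
```

If every boot with `root=/dev/initrd` is preceded by the recovery marker, and no normal boot is, that supports the observation above.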

As you filed another topic here (Root device (mmcblk0p1) was not found while doing reboot stress test with Jetson Linux 35.1 - #4 by JJ.C), I feel this issue is on a custom board. Please try to reproduce it on the devkit first.

We noticed that this behavior seems related to the kernel panic that keeps happening.

Could you first resolve the kernel panic that occurs in each boot iteration?

Hi,

This is the official image from SDK Manager; we didn’t make any modifications, so the kernel panic is not caused by changes on our side. I created another topic about the kernel panic before this one, but got no response.
Kernel panic has occurred when doing reboot stress test with Jetson Linux 35.1 - Jetson & Embedded Systems / Jetson Xavier NX - NVIDIA Developer Forums

Thanks

Hi,

Is this issue on devkit or your custom board?
If this is on your custom board, then of course some kernel panics may happen, because some I/O may not exist on your carrier board. You need to modify the device tree. The same rule applies to JetPack 4 as well.

It would be better to provide more information about your board first.

Can’t boot from EMMC when doing reboot stress test with Jetson Linux 35.1 - Jetson & Embedded Systems / Jetson Xavier NX - NVIDIA Developer Forums
and
Kernel panic has occurred when doing reboot stress test with Jetson Linux 35.1 - Jetson & Embedded Systems / Jetson Xavier NX - NVIDIA Developer Forums
have the same kernel panic on the Xavier NX devkit + P3668-0001 NX SOM. Thanks

Hi,

We ran the same reboot stress test again, and the issue did happen after we saw “L4TLauncher: Attempting Recovery Boot” in the UEFI log. As you suggested, I attached the logs of the last 10 test iterations for reference.
The HW configuration is the same: Xavier NX devkit + P3668-0001 NX SOM
jetpack502-3.log (731.2 KB)

Hi,

Thanks for reporting. We have already started an internal check and will update with the result later.

Hi,

Is there any update on this issue? Thanks

There is a problem in the HDMI driver that causes the panic. We are still checking.
Also, rel-35.1 has a mechanism that puts the board into recovery boot: if the kernel panics too many times, the board boots into recovery.

[ 6.988564] tegradc 15200000.display: Bootloader disp_param detected. Detected mode: 8x4 (on 0x0mm) pclk=148350937
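
To estimate from a log how many panics it takes before recovery boot kicks in, something like the following Python sketch could be used. It is not NVIDIA's implementation and says nothing about the real threshold. It assumes that `Kernel panic` (the usual Linux panic message prefix, not quoted in this thread) marks a panicked boot, that `Kernel command line:` appears once per boot, and that the mechanism reacts to consecutive panics, which the thread does not state. The function name is hypothetical.

```python
CMDLINE_MARKER = "Kernel command line:"
PANIC_MARKER = "Kernel panic"        # assumption: standard Linux panic prefix
RECOVERY_ROOT = "root=/dev/initrd"   # recovery boots use the initrd as root


def panic_streaks_before_recovery(lines):
    """For each recovery boot, count the consecutive panicked boots before it."""
    streaks = []
    streak = 0
    in_boot = False
    panicked = False
    for line in lines:
        if CMDLINE_MARKER in line:
            if in_boot:
                # Close the previous boot: extend or reset the streak.
                streak = streak + 1 if panicked else 0
            in_boot, panicked = True, False
            if RECOVERY_ROOT in line:
                streaks.append(streak)
                streak = 0
        elif PANIC_MARKER in line:
            panicked = True
    return streaks


if __name__ == "__main__":
    # Made-up sample: one clean boot, two panicked boots, then recovery.
    sample = [
        "[ 0.000000] Kernel command line: root=/dev/mmcblk0p1 rw",
        "[ 0.000000] Kernel command line: root=/dev/mmcblk0p1 rw",
        "Kernel panic - not syncing: example",
        "[ 0.000000] Kernel command line: root=/dev/mmcblk0p1 rw",
        "Kernel panic - not syncing: example",
        "[ 0.000000] Kernel command line: root=/dev/initrd rw",
    ]
    print(panic_streaks_before_recovery(sample))
```

Running it over several failing logs and comparing the reported streaks would show whether the switch to recovery follows a fixed number of panics.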

Hi,
Thanks for the reply. May I know whether a fix is possible in the next release (JetPack 5.1 in September 2022)?

The plan is still unclear.

Hey @WayneWWW, we observed this issue too. Do you have any solution in the pipeline? Obviously JetPack is not usable in production like this.

There is a workaround for this issue.

Please add this to your display kernel driver.

diff --git a/drivers/video/tegra/dc/dc.c b/drivers/video/tegra/dc/dc.c
index 1cedba9..09e1a0d 100644
--- a/drivers/video/tegra/dc/dc.c
+++ b/drivers/video/tegra/dc/dc.c
@@ -6374,7 +6374,7 @@
 		pr_debug("dc->fb_mem not initialized\n");
 		return false;
 	}
-	return (dc->fb_mem->start != 0);
+	return false;
 }
 EXPORT_SYMBOL(tegra_is_bl_display_initialized);
 

Hey @WayneWWW, I tried the fix.
It had no influence on the Xavier NX devkit.

When running the kernel on our custom board with this change, we encounter the following:

[   19.489937] tegradc 15200000.display: dc_poll_register 0x41: timeout
[   19.489944] tegradc 15200000.display: dc timeout waiting for DC to stop
[   19.542102] tegradc 15200000.display: dc_poll_register 0x41: timeout
[   19.542109] tegradc 15200000.display: dc timeout waiting for DC to stop
[   19.593945] tegradc 15200000.display: dc_poll_register 0x41: timeout
[   19.593966] tegradc 15200000.display: timeout waiting for postcomp init state to promote
[   19.645970] tegradc 15200000.display: dc_poll_register 0x41: timeout
[   19.645979] tegradc 15200000.display: timeout waiting for win assignments to promote
[   19.645984] tegradc 15200000.display: tegra_nvdisp_head_enable, failed head enable
[   19.697947] tegradc 15210000.display: dc_poll_register 0x41: timeout
[   19.697956] tegradc 15210000.display: timeout waiting for postcomp init state to promote
[   19.749954] tegradc 15210000.display: dc_poll_register 0x41: timeout
[   19.749963] tegradc 15210000.display: timeout waiting for win assignments to promote
[   19.749967] tegradc 15210000.display: tegra_nvdisp_head_enable, failed head enable
[   19.801955] tegradc 15210000.display: dc_poll_register 0x41: timeout
[   19.801965] tegradc 15210000.display: timeout waiting for postcomp init state to promote
[   19.853968] tegradc 15210000.display: dc_poll_register 0x41: timeout
[   19.853977] tegradc 15210000.display: timeout waiting for win assignments to promote
[   19.853981] tegradc 15210000.display: tegra_nvdisp_head_enable, failed head enable

The display remains black…any idea?
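
To see at a glance which display head reports which failure, the `tegradc` lines can be grouped by device address and message. This is a small Python sketch; the regular expression is written against the log format shown in this post and may need adjusting for other logs, and `summarize_tegradc` is a hypothetical name.

```python
import re
from collections import Counter

# Matches e.g. "tegradc 15200000.display: dc_poll_register 0x41: timeout"
TEGRADC_RE = re.compile(r"tegradc (\w+)\.display: (.*)")


def summarize_tegradc(lines):
    """Count tegradc messages per (device address, message text)."""
    counts = Counter()
    for line in lines:
        match = TEGRADC_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[   19.489937] tegradc 15200000.display: dc_poll_register 0x41: timeout",
        "[   19.489944] tegradc 15200000.display: dc timeout waiting for DC to stop",
        "[   19.697947] tegradc 15210000.display: dc_poll_register 0x41: timeout",
    ]
    for (device, message), count in sorted(summarize_tegradc(sample).items()):
        print(f"{device}  x{count}  {message}")
```

Applied to the full excerpt above, this separates the messages from `15200000.display` and `15210000.display`, which makes it easier to tell whether both heads fail in the same way.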

Is this your only display head on the custom board?

To be honest, I am not sure what’s going on with our hardware there. We have two DisplayPort outputs. When we plug in a display, it connects properly, but an additional 1024x768 HDMI display is also added, which is invisible and sits next to the primary display. When we plug a monitor into the second port, it does not work at all. That seems to be a hardware issue that is going to be fixed in the next revision.

Does the patch you supplied only work under special circumstances? I reverted the patch and the display works again…