Setting DRAM scrub mode to staged causes a boot failure after power off

Hi,

I am working on the jetson-agx-xavier-industrial target. To reduce the boot time, I changed the DRAM scrub mode to staged by setting enable_dram_staged_scrubbing = 1 in the MB1 cfg file bootloader/t186ref/BCT/tegra194-mb1-bct-misc-l4t-jaxi.cfg. But when I power off the board from the command line, the next boot gets stuck at the following:

Jetson UEFI firmware (version 202210.3-10b78f7c built on 2024-05-09T11:15:28+05:30)
[2022671] : [ LOG ] : dram_ecc_uncorrected_handler: Error Status - 0xaa, No of errors - 0x9

[2023324] : [ LOG ] : Bad page number updated in scratch - 0xb800

[2023770] : [ LOG ] : Bad page number updated in scratch - 0xb800

[2024376] : [ LOG ] : Bad page number updated in scratch - 0xb800

[2030675] : [ LOG ] : Bad page number updated in scratch - 0xb800

[2036975] : [ LOG ] : Bad page number updated in scratch - 0xb800

How can I solve this issue?

Hi saaisanthosh.r,

Are you using the devkit or a custom board for AGX Xavier Industrial?
Which JetPack version are you using?

To reduce the boot time, we suggest starting with the bootloader, kernel, rootfs, and services.
We don't recommend modifying the BCT for MB1/MB2.

I am using the devkit with L4T 35.5.0, which is JetPack version 5.1.3.

Please enable timestamps in the serial console first to check the boot time of each boot stage.
Then you can start with UEFI and remove a few features that are not used in your case to reduce the boot time.

Hi Kevin,

We have already reduced the time spent in UEFI, the kernel, and the RFS. But the SDRAM scrub in MB1 takes around 8 seconds. Is changing the DRAM scrub mode not a recommended method?

Please share the full serial console log with timestamps enabled.
We didn't notice an 8-second delay in MB1.

boottime_nvidia_l4t_35.5.0_without_encryption.txt (100.6 KB)

I have attached the boot time log, which was measured using L4T 35.5.0 without any of our customizations. In MB1, there is an SDRAM scrub process that takes approximately 8 seconds.

[0.345484 0.015106] [0000.206] I> DRAM ECC Scrub Mode: full
[0.345959 0.000475] [0000.210] I> SDRAM scrub in progress…
[8.361404 8.015445] [0008.212] I> SDRAM scrub complete …
[8.361897 0.000493] [0008.215] I> SDRAM scrub successful
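
For anyone wanting to locate the slow stage in a longer log, here is a minimal Python sketch that ranks lines by the per-line delta. It assumes every timestamped line has the shape shown in the excerpt above (elapsed seconds, then delta seconds, in the first pair of brackets); the function name slowest_steps is only illustrative and is not part of any NVIDIA tool.

```python
import re

# Assumed line shape: "[<elapsed s> <delta s>] <rest of the console line>"
LINE_RE = re.compile(r"^\[(\d+\.\d+)\s+(\d+\.\d+)\]\s+(.*)$")


def slowest_steps(log_text, top=3):
    """Return up to `top` (delta, elapsed, message) tuples, largest delta first."""
    rows = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            elapsed, delta, message = match.groups()
            rows.append((float(delta), float(elapsed), message))
    return sorted(rows, reverse=True)[:top]


# The four lines quoted from the attached log.
SAMPLE = """\
[0.345484 0.015106] [0000.206] I> DRAM ECC Scrub Mode: full
[0.345959 0.000475] [0000.210] I> SDRAM scrub in progress…
[8.361404 8.015445] [0008.212] I> SDRAM scrub complete …
[8.361897 0.000493] [0008.215] I> SDRAM scrub successful
"""

if __name__ == "__main__":
    for delta, elapsed, message in slowest_steps(SAMPLE):
        print(f"{delta:9.6f} s  (at {elapsed:9.6f} s)  {message}")
```

On the excerpt, the top entry is the "SDRAM scrub complete" line with a delta of about 8 seconds, which matches the delay described above.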

I don't see this delay on my local AGX Xavier devkit.

Could you share the result of the following command on your board?

$ cat /etc/nv_tegra_release
$ cat /etc/nv_tegra_release
# R35 (release), REVISION: 5.0, GCID: 35550185, BOARD: t186ref, EABI: aarch64, DATE: Tue Feb 20 04:46:31 UTC 2024
$

Please note that I am using the AGX Xavier Industrial. Does this delay appear only on the Industrial board?

Hi Kevin,

Is there any update regarding this issue?

Hi saaisanthosh.r,

I just tested on the AGX Xavier Industrial devkit with L4T R35.5.0 and modified the following line in Linux_for_Tegra/bootloader/t186ref/BCT/tegra194-mb1-bct-misc-l4t-jaxi.cfg:

##### dram-ecc variables #####
- enable_dram_staged_scrubbing = 0;
+ enable_dram_staged_scrubbing = 1;
enable_dram_page_blacklisting = 1;

I can boot up the board with DRAM ECC Scrub Mode: Staged, and there's no 8-second delay in MB1.
The full log is attached below for your reference.
jetson-agx-xavier-industrial_r35.5.0.log (72.9 KB)
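
To check which value a given BSP tree actually carries before flashing, a small Python sketch like the one below can read the two dram-ecc variables from the cfg text. It assumes settings are written as `name = value;` one per line, as in the diff above; read_settings is a hypothetical helper name, and the sample text only mirrors the lines quoted in this thread.

```python
import re

# Assumed setting shape: "name = value;" with optional surrounding whitespace.
SETTING_RE = re.compile(r"^\s*([\w.\[\]]+)\s*=\s*([^;\s]+)\s*;")


def read_settings(cfg_text):
    """Return a dict of setting name -> raw value string, skipping '#' comment lines."""
    settings = {}
    for line in cfg_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = SETTING_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


# Mirrors the modified lines quoted above; a real run would read the cfg file instead.
SAMPLE = """\
##### dram-ecc variables #####
enable_dram_staged_scrubbing = 1;
enable_dram_page_blacklisting = 1;
"""

if __name__ == "__main__":
    settings = read_settings(SAMPLE)
    for name in ("enable_dram_staged_scrubbing", "enable_dram_page_blacklisting"):
        print(name, "=", settings.get(name, "<not set>"))
```

For a real tree, replace SAMPLE with the contents of bootloader/t186ref/BCT/tegra194-mb1-bct-misc-l4t-jaxi.cfg read via open(...).read().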

Hi Kevin,

I can also boot the board after changing the scrubbing mode to staged. But the issue appears after powering off the board from the command line. Please try sudo poweroff and check whether you get the following error on the next power-on:

[2022671] : [ LOG ] : dram_ecc_uncorrected_handler: Error Status - 0xaa, No of errors - 0x9

[2023324] : [ LOG ] : Bad page number updated in scratch - 0xb800

[2023770] : [ LOG ] : Bad page number updated in scratch - 0xb800

[2024376] : [ LOG ] : Bad page number updated in scratch - 0xb800

[2030675] : [ LOG ] : Bad page number updated in scratch - 0xb800

[2036975] : [ LOG ] : Bad page number updated in scratch - 0xb800
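
When comparing captures from several boots, it can help to summarize these messages rather than read them line by line. The Python sketch below counts how often each bad page number is reported and extracts the error status and error count from the handler line. It assumes the message wording is exactly as in the log above; summarize is an illustrative name, not an NVIDIA utility.

```python
import re
from collections import Counter

# Patterns taken from the message wording in the log excerpt above.
BAD_PAGE_RE = re.compile(r"Bad page number updated in scratch - (0x[0-9a-fA-F]+)")
HANDLER_RE = re.compile(
    r"dram_ecc_uncorrected_handler: Error Status - (0x[0-9a-fA-F]+), "
    r"No of errors - (0x[0-9a-fA-F]+)"
)


def summarize(log_text):
    """Return (Counter of bad page numbers, list of (status, error_count)) as integers."""
    pages = Counter(int(m.group(1), 16) for m in BAD_PAGE_RE.finditer(log_text))
    handlers = [
        (int(m.group(1), 16), int(m.group(2), 16))
        for m in HANDLER_RE.finditer(log_text)
    ]
    return pages, handlers


SAMPLE = """\
[2022671] : [ LOG ] : dram_ecc_uncorrected_handler: Error Status - 0xaa, No of errors - 0x9
[2023324] : [ LOG ] : Bad page number updated in scratch - 0xb800
[2023770] : [ LOG ] : Bad page number updated in scratch - 0xb800
[2024376] : [ LOG ] : Bad page number updated in scratch - 0xb800
[2030675] : [ LOG ] : Bad page number updated in scratch - 0xb800
[2036975] : [ LOG ] : Bad page number updated in scratch - 0xb800
"""

if __name__ == "__main__":
    pages, handlers = summarize(SAMPLE)
    for page, count in pages.items():
        print(f"bad page {page:#x} reported {count} time(s)")
    for status, errors in handlers:
        print(f"handler status {status:#x}, errors {errors}")
```

On the excerpt, every bad-page message refers to the same page number, 0xb800.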

Do you mean that you can boot up the board after sudo reboot but hit the errors after sudo poweroff? How did you boot up the board after running sudo poweroff?

After sudo poweroff, I press the reset push button to boot.

I've checked this issue with our internal team, and it seems there are some issues in UEFI.
So, we don't support enabling staged scrubbing for the T194 series.