Jetson Nano production module does not boot on custom carrier board, but does so on auvidea's

Do you mean there are some nano that can flash and boot up successfully on your carrier board?

All Nanos can be flashed on our carrier boards. Some Nanos (a significant number, about 6 in 50) just don’t boot after flashing on our carrier boards, but most of them do so successfully. We need some way to reclaim the non-bootable ones, as they boot on auvidea boards but not on our boards. So the idea is to look into what’s going wrong with flashing and ensuring creation of an image that results in successful boot each time on our carrier boards, and also checking if hardware can be an issue.

And we use our same custom image each time.

Yes, I understand that without console logs it is difficult. Meanwhile we are looking into obtaining those as well.

> So the idea is to look into what’s going wrong with flashing and ensuring creation of an image that results in successful boot each time on our carrier boards, and also checking if hardware can be an issue.

I don’t think checking the flash process would help here. The flash process only reads the EEPROM value from each module, and if the EEPROM value is not valid, the flash process fails. It does not affect the system image.

Hi WayneWWW,

I think we are facing a similar problem. We have a custom board as well. It works fine with the development module but does not boot with the production module. The production module we used does boot on a carrier board from Auvidea.

Please take a look at the UART output:

[0000.321] [L4T TegraBoot] (version 00.00.2018.01-l4t-80a468da)
[0000.327] Processing in cold boot mode Bootloader 2
[0000.331] A02 Bootrom Patch rev = 1023
[0000.335] Power-up reason: pmc por
[0000.338] No Battery Present
[0000.341] pmic max77620 reset reason
[0000.344] pmic max77620 NVERC : 0x40
[0000.347] RamCode = 0
[0000.350] Platform has DDR4 type RAM
[0000.353] max77620 disabling SD1 Remote Sense
[0000.357] Setting DDR voltage to 1125mv
[0000.361] Serial Number of Pmic Max77663: 0xa12ca
[0000.369] Entering ramdump check
[0000.372] Get RamDumpCarveOut = 0x0
[0000.375] RamDumpCarveOut=0x0,  RamDumperFlag=0xe59ff3f8
[0000.380] Last reboot was clean, booting normally!
[0000.385] Sdram initialization is successful 
[0000.389] SecureOs Carveout Base=0x00000000ff800000 Size=0x00800000
[0000.395] Lp0 Carveout Base=0x00000000ff780000 Size=0x00001000
[0000.401] BpmpFw Carveout Base=0x00000000ff700000 Size=0x00080000
[0000.407] GSC1 Carveout Base=0x00000000ff600000 Size=0x00100000
[0000.413] GSC2 Carveout Base=0x00000000ff500000 Size=0x00100000
[0000.418] GSC4 Carveout Base=0x00000000ff400000 Size=0x00100000
[0000.424] GSC5 Carveout Base=0x00000000ff300000 Size=0x00100000
[0000.430] GSC3 Carveout Base=0x000000017f300000 Size=0x00d00000
[0000.446] RamDump Carveout Base=0x00000000ff280000 Size=0x00080000
[0000.452] Platform-DebugCarveout: 0
[0000.456] Nck Carveout Base=0x00000000ff080000 Size=0x00200000
[0000.461] Non secure mode, and RB not enabled.
[0000.478] Csd NumOfBlocks=0

I think the problem is the last line, because nothing was found:
[0000.478] Csd NumOfBlocks=0

UART output of the development module, which boots normally:

[0000.125] [L4T TegraBoot] (version 00.00.2018.01-l4t-80a468da)
[0000.131] Processing in cold boot mode Bootloader 2
[0000.135] A02 Bootrom Patch rev = 1023
[0000.139] Power-up reason: pmc por
[0000.142] No Battery Present
[0000.145] pmic max77620 reset reason
[0000.148] pmic max77620 NVERC : 0x40
[0000.151] RamCode = 0
[0000.154] Platform has DDR4 type RAM
[0000.157] max77620 disabling SD1 Remote Sense
[0000.161] Setting DDR voltage to 1125mv
[0000.165] Serial Number of Pmic Max77663: 0x2b31ed
[0000.173] Entering ramdump check
[0000.176] Get RamDumpCarveOut = 0x0
[0000.179] RamDumpCarveOut=0x0,  RamDumperFlag=0xe59ff3f8
[0000.184] Last reboot was clean, booting normally!
[0000.189] Sdram initialization is successful 
[0000.193] SecureOs Carveout Base=0x00000000ff800000 Size=0x00800000
[0000.199] Lp0 Carveout Base=0x00000000ff780000 Size=0x00001000
[0000.205] BpmpFw Carveout Base=0x00000000ff700000 Size=0x00080000
[0000.211] GSC1 Carveout Base=0x00000000ff600000 Size=0x00100000
[0000.216] GSC2 Carveout Base=0x00000000ff500000 Size=0x00100000
[0000.222] GSC4 Carveout Base=0x00000000ff400000 Size=0x00100000
[0000.228] GSC5 Carveout Base=0x00000000ff300000 Size=0x00100000
[0000.234] GSC3 Carveout Base=0x000000017f300000 Size=0x00d00000
[0000.250] RamDump Carveout Base=0x00000000ff280000 Size=0x00080000
[0000.256] Platform-DebugCarveout: 0
[0000.259] Nck Carveout Base=0x00000000ff080000 Size=0x00200000
[0000.265] Non secure mode, and RB not enabled.
[0000.270] Read GPT from (4:0)
[0000.398] Csd NumOfBlocks=62333952
[0000.403] Set High speed to 1

Do you have any idea why this happens?
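
For anyone comparing two captures like the ones above, here is a minimal Python sketch that strips the TegraBoot timestamps and reports the first line where the logs differ. The function names and the `ignore_prefixes` parameter are made up for illustration; the only assumption about the log format is the `[0000.321]` style prefix visible in the output above.

```python
import re

# Matches the "[0000.321] " style timestamp prefix seen in the logs above.
TIMESTAMP = re.compile(r"^\[\d+\.\d+\]\s*")


def comparable_lines(log_text, ignore_prefixes=()):
    """Return non-empty log lines with timestamps removed.

    Lines starting with any of ignore_prefixes are dropped, which is useful
    for per-module values such as the PMIC serial number.
    """
    lines = []
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw.strip()).strip()
        if line and not line.startswith(tuple(ignore_prefixes)):
            lines.append(line)
    return lines


def first_divergence(failing_log, working_log, ignore_prefixes=()):
    """Return (index, failing_line, working_line) for the first difference.

    A log that has run out of lines is reported as None. Returns None when
    both logs are identical after filtering.
    """
    a = comparable_lines(failing_log, ignore_prefixes)
    b = comparable_lines(working_log, ignore_prefixes)
    for i in range(max(len(a), len(b))):
        line_a = a[i] if i < len(a) else None
        line_b = b[i] if i < len(b) else None
        if line_a != line_b:
            return i, line_a, line_b
    return None
```

Called with `ignore_prefixes=("Serial Number of Pmic",)`, this skips the serial number line, which is expected to differ between modules. On the two logs above, the remaining difference is that the non-booting log has no `Read GPT from (4:0)` line before `Csd NumOfBlocks`.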

Could you file a new topic or describe your setup?

And could you share what kind of software is installed on the module?

Hi WayneWWW,

I was also working on the same issue, but have been unable to fetch UART logs so far. So it would be helpful to track any related posts by members who have UART logs, for debugging our issue. Could I get a link to the new post here, if reifenrath.michel puts one up?

Hi, here is the Link to the new topic:

Hi WayneWWW,

We finally managed to get the UART logs, for our custom carrier boards. Please find attached the logs for the booting and non-booting case.
UARTLogsNanoBooting (18.6 KB)
UARTLogsNanoNotBooting (18.9 KB)

It seems that for our non-booting case, we have a peculiar log:
We get a prompt for:

“Tegra210 (P3450-Porg) # MC: “

instead of

“switch to partitions #0”.

One more difference is that vdd_core voltage is set to 1125 mV in case of the booting Nano, but 1075 mV in case of the non-booting one. Is this something to worry about? Although for the non-booting case, we have also seen 1125mV being set as the vdd_core voltage, in some other logs that we have.

Can you please advise regarding this?

Hi WayneWWW,

Do you have an update on this yet?

hello jetson_user,

it looks like you are stuck at the u-boot stage due to a bad device.

U-Boot 2016.07-gd917e08cec (Jul 16 2019 - 16:52:59 -0700)

TEGRA210
Model: NVIDIA P3450-Porg
Board: NVIDIA P3450-PORG
DRAM:  4 GiB
MMC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1
Using default environment

In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
MMC: no card present
** Bad device mmc 1 **

could you please summarize the modifications you have made?
since it’s a custom carrier board, did you fully check the schematic to review the board design? have you also done pinmux customization?
in addition, which release image are you using now?

Hi JerryChang,

Did you mean the changes related to software or hardware? For software, I tried with Jetpack 4.2.1, and changed the command line argument, like adding console=ttyTHS1 and early_printk, as can be followed in our previous discussion: How to obtain UART logs on Jetson Nano production modules, over UART 2 - #10 by jetson_user. We also tried with Jetpack 4.6 and 4.5.1, and still the booting is stuck, for some of the Nanos. Of course, at this moment we are able to get the UART logs. And we have resumed with Jetpack 4.2.1, because our drivers are based on that.

For hardware, I will have to get back to you with the details.

The bad device message also appears for the booting case, in the logs: UARTLogsNanoBooting. So might this still be the issue?

Hi JerryChang,

We finally have the root cause known for (some of) the Nanos not being able to boot on our custom carrier boards. It seems there was noise on pin 238 (UART2-RX) which we had left unconnected. This caused the uboot to detect a BREAK signal, apparently. On applying a pull-down resistor on this pin, the Nanos are booting.

We need to change the circuit of course, but we were wondering if there was some easy way to ignore the detection of this signal in the u-boot source code, for now? We are using L4T_R32-2_public_sources. Also could you please let us know a good way to distribute this new u-boot binary in the field Nanos? Do we need to use the encryption method as we do for signing device tree binaries and updating some partition? If this is so, could we have details on that please?

Eagerly waiting for your reply. Thanks!

Hi JerryChang,

Do you by chance have an update on this issue yet?

hello jetson_user,

sorry for the late reply, I missed this thread.
u-boot detects any input as a signal to stop booting up. you should have a resistor on this pin to avoid the detected noise.
as an alternative, you could try disabling the abortboot() function or hard-coding bootdelay to skip detecting input keys.

furthermore, Nano’s u-boot binary is flashed to the LNX partition. are you able to run the flash script to update the u-boot binary?
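
As a rough illustration of the abortboot()/bootdelay suggestion above, here is a small Python model of the autoboot decision. It is only a sketch based on the documented meaning of CONFIG_BOOTDELAY in the U-Boot README (-1 disables autoboot, -2 boots with no delay and no abort check); it is not the actual U-Boot source, the function name is made up, and it says nothing about how the UART driver handles received characters outside the countdown.

```python
def autoboot_runs_bootcmd(bootdelay, bootcmd, input_seen):
    """Simplified model of whether U-Boot runs bootcmd on its own.

    bootdelay  -- bootdelay setting in seconds (may be negative)
    bootcmd    -- boot command string; empty means there is nothing to run
    input_seen -- True if any character arrives on the console during the delay
    """
    if bootdelay == -1:
        # Autoboot is disabled: U-Boot drops to the prompt.
        return False
    if not bootcmd:
        return False
    if bootdelay >= 0 and input_seen:
        # The countdown was interrupted by console input (this is the check
        # that a patched abortboot() returning 0 would bypass).
        return False
    # Covers bootdelay == -2: boot immediately without checking for input.
    return True
```

In this model, noise on the console RX line during the countdown behaves like a key press, so `autoboot_runs_bootcmd(2, "run distro_bootcmd", True)` returns False, while a bootdelay of -2 ignores the input.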

Hi JerryChang,

Thanks for the reply. We tried both: setting CONFIG_BOOTDELAY to 0, -1, -2, etc., and also commenting out the body of the abortboot() function (so that it always returned 0). It still did not work: we get past the stage where it checks for partitions, but booting appears to be paused by continuous interrupts due to the noise on our pin 238. For the moment, we can only work with a software solution (so adding a pull-down resistor is not feasible).

We came across a solution for TX2 here: Disable serial console of u-boot on TX2 - #32 by JDSchroeder. Do you think it is the same problem? If yes, what would be an equivalent for Nano? Like an internal pull-down or something? We cannot seem to find similar syntax, as given in the solution, in the Jetson nano boot config files.

In short, we need a way in software to completely disregard any signal on pin 238 (UART2_RX) by the uboot.

Also, a general question. We see that after building the bootloader, four binaries are generated: u-boot, u-boot.bin, u-boot.dtb, u-boot-dtb.bin.

But in the Jetpack path Linux_for_Tegra/bootloader/t210ref/p3450-porg/ there is originally only one file: u-boot.bin. If there were some dtb changes in the bootloader, how do we incorporate them, i.e., how do we make use of the other three binaries generated?

Thanks in advance.

hello jetson_user,

you should update $OUT/Linux_for_Tegra/bootloader/t210ref/p3450-porg/u-boot.bin and execute the command $ sudo ./flash.sh -k LNX jetson-nano-emmc mmcblk0p1 to update Nano’s u-boot binary.

BTW,
please refer to Jetson Nano Boot Flow. are you able to confirm which boot stage it is stuck at?
thanks

Hi JerryChang,

It appears from the logs that it is stuck at the u-boot phase. Because we get the log “switch to partitions #0, OK”, which I see is in u-boot. Immediately after that there seem to be a lot of interrupts from the UART2 RX (pin 238), I receive logs like and then it tries to load some address but is unsuccessful, so it gets stuck there. Attaching the logs. uboot_stuck (3.1 KB)

hello jetson_user,

could you please try removing all the content of extlinux.conf, so that the boot process loads binaries from partitions?
the boot flow should report as below…

 [0012.996] E> Nothing to parse in conf file
 [0012.996] I> Fallback: Load binaries from partition
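
When checking a captured log for the two lines above, a small Python helper like the following can save some scrolling. This is just a sketch: the function name is made up, and it does a plain substring search, so it makes no assumption about the timestamp or log-level prefix.

```python
def find_markers(log_text, markers):
    """Map each marker string to the first log line containing it, or None."""
    lines = log_text.splitlines()
    found = {}
    for marker in markers:
        # next() with a default returns None when no line contains the marker.
        found[marker] = next((ln.strip() for ln in lines if marker in ln), None)
    return found


EXPECTED = (
    "Nothing to parse in conf file",
    "Fallback: Load binaries from partition",
)
```

Calling `find_markers(log_text, EXPECTED)` on the captured UART output returns a dictionary whose values are None for any marker that never appeared.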

Hi JerryChang,

I tried the above step. I first commented out everything within <top>/Linux_for_Tegra/bootloader/extlinux.conf. I also tried removing this file altogether, and then flashed the bootloader (I also tried building the bootloader for every change, but guess that is not needed??). Then I tried commenting out everything in the file: Linux_for_Tegra/rootfs/boot/extlinux/extlinux.conf, and flashing everything.

But for none of the cases, I got logs which you mention :(. Please find attached the bootloader log file with the new experiment. uboot_empty_extlinux_conf (2.5 KB)

I also tried making USE_UBOOT=0. This kind of led to a successful boot, but then it is constantly rebooting.

Just a reminder that everything works with the 100k resistor between pin 238 and ground, on a test board. Since our options are limited, we cannot add this new resistor to our already designed custom carrier boards. So we have to manage the “ignoring” of the interrupts caused by noise on pin 238 via uboot modification.

hello jetson_user,

may I also confirm which release version you’re working with?
could you please confirm whether it is constantly rebooting due to the errors below?
for example,
Retry time exceeded; starting again
missing environment variable: pxeuuid