Jetson Nano production module does not boot on custom carrier board, but does on Auvidea's

Hi WayneWWW,

I still do not have a mechanism to retrieve the serial logs and I understand it is difficult to help out without that.

I still have a question: Is there a difference between the way SDK Manager flashes the modules and the way the flash.sh script does (./flash.sh jetson-nano-emmc mmcblk0p1)?

Hi jetson_user,

Sorry for the late reply.

I still have a question: Is there a difference between the way SDK Manager flashes the modules and the way the flash.sh script does (./flash.sh jetson-nano-emmc mmcblk0p1)?

Basically, they are the same. I am not sure whether SDK Manager downloads the package every time or not. If it does not, then they are exactly the same.

Hi, this looks like a PCB quality issue on some boards. Did you check and compare the component placement and soldering quality between a problematic board and a good board?

Hi,

Thank you for the responses.

Hi Trumany,

I do not know exactly how to check these, but the same carrier boards that show the issue can boot other (fresh) Nanos that are flashed with a clean image (i.e., not the image which resulted in a boot issue).

Hi jetson_user,

I think you had better clarify your case again, since I cannot tell what the exact scenario is that hits the error.

We would like to know whether it is a hardware design problem on carrier board or software driver issue.

Please tell us:

From #11

I do not know how to exactly check these, but the same carrier boards with the issue, can boot other (fresh Nanos) that are flashed with a clean image

Wayne: Do you mean this issue only happens with your customized image? It does not matter which carrier board is used?

From #7

When I flash using the flash script, the problem I mentioned happens, but only on a few units. There seems to be a hardware dependency here, as if some Nano modules are not suited to being flashed by the flash script when our custom carrier board is used, though I have not been able to crack this relation yet.

Wayne: It seems the issue is related to the carrier board again. Which one is true here?

Hi WayneWWW,

I understand the issue is confusing; even we are baffled. We do not know at this point for certain whether it is a hardware issue, a software issue, or a combination.

Wayne: Do you mean this issue only happens with your customized image? It does not matter which carrier board is used?

Yes, the issue happens only with our customized image, and when our carrier board is used to flash the Nano. Suppose a Nano is flashed like this and then does not boot on our carrier boards; if we put that same Nano on Auvidea’s carrier board, it boots. But we do not intend to use Auvidea carrier boards in the field, at least for now.
When I say ‘clean’ image, I mean that I remove the system.img and system.img.raw files and re-run the flash script so that it generates a new, clean image. But each time, the image is built with our customized kernel and device tree.

Wayne: It seems the issue is related to the carrier board again. Which one is true here?

As discussed above, we do not know for certain whether it is a carrier board issue. It could be that something goes wrong with the electronics while flashing, rendering the Nanos non-bootable (I was also wondering if you could point me to something to debug in the BSP). The main problem is that when a Nano does not boot on our carrier boards (after flashing on our carrier boards), it is not recoverable: no matter how many times I re-flash it, with a ‘clean’ image as discussed above, or even after re-compiling the entire kernel and device tree first, the Nano does not boot on our carrier boards but does so on Auvidea’s.

This is the “relation” I was referring to in #12 - somehow only some Nanos are not suitable for our carrier boards (we always use the same image), and cannot be made so.

There are multiple such units by now. What we observe is: if flashing a unit on our carrier boards somehow makes it non-bootable on our carrier board, then that Nano is non-reclaimable, as I explained. In addition, before moving on to flash the next fresh Nano unit, we have to delete the old system.img* and let the flash script generate a new one (by not passing the ‘-r’ option). Otherwise we create another non-reclaimable unit.

So, our focus at the moment is to somehow reclaim those units and prevent such an occurrence in the future. What we do now is create a new system.img* each time we flash a Nano, and delete the old image once it is done. This process is slow, but has been safe so far. We have to flash a large number of Nanos, possibly in an automated way in the future (though that is not the problem I intend to raise right now). Basically, we have to avoid producing a non-bootable Nano, because there is a high chance that once we hit one such unit, we create further non-reclaimable units.
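To make the workflow concrete, it can be sketched as a small script (the L4T directory path and board config name are assumptions for a standard Linux_for_Tegra tree; the flash command itself is echoed rather than executed here, since it requires a module in recovery mode):

```shell
#!/bin/sh
# Per-unit "clean image" flashing workflow (sketch, not the exact script we use).
# L4T_DIR and BOARD are assumptions; adjust for your setup.
L4T_DIR="${L4T_DIR:-Linux_for_Tegra}"
BOARD=jetson-nano-emmc

# 1. Delete any previously generated system image so flash.sh
#    rebuilds it from the rootfs instead of reusing a stale one.
rm -f "$L4T_DIR/bootloader/system.img" "$L4T_DIR/bootloader/system.img.raw"

# 2. Flash WITHOUT the -r option, so flash.sh regenerates system.img.
#    (Passing -r would reuse the old image and risk another bad unit.)
FLASH_CMD="sudo ./flash.sh $BOARD mmcblk0p1"
echo "$FLASH_CMD"
```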

Let me know if this information helps further.

Hi jetson_user,

I got it. So a clean image indicates a new system image created without the “-r” parameter in flash.sh.

somehow only some Nanos are not suitable for our carrier boards (we always use the same image)
Do you mean there are some Nanos that can be flashed and boot up successfully on your carrier board?

To be honest, if this comment is true, I don’t think this issue can be resolved from software, especially in a case where there is no UART console log.

Do you mean there are some Nanos that can be flashed and boot up successfully on your carrier board?

All Nanos can be flashed on our carrier boards. Some Nanos (a significant number, like 6 in 50) just don’t boot after flashing on our carrier boards, but most of them boot successfully. We need some way to reclaim the non-bootable ones, since they boot on Auvidea boards but not on ours. So the idea is to look into what’s going wrong during flashing, to ensure we create an image that results in a successful boot each time on our carrier boards, and also to check whether hardware can be an issue.

And we use our same custom image each time.

Yes, I understand that without console logs it is difficult. Meanwhile we are looking into obtaining those as well.

So the idea is to look into what’s going wrong during flashing, to ensure we create an image that results in a successful boot each time on our carrier boards, and also to check whether hardware can be an issue.

I don’t think checking the flash process would help here. The flash process only reads the EEPROM value from each module; if the EEPROM value is not valid, the flash process fails. It does not affect the system image.

Hi WayneWWW,

I think we are facing a similar problem. We have a custom board as well. It works fine with the development module but does not boot with the production module. The production module we used does boot on a carrier board from Auvidea.

Please take a look at the UART output:

[0000.321] [L4T TegraBoot] (version 00.00.2018.01-l4t-80a468da)
[0000.327] Processing in cold boot mode Bootloader 2
[0000.331] A02 Bootrom Patch rev = 1023
[0000.335] Power-up reason: pmc por
[0000.338] No Battery Present
[0000.341] pmic max77620 reset reason
[0000.344] pmic max77620 NVERC : 0x40
[0000.347] RamCode = 0
[0000.350] Platform has DDR4 type RAM
[0000.353] max77620 disabling SD1 Remote Sense
[0000.357] Setting DDR voltage to 1125mv
[0000.361] Serial Number of Pmic Max77663: 0xa12ca
[0000.369] Entering ramdump check
[0000.372] Get RamDumpCarveOut = 0x0
[0000.375] RamDumpCarveOut=0x0,  RamDumperFlag=0xe59ff3f8
[0000.380] Last reboot was clean, booting normally!
[0000.385] Sdram initialization is successful 
[0000.389] SecureOs Carveout Base=0x00000000ff800000 Size=0x00800000
[0000.395] Lp0 Carveout Base=0x00000000ff780000 Size=0x00001000
[0000.401] BpmpFw Carveout Base=0x00000000ff700000 Size=0x00080000
[0000.407] GSC1 Carveout Base=0x00000000ff600000 Size=0x00100000
[0000.413] GSC2 Carveout Base=0x00000000ff500000 Size=0x00100000
[0000.418] GSC4 Carveout Base=0x00000000ff400000 Size=0x00100000
[0000.424] GSC5 Carveout Base=0x00000000ff300000 Size=0x00100000
[0000.430] GSC3 Carveout Base=0x000000017f300000 Size=0x00d00000
[0000.446] RamDump Carveout Base=0x00000000ff280000 Size=0x00080000
[0000.452] Platform-DebugCarveout: 0
[0000.456] Nck Carveout Base=0x00000000ff080000 Size=0x00200000
[0000.461] Non secure mode, and RB not enabled.
[0000.478] Csd NumOfBlocks=0

I think the problem is the last line, because nothing was found:
[0000.478] Csd NumOfBlocks=0

UART output of a development module that boots normally:

[0000.125] [L4T TegraBoot] (version 00.00.2018.01-l4t-80a468da)
[0000.131] Processing in cold boot mode Bootloader 2
[0000.135] A02 Bootrom Patch rev = 1023
[0000.139] Power-up reason: pmc por
[0000.142] No Battery Present
[0000.145] pmic max77620 reset reason
[0000.148] pmic max77620 NVERC : 0x40
[0000.151] RamCode = 0
[0000.154] Platform has DDR4 type RAM
[0000.157] max77620 disabling SD1 Remote Sense
[0000.161] Setting DDR voltage to 1125mv
[0000.165] Serial Number of Pmic Max77663: 0x2b31ed
[0000.173] Entering ramdump check
[0000.176] Get RamDumpCarveOut = 0x0
[0000.179] RamDumpCarveOut=0x0,  RamDumperFlag=0xe59ff3f8
[0000.184] Last reboot was clean, booting normally!
[0000.189] Sdram initialization is successful 
[0000.193] SecureOs Carveout Base=0x00000000ff800000 Size=0x00800000
[0000.199] Lp0 Carveout Base=0x00000000ff780000 Size=0x00001000
[0000.205] BpmpFw Carveout Base=0x00000000ff700000 Size=0x00080000
[0000.211] GSC1 Carveout Base=0x00000000ff600000 Size=0x00100000
[0000.216] GSC2 Carveout Base=0x00000000ff500000 Size=0x00100000
[0000.222] GSC4 Carveout Base=0x00000000ff400000 Size=0x00100000
[0000.228] GSC5 Carveout Base=0x00000000ff300000 Size=0x00100000
[0000.234] GSC3 Carveout Base=0x000000017f300000 Size=0x00d00000
[0000.250] RamDump Carveout Base=0x00000000ff280000 Size=0x00080000
[0000.256] Platform-DebugCarveout: 0
[0000.259] Nck Carveout Base=0x00000000ff080000 Size=0x00200000
[0000.265] Non secure mode, and RB not enabled.
[0000.270] Read GPT from (4:0)
[0000.398] Csd NumOfBlocks=62333952
[0000.403] Set High speed to 1

Do you have any idea why this happens?

Could you file a new topic or describe your setup?

And could you share what kind of software is installed on the module?

Hi WayneWWW,

I was also working on the same, but have been unable to fetch UART logs so far. So it would be helpful to track any related posts, by members who do have the UART logs, for debugging our issue. Could I get a link to the new post here, if it is put up by reifenrath.michel?

Hi, here is the link to the new topic:

Hi WayneWWW,

We finally managed to get the UART logs for our custom carrier boards. Please find attached the logs for the booting and non-booting cases.
UARTLogsNanoBooting (18.6 KB)
UARTLogsNanoNotBooting (18.9 KB)

It seems that for our non-booting case, we have a peculiar log:
We get a prompt for:

“Tegra210 (P3450-Porg) # MC: “

instead of

“switch to partitions #0”.

One more difference is that the vdd_core voltage is set to 1125 mV for the booting Nano but 1075 mV for the non-booting one. Is this something to worry about? Although, for the non-booting case, we have also seen 1125 mV being set as the vdd_core voltage in some other logs that we have.

Can you please advise regarding this?

Hi WayneWWW,

Do you have an update on this yet?

hello jetson_user,

it looks like you are stuck at the u-boot stage due to a bad device.

U-Boot 2016.07-gd917e08cec (Jul 16 2019 - 16:52:59 -0700)

TEGRA210
Model: NVIDIA P3450-Porg
Board: NVIDIA P3450-PORG
DRAM:  4 GiB
MMC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1
Using default environment

In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
MMC: no card present
** Bad device mmc 1 **

could you please summarize what modifications you have done?
since it’s a custom carrier board, did you fully check the schematic to review the board design? have you also done pinmux customization?
in addition, what’s the release image you’re now using?

Hi JerryChang,

Did you mean changes related to software or hardware? For software, I tried with JetPack 4.2.1 and changed the kernel command line, e.g. adding console=ttyTHS1 and early_printk, as can be followed in our previous discussion: How to obtain UART logs on Jetson Nano production modules, over UART 2 - #10 by jetson_user. We also tried with JetPack 4.6 and 4.5.1, and booting is still stuck for some of the Nanos. Of course, at this moment we are able to get the UART logs. And we have gone back to JetPack 4.2.1 because our drivers are based on it.

For hardware, I will have to get back to you with the details.

The “Bad device” message also appears in the booting case, in the UARTLogsNanoBooting log. So might this still be the issue?

Hi JerryChang,

We finally know the root cause of (some of) the Nanos not being able to boot on our custom carrier boards. There was noise on pin 238 (UART2-RX), which we had left unconnected. This apparently caused U-Boot to detect a BREAK signal. With a pull-down resistor applied to this pin, the Nanos boot.

We need to change the circuit of course, but we were wondering whether there is an easy way to ignore this signal in the U-Boot source code for now. We are using L4T_R32-2_public_sources. Also, could you please let us know a good way to distribute this new U-Boot binary to the Nanos in the field? Do we need to use the signing method, as we do for signing device tree binaries, and update some partition? If so, could we have details on that, please?

Eagerly waiting for your reply. Thanks!

Hi JerryChang,

Do you by chance have an update on this issue yet?

hello jetson_user,

sorry for the late reply, I was missing this thread.
u-boot checks for any console input to stop booting up; you should keep the resistor on this pin to avoid the detected noise.
as an alternative, you could try disabling the abortboot() function, or hard-coding bootdelay so that input keys are not checked.
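For reference, a minimal sketch of the bootdelay approach (the exact board header file is an assumption for the R32.2 Nano source; per the U-Boot README, a bootdelay of -2 autoboots immediately without checking for an abort key, so serial-line noise is never sampled):

```
# In the board's default environment (e.g. the Porg board header in the
# L4T R32.2 U-Boot source -- exact file name is an assumption):
bootdelay=-2

# Or, tested interactively from a working U-Boot prompt:
Tegra210 (P3450-Porg) # setenv bootdelay -2
Tegra210 (P3450-Porg) # saveenv
```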

furthermore, Nano’s u-boot binary is flashed to the LNX partition. are you able to run the flash script to update the u-boot binary?
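If flash.sh is usable, a sketch of updating only that partition (the board config name is assumed from earlier in the thread; flash.sh’s -k option reflashes a single named partition, and the module must be in recovery mode; the command is echoed here rather than executed):

```shell
#!/bin/sh
# Reflash only the LNX (U-Boot) partition on a Jetson Nano eMMC module.
# BOARD is an assumption; the command is printed, not run.
BOARD=jetson-nano-emmc
CMD="sudo ./flash.sh -k LNX $BOARD mmcblk0p1"
echo "$CMD"
```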