Jetson XavierNX bootloader MB1 fails on a custom board

Hi all,
We are working with a custom board whose HW is close to the Jetson Xavier NX DevKit carrier board.
I checked the 5V rail after power-up with a scope: it looks OK, no drops. The 5V DC-DC HW is exactly the same as on the reference board.
The serial console output from booting the module follows (it is short, sorry):

[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.3-t194-41334769-d2a21c57)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02 
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.059] I> rst_source : 0x0
[0000.061] I> rst_level : 0x0
[0000.065] I> Boot-device: QSPI
[0000.068] I> Qspi flash params source = brbct
[0000.072] I> Qspi using bpmp-dma
[0000.075] I> Qspi clock source : pllp
[0000.078] I> QSPI Flash Size = 32 MB
[0000.081] I> Qspi initialized successfully
[0000.085] W> No valid slot number is found in scratch register
[0000.091] W> Return default slot: _a
[0000.094] I> Active Boot chain : 0
[0000.097] I> Boot-device: QSPI
[0000.100] I> Qspi flash params source = brbct
[0000.106] W> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.
[0000.114] I> Temperature = 22500
[0000.117] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.121] W> Skipping boost for clk: BPMP_APB
[0000.125] W> Skipping boost for clk: AXI_CBB
[0000.129] W> Skipping boost for clk: AON_CPU_NIC
[0000.133] W> Skipping boost for clk: CAN1
[0000.137] W> Skipping boost for clk: CAN2
[0000.141] I> Boot-device: QSPI
[0000.144] I> Boot-device: QSPI
[0000.147] I> Qspi flash params source = mb1bct
[0000.151] I> Qspi using bpmp-dma
[0000.154] I> Qspi clock source : pllc_out0
[0000.158] I> Qspi reinitialized
[0000.161] I> Qspi flash params source = mb1bct
[0000.166] I> ECC region[0]: Start:0x0, End:0x0
[0000.170] I> ECC region[1]: Start:0x0, End:0x0
[0000.174] I> ECC region[2]: Start:0x0, End:0x0
[0000.178] I> ECC region[3]: Start:0x0, End:0x0
[0000.182] I> ECC region[4]: Start:0x0, End:0x0
[0000.187] I> Non-ECC region[0]: Start:0x80000000, End:0x100000000
[0000.192] I> Non-ECC region[1]: Start:0x0, End:0x0
[0000.197] I> Non-ECC region[2]: Start:0x0, End:0x0
[0000.201] I> Non-ECC region[3]: Start:0x0, End:0x0
[0000.206] I> Non-ECC region[4]: Start:0x0, End:0x0
[0000.211] E> FAILED: Thermal config

This output is exactly the same as the output for the same module on the reference board. But on the custom board the following two lines do not show up:

[0000.218] E> FAILED: MEMIO rail config
[0000.229] I> Boot-device: QSPI
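
One way to pin down where two console captures part ways is to strip the timestamps and compare them line by line. The following is only a minimal sketch: the function names are hypothetical, and it assumes every MB1 line starts with a `[seconds.millis]` prefix as in the logs quoted here.

```python
import re

# Matches the "[0000.024] " timestamp prefix seen on every MB1 console line.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\s*\]\s*")


def normalize(log_text):
    """Strip timestamps and blank lines so timing jitter is not a difference."""
    lines = []
    for raw in log_text.splitlines():
        line = _TIMESTAMP.sub("", raw.strip())
        if line:
            lines.append(line)
    return lines


def first_divergence(reference_log, custom_log):
    """Return (index, reference_line, custom_line) at the first mismatch.

    Returns None when both logs match. If one log simply ends early,
    its side of the tuple is None.
    """
    ref = normalize(reference_log)
    cus = normalize(custom_log)
    for i in range(max(len(ref), len(cus))):
        r = ref[i] if i < len(ref) else None
        c = cus[i] if i < len(cus) else None
        if r != c:
            return i, r, c
    return None


# Tail of the reference-board log and of the custom-board log from this post.
reference = """\
[0000.206] I> Non-ECC region[4]: Start:0x0, End:0x0
[0000.211] E> FAILED: Thermal config
[0000.218] E> FAILED: MEMIO rail config
[0000.229] I> Boot-device: QSPI
"""
custom = """\
[0000.206] I> Non-ECC region[4]: Start:0x0, End:0x0
[0000.211] E> FAILED: Thermal config
"""
print(first_divergence(reference, custom))
# prints (2, 'E> FAILED: MEMIO rail config', None)
```

A `None` on the custom side means that log stopped before the reference one did, which is the situation described here.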

So it seems that MB1 tries to select the boot device for the next bootloader, MB2, and fails.
Can somebody advise on the reasons why MB1 might fail at this point?
Thank you.

Edit: I need to check the MB1 BCT configuration; it seems it may show the reason for the problem.

hello dmitriy.antonets,

you may access the NX Product Design Guide for recommendations and guidelines on board customization.
you should also download the pinmux spreadsheets, and please refer to the Pinmux Changes section to customize and change the pinmux configuration applied by the software.
thanks

Thanks. Our HW team is going through the schematic and bring-up checklists. Regarding the pinmux spreadsheets: I use them intensively when working on FW for the Jetson NX dev board.

Hi Jerry, our HW team has finished with the spreadsheet: there were no negative results, everything looks reasonable, meaning OK. So we tried to use recovery mode to load the QSPI flash and the SD card, but it fails at exactly the same place. I attach logs from the SDK Manager and the debug console: SDKM_logs_JetPack_4.4_Linux_for_Jetson_Xavier_NX_2020-09-16_16-48-36.zip (99.5 KB), console.rtf (4.1 KB). Can you take a look at them and, I hope, give me an idea of what it could be?

hello dmitriy.antonets,

here are the errors reported while flashing the NX platform.

$ sudo ./flash.sh jetson-xavier-nx-devkit-emmc mmcblk0p1
...
[   3.1925 ] Boot Rom communication
[   3.1955 ] tegrarcm_v2 --chip 0x19 0 --rcm rcm_list_signed.xml
[   3.1981 ] BR_CID: 0x88021911647076c21400000018008140
[   3.1988 ] RCM version 0X190001
[   3.2145 ] Boot Rom communication completed
[   4.2302 ] 
[   5.2348 ] tegrarcm_v2 --isapplet
[   5.2379 ] Applet version 01.00.0000
[   5.2565 ] 
[   5.2566 ] Sending BCTs
[   5.2594 ] tegrarcm_v2 --download bct_bootrom br_bct_BR.bct --download bct_mb1 mb1_bct_MB1_sigheader.bct.encrypt --download bct_mem mem_rcm_sigheader.bct.encrypt
[   5.2621 ] Applet version 01.00.0000
[   5.3051 ] Sending bct_bootrom
[   5.3056 ] [................................................] 100%
[   5.3065 ] Sending bct_mb1
[   5.3110 ] [................................................] 100%
[   5.3153 ] Sending bct_mem
[   5.3655 ] [................................................] 100%
[   5.4227 ] 
Error: Return value 8
Command tegrarcm_v2 --download bct_bootrom br_bct_BR.bct --download bct_mb1 mb1_bct_MB1_sigheader.bct.encrypt --download bct_mem mem_rcm_sigheader.bct.encrypt
Failed flashing t186ref.
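
To pull the failing step out of a long flash log like the one above, a small parser can help. This is a sketch only: `find_flash_failure` is a hypothetical helper, and the patterns are inferred from the log lines quoted in this thread, not from any documented log format.

```python
import re

_RETURN_VALUE = re.compile(r"^Error: Return value (\d+)\s*$")
_COMMAND = re.compile(r"^Command (.+)$")
# "[   5.3153 ] Sending bct_mem" -> captures "Sending bct_mem"
_STEP = re.compile(r"^\[\s*\d+\.\d+\s*\]\s*(.*)$")


def find_flash_failure(log_text):
    """Describe the first 'Error: Return value N' in a flash log, or return None."""
    last_step = None
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        step = _STEP.match(line)
        if step:
            text = step.group(1).strip()
            # Ignore empty steps and progress bars such as "[......] 100%".
            if text and not text.startswith("["):
                last_step = text
            continue
        err = _RETURN_VALUE.match(line)
        if err:
            command = None
            if i + 1 < len(lines):
                cmd = _COMMAND.match(lines[i + 1])
                if cmd:
                    command = cmd.group(1).strip()
            return {
                "return_value": int(err.group(1)),
                "command": command,
                "last_step": last_step,
            }
    return None


sample = """\
[   5.3153 ] Sending bct_mem
[   5.3655 ] [................................................] 100%
[   5.4227 ] 
Error: Return value 8
Command tegrarcm_v2 --download bct_bootrom br_bct_BR.bct --download bct_mb1 mb1_bct_MB1_sigheader.bct.encrypt --download bct_mem mem_rcm_sigheader.bct.encrypt
Failed flashing t186ref.
"""
failure = find_flash_failure(sample)
print(failure["return_value"], "after step:", failure["last_step"])
```

For the excerpt above this reports return value 8 right after the `Sending bct_mem` step, which matches the point where the BCTs have been downloaded and the tool loses contact with the module.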

since you have your own board customization, you might use the flash scripts to flash the board instead of using SDKManager.
you should also use the correct board configuration files to flash the board; for example, the messages above came from using the jetson-xavier-nx-devkit-emmc.conf configuration file.
thanks

Hi Jerry,
Yes, this is the error I get every time, and it doesn’t matter what I use: devkit (QSPI) or prod (eMMC). Anyway, I attach two files:

In the video you can see that SDK Manager shows an error at the same moment (instantly) that the MB1 code on the module reaches the point just after printing the “E> FAILED: Thermal config” string. This means (IMHO):

  • the module’s CPU hangs or resets, or enters a similar state, at this moment;
  • the USB0 HW goes to an initial/reset state as well, and drops out of USB device mode;
  • SDK Manager detects a “lost connection with a USB device” event and prints an error message.

The same module with the same FW works OK on NVIDIA’s carrier board.
So, we did the following:

  • removed/disabled all peripheral devices and USB hubs on our carrier board from the USB, I2C, and I2S buses;
  • all PCI/M.2, CSI, HDMI, and DP slots are empty, and the board has no Ethernet;
  • measured the power signals: they are OK, no “spikes”, and no reset signals from the module either.

The custom carrier board still has the same problem.
We did the same with NVIDIA’s carrier board (removed parts): the module boots OK.

The question is: what does the MB1 code do (in terms of “functions”; I probably don’t need a piece of code) after it prints the string “E> FAILED: Thermal config” to the serial port but before it prints the string “E> FAILED: MEMIO rail config”? It must be a simple check/setup of some registers, I think.
Thanks.

hello dmitriy.antonets,

since you have customized and changed the pinmux configuration applied by the software, please refer to MB1 Configuration Changes to update the board configuration files.

please don’t use SDKManager to flash your customized board.
instead, please go to the path below and replace the board configuration files to flash your board,
for example, ~/nvidia/nvidia_sdk/JetPack_4.4_Linux_JETSON_AGX_XAVIER/Linux_for_Tegra/

the flash script is located there; you may also check Basic Flash Script Usage for details.
thanks

It seems important to know what MB1 is doing at the point where we see the debug output “E> FAILED: Thermal config”, because on the NVIDIA reference carrier board it continues with “E> FAILED: MEMIO rail config”, but on our board it hangs. Can you release source code for MB1, or pseudocode, or give us some idea of what it is doing in this part of the code? We have now spent weeks debugging this and we’re getting nowhere.

hello jefferybahr,

we don’t publicly release MB1 sources; you may access the download center to check the L4T sources.

so, may I have your confirmation:
have you referred to the documentation for the MB1 configuration changes and also updated the board configuration files for flashing the board?
thanks

Thanks, Jerry. BTW, Dmitriy and I are both working on the port of software/firmware for a new carrier board for the Xavier NX. We have been working with the module from the NVIDIA Jetson Xavier NX Developer Kit, which is sitting on our new carrier board. This carrier board was designed after reviewing the schematics and board layout of the Developer Kit carrier board. It does not differ in any significant way that we know of, but it does contain additional components. We can install firmware using the SDK Manager or flash.sh using the Developer Kit carrier board and everything works fine. If we substitute our carrier board, we get a failure somewhere in the execution of MB1.

We have ordered a Jetson Xavier NX Module (900-83668-0000-000) and expect it soon. In the meantime, we have been using the SOM from the Developer Kit and our custom carrier board. Is it possible that there are differences between the two SOMs that would result in the failure we’re seeing? In other words, is it possible that the standalone SOM (900-83668-0000-000) will work with our custom carrier board, but the SOM from the Developer Kit will not?

Thanks and regards,
Jeff and Dima

Hi Jerry,
We got the NX module and prepared a separate directory for the files. I copied the required files and added some debugging echo commands to the flash.sh script to show which files are being used.
The result is the same: MB1 hangs at the same point. I uploaded a ZIP file containing the log files from the Linux side and from the module’s debug serial port: FlashModuleLogs.zip (7.4 KB)
We need a suggestion: which BCT files are related to this case (MB1 fails)?

I have the following questions about the files:

  1. The file p3668.conf.common has the following fragment:
# Process fuse version:
#
# Non-fused BS vs Production Fused BD
#
# mts_c10_dev_cr.bin     vs. mts_c10_prod_cr.bin
# mce_c10_dev_cr.bin     vs. mce_c10_prod_cr.bin
# mb1_t194_dev.bin       vs. mb1_t194_prod.bin
# warmboot_t194_dev.bin  vs. warmboot_t194_prod.bin

and

MB1FILE="bootloader/mb1_t194_dev.bin";

but the file mb1_t194_dev.bin does not exist in L4T, and the other dev files do not exist in L4T either. So the question is: where are these files?

  2. Another line from this file is:

FBFILE="fuse_bypass_t194.xml";

Where is the file fuse_bypass_t194.xml?
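
A quick way to list every file that a board configuration refers to but that is absent from the L4T tree is sketched below. It is an illustration under assumptions: the helper names are hypothetical, only simple `NAME_FILE="path";` assignments are handled, and each path is treated as relative to a single root directory, which may not be how flash.sh actually resolves it.

```python
import os
import re

# Matches shell-style assignments such as MB1FILE="bootloader/mb1_t194_dev.bin";
# Curly quotes are accepted too, since forum software often rewrites straight quotes.
_ASSIGNMENT = re.compile(
    r'^\s*([A-Za-z_][A-Za-z0-9_]*FILE)=["“”]?([^"“”;\s]+)["“”]?;?\s*$'
)


def referenced_files(conf_text):
    """Map each *FILE variable in a board .conf to the path assigned to it."""
    files = {}
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comment lines are not assignments
        m = _ASSIGNMENT.match(line)
        if m:
            files[m.group(1)] = m.group(2)
    return files


def missing_files(conf_text, root):
    """Return {variable: path} for referenced files that do not exist under root."""
    return {
        name: path
        for name, path in referenced_files(conf_text).items()
        if not os.path.exists(os.path.join(root, path))
    }
```

For example, `missing_files(open("p3668.conf.common").read(), "Linux_for_Tegra")` would report `MB1FILE` and `FBFILE` if those files are not found at the assumed locations. Values that contain shell variables are compared literally, so they need to be expanded by hand first.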

Thanks.

P.S. I used the dev kit’s carrier board with the production module and SDK Manager: it works OK. So I repeat the question: which BCT file is related to the MB1 failure?
Thanks.
P.P.S. I used the dev kit’s carrier board with the production module and my file tree for our custom board: it works OK. So, the same question again: which BCT file(s) are related to MB1 hanging after it prints the line “E> FAILED: Thermal config”?

We now have Linux starting on our custom board.
Thank you.

@dantonets:

May I know what change of yours fixed the “E> FAILED: Thermal config” issue?

Hi Sammy,
At first I used a workaround, not a fix: disable the I2C bus to the PMIC in the BCT pinmux file, then enable it again in CBoot.
But that was needed for an old SDK Manager version. For the latest versions it seems not to be needed.

I got it.
Thank you for sharing.