Xavier NX fails to boot on custom carrier board

Hi NVIDIA,

I used the Jetson Nano Developer Kit (B01) to flash the Xavier NX, and it boots successfully on that board. However, when I move the NX to my own carrier board (which works fine with the Nano), it cannot boot, even though the power rails are OK. Could you help me check where the problem is? Thank you!

The log is:

[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.3-t194-41334769-d2a21c57)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02P
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.059] I> rst_source : 0x0
[0000.061] I> rst_level : 0x0
[0000.065] I> Boot-device: QSPI
[0000.067] I> Qspi flash params source = brbct
[0000.072] I> Qspi using bpmp-dma
[0000.074] I> Qspi clock source : pllp
[0000.078] I> QSPI Flash Size = 32 MB
[0000.081] I> Qspi initialized successfully
[0000.085] W> No valid slot number is found in scratch register
[0000.091] W> Return default slot: _a
[0000.094] I> Active Boot chain : 0
[0000.097] I> Boot-device: QSPI
[0000.100] I> Qspi flash params source = brbct
[0000.106] W> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.
[0000.114] I> Temperature = 40500
[0000.117] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.121] W> Skipping boost for clk: BPMP_APB
[0000.125] W> Skipping boost for clk: AXI_CBB
[0000.129] W> Skipping boost for clk: AON_CPU_NIC
[0000.133] W> Skipping boost for clk: CAN1
[0000.137] W> Skipping boost for clk: CAN2
[0000.141] I> Boot-device: QSPI
[0000.144] I> Boot-device: QSPI
[0000.147] I> Qspi flash params source = mb1bct
[0000.151] I> Qspi using bpmp-dma
[0000.154] I> Qspi clock source : pllc_out0
[0000.158] I> Qspi reinitialized
[0000.161] I> Qspi flash params source = mb1bct
[0000.166] I> ECC region[0]: Start:0x0, End:0x0
[0000.170] I> ECC region[1]: Start:0x0, End:0x0
[0000.174] I> ECC region[2]: Start:0x0, End:0x0
[0000.178] I> ECC region[3]: Start:0x0, End:0x0
[0000.182] I> ECC region[4]: Start:0x0, End:0x0
[0000.187] I> Non-ECC region[0]: Start:0x80000000, End:0x100000000
[0000.192] I> Non-ECC region[1]: Start:0x0, End:0x0
[0000.197] I> Non-ECC region[2]: Start:0x0, End:0x0
[0000.201] I> Non-ECC region[3]: Start:0x0, End:0x0
[0000.206] I> Non-ECC region[4]: Start:0x0, End:0x0
[0000.211] E> FAILED: Thermal config
[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.3-t194-41334769-d2a21c57)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02P
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.058] I> rst_source : 0xa
[0000.061] I> rst_level : 0x1
[0000.065] I> Boot-device: QSPI
[0000.067] I> Qspi flash params source = brbct
[0000.071] I> Qspi using bpmp-dma
[0000.074] I> Qspi clock source : pllp
[0000.078] I> QSPI Flash Size = 32 MB
[0000.081] I> Qspi initialized successfully
[0000.085] W> No valid slot number is found in scratch register
[0000.091] W> Return default slot: _a
[0000.094] I> Active Boot chain : 0
[0000.097] I> Boot-device: QSPI
[0000.100] I> Qspi flash params source = brbct
[0000.106] W> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.
[0000.112] I> Temperature = 40500
[0000.115] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.119] W> Skipping boost for clk: BPMP_APB
[0000.123] W> Skipping boost for clk: AXI_CBB
[0000.127] W> Skipping boost for clk: AON_CPU_NIC
[0000.132] W> Skipping boost for clk: CAN1
[0000.135] W> Skipping boost for clk: CAN2
[0000.140] I> Boot-device: QSPI
[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.3-t194-41334769-d2a21c57)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02P
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.058] I> rst_source : 0xa
[0000.061] I> rst_level : 0x1
[0000.065] I> Boot-device: QSPI
[0000.067] I> Qspi flash params source = brbct
[0000.071] I> Qspi using bpmp-dma
[0000.074] I> Qspi clock source : pllp
[0000.078] I> QSPI Flash Size = 32 MB
[0000.081] I> Qspi initialized successfully
[0000.085] W> No valid slot number is found in scratch register
[0000.091] W> Return default slot: _a
[0000.094] I> Active Boot chain : 0
[0000.097] I> Boot-device: QSPI
[0000.100] I> Qspi flash params source = brbct
[0000.106] W> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.
[0000.112] I> Temperature = 40000
[0000.115] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.119] W> Skipping boost for clk: BPMP_APB
[0000.123] W> Skipping boost for clk: AXI_CBB
[0000.127] W> Skipping boost for clk: AON_CPU_NIC
[0000.132] W> Skipping boost for clk: CAN1
[0000.135] W> Skipping boost for clk: CAN2
[0000.140] I> Boot-device: QSPI
[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.3-t194-41334769-d2a21c57)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02P
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.058] I> rst_source : 0xa
[0000.061] I> rst_level : 0x1
[0000.065] I> Boot-device: QSPI
[0000.067] I> Qspi flash params source = brbct
[0000.071] I> Qspi using bpmp-dma
[0000.074] I> Qspi clock source : pllp
[0000.078] I> QSPI Flash Size = 32 MB
[0000.081] I> Qspi initialized successfully
[0000.085] W> No valid slot number is found in scratch register
[0000.091] W> Return default slot: _a
[0000.094] I> Active Boot chain : 0
[0000.097] I> Boot-device: QSPI
[0000.100] I> Qspi flash params source = brbct
[0000.106] W> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.
[0000.112] I> Temperature = 40500
[0000.115] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.119] W> Skipping boost for clk: BPMP_APB
[0000.123] W> Skipping boost for clk: AXI_CBB
[0000.127] W> Skipping boost for clk: AON_CPU_NIC
[0000.132] W> Skipping boost for clk: CAN1
[0000.135] W> Skipping boost for clk: CAN2
[0000.140] I> Boot-device: QSPI

Hi, did you probe the power sequence? Is POWER_EN asserted correctly?

Yes, POWER_EN is asserted correctly, and all power rails on the carrier board that are controlled by POWER_EN are OK.

Is this the full log? Does the system hang or power off when this happens? What is the power supply capability?

Yes, it's the full log. The power supply capability is more than 5 V / 8 A.

The first time, it resets at:
[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.3-t194-41334769-d2a21c57)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02P
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.058] I> rst_source : 0x0
[0000.061] I> rst_level : 0x0
[0000.065] I> Boot-device: QSPI
[0000.067] I> Qspi flash params source = brbct
[0000.071] I> Qspi using bpmp-dma
[0000.074] I> Qspi clock source : pllp
[0000.078] I> QSPI Flash Size = 32 MB
[0000.081] I> Qspi initialized successfully
[0000.085] W> No valid slot number is found in scratch register
[0000.091] W> Return default slot: _a
[0000.094] I> Active Boot chain : 0
[0000.097] I> Boot-device: QSPI
[0000.100] I> Qspi flash params source = brbct
[0000.106] W> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.
[0000.113] I> Temperature = 26500
[0000.116] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.121] W> Skipping boost for clk: BPMP_APB
[0000.125] W> Skipping boost for clk: AXI_CBB
[0000.129] W> Skipping boost for clk: AON_CPU_NIC
[0000.133] W> Skipping boost for clk: CAN1
[0000.137] W> Skipping boost for clk: CAN2
[0000.141] I> Boot-device: QSPI
[0000.144] I> Boot-device: QSPI
[0000.147] I> Qspi flash params source = mb1bct
[0000.151] I> Qspi using bpmp-dma
[0000.154] I> Qspi clock source : pllc_out0
[0000.158] I> Qspi reinitialized
[0000.160] I> Qspi flash params source = mb1bct
[0000.166] I> ECC region[0]: Start:0x0, End:0x0
[0000.170] I> ECC region[1]: Start:0x0, End:0x0
[0000.174] I> ECC region[2]: Start:0x0, End:0x0
[0000.178] I> ECC region[3]: Start:0x0, End:0x0
[0000.182] I> ECC region[4]: Start:0x0, End:0x0
[0000.186] I> Non-ECC region[0]: Start:0x80000000, End:0x100000000
[0000.192] I> Non-ECC region[1]: Start:0x0, End:0x0
[0000.197] I> Non-ECC region[2]: Start:0x0, End:0x0
[0000.201] I> Non-ECC region[3]: Start:0x0, End:0x0
[0000.206] I> Non-ECC region[4]: Start:0x0, End:0x0
[0000.211] E> FAILED: Thermal config

After that, it resets at the same point every time:
[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.3-t194-41334769-d2a21c57)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02P
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.058] I> rst_source : 0xa
[0000.061] I> rst_level : 0x1
[0000.065] I> Boot-device: QSPI
[0000.067] I> Qspi flash params source = brbct
[0000.071] I> Qspi using bpmp-dma
[0000.074] I> Qspi clock source : pllp
[0000.078] I> QSPI Flash Size = 32 MB
[0000.081] I> Qspi initialized successfully
[0000.085] W> No valid slot number is found in scratch register
[0000.091] W> Return default slot: _a
[0000.094] I> Active Boot chain : 0
[0000.097] I> Boot-device: QSPI
[0000.100] I> Qspi flash params source = brbct
[0000.106] W> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.
[0000.112] I> Temperature = 26500
[0000.115] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.119] W> Skipping boost for clk: BPMP_APB
[0000.123] W> Skipping boost for clk: AXI_CBB
[0000.127] W> Skipping boost for clk: AON_CPU_NIC
[0000.132] W> Skipping boost for clk: CAN1
[0000.135] W> Skipping boost for clk: CAN2
[0000.140] I> Boot-device: QSPI

I also found that every time it resets, the log shows:

[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.058] I> rst_source : 0xa
[0000.061] I> rst_level : 0x1

What does reset source 0xa mean?
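The pattern above (rst_source 0x0 / rst_level 0x0 on the first attempt, then 0xa / 0x1 on every later one) can be checked across a longer UART capture with a small script. This is a minimal Python sketch, not an NVIDIA tool: it assumes a new boot attempt starts whenever the MB1 timestamp goes backwards (for example from [0000.140] back to [0000.024]), and the names `split_attempts` and `BootAttempt` are hypothetical. It only tabulates the reset values and the last line printed before each reset; it does not decode what 0xa means.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Matches MB1 UART lines such as "[0000.058] I> rst_source : 0xa"
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]\s+([IWE])>\s+(.*)$")
# Matches the two reset fields of interest inside the message text
FIELD_RE = re.compile(r"^(rst_source|rst_level)\s*:\s*(0x[0-9a-fA-F]+)")


@dataclass
class BootAttempt:
    """One MB1 boot attempt (hypothetical helper type)."""
    lines: List[str] = field(default_factory=list)
    rst_source: Optional[str] = None
    rst_level: Optional[str] = None

    @property
    def last_line(self) -> str:
        # Last MB1 line printed before the next attempt (or end of capture)
        return self.lines[-1] if self.lines else ""


def split_attempts(log_text: str) -> List[BootAttempt]:
    """Split a UART capture into boot attempts.

    Assumption: a new attempt begins when the timestamp goes backwards.
    """
    attempts: List[BootAttempt] = []
    prev_ts: Optional[float] = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank lines and forum text between logs
        ts = float(m.group(1))
        if prev_ts is None or ts < prev_ts:
            attempts.append(BootAttempt())
        prev_ts = ts
        attempt = attempts[-1]
        attempt.lines.append(line)
        f = FIELD_RE.match(m.group(3))
        if f:
            setattr(attempt, f.group(1), f.group(2))
    return attempts


if __name__ == "__main__":
    import sys
    # Usage: python split_mb1_log.py < uart_capture.txt
    for i, a in enumerate(split_attempts(sys.stdin.read()), 1):
        print(f"attempt {i}: rst_source={a.rst_source} "
              f"rst_level={a.rst_level} last: {a.last_line}")
```

Run against the logs in this thread, it should show the first attempt ending at "E> FAILED: Thermal config" with 0x0/0x0, and each later attempt ending at "Boot-device: QSPI" with 0xa/0x1, which makes it easy to see whether a different power supply changes where the resets happen.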

Hi,

I notice you use the term "reset". Do you mean cold boot?

No, I did not do anything after power-up. The NX resets by itself.

The first time, the NX resets by itself after:

[0000.197] I> Non-ECC region[2]: Start:0x0, End:0x0
[0000.201] I> Non-ECC region[3]: Start:0x0, End:0x0
[0000.206] I> Non-ECC region[4]: Start:0x0, End:0x0
[0000.211] E> FAILED: Thermal config

After that, the NX resets by itself after:

[0000.123] W> Skipping boost for clk: AXI_CBB
[0000.127] W> Skipping boost for clk: AON_CPU_NIC
[0000.132] W> Skipping boost for clk: CAN1
[0000.135] W> Skipping boost for clk: CAN2
[0000.140] I> Boot-device: QSPI

After retrying a few times, it hangs.

During all of the retries, PWR_EN and SYS_RST* stay high and the 5 V supply stays good.

Can you try another power supply? Based on the log, it looks like a voltage/frequency drop issue.