The Nvidia AGX Thor system is unable to start


Software Version
DRIVE OS 7.0.3

Target Operating System
Linux

Hardware Platform
Custom (self-developed) carrier board with a Thor module

Host Machine Version
Ubuntu 22.04, with the official SDK built on it

Issue Description
Hello, when I tried to boot the Thor module on our self-developed carrier board, the system failed to boot. Could you please help analyze the possible causes and solutions? Thank you!

Logs


[0000.097] I> MB1 (version: 0.23.0.1-t264-75019003-35ec65c7)
[0000.097] C> Boot-mode : Coldboot
[0000.097] C> MB1 last_boot_error: 0x0
[0000.097] I> Entry timestamp: 0x00013fb0
[0000.099] C> rst_source: 0x0, rst_level: 0x0
[0000.103] I> BR-BCT: preprod_dev_sign: 0
[0000.107] I> Socket mask: 0x1
[0000.110] I> Socket id: 0
[0000.112] I> Chip supports UFS HS mode
[0000.116] I> BR last_boot_error0: 0x0
[0000.119] I> BR last_boot_error1: 0x0
[0000.123] I> BR last_boot_error2: 0x0
[0000.126] I> NVBCT is initialized
[0000.130] I> Task: PSC Mailbox init
[0000.133] I> Task: Bootchain select WAR set
[0000.137] I> Task: Bootchain update marker
[0000.141] I> Task: Send error information to MCU
[0000.145] I> Task: CRC check
[0000.149] I> Task: CRC Integrity check
[0000.152] I> Task: Bootrom patch record validation and params check
[0000.158] I> Task: CRC Integrity check
[0000.161] I> Task: SMN Transfer check
[0000.165] I> Task: Initialize UTMI PLL
[0000.168] I> Task: Initialize REF_UFS PLL
[0000.173] I> Task: Crypto init
[0000.175] I> Task: Boost clocks
[0000.178] I> tegrabl_clk_print_clock_freq not supported
[0000.183] I> mb1_clocks_soc_boost_clocks: Switch BPMP_CPU_NIC to NAFLL
[0000.190] I> mb1_clocks_soc_boost_clocks: CLK_SOURCE_AXI_CBB 0x0, bct.axi_cbb_clk_divisor 0
[0000.198] I> Task: Perform MB1 KAT tests
[0000.202] I> Task: Secure debug controls
[0000.205] I> Task: Program NV master stream id
[0000.210] I> Task: Enabling and initialization of MSS Bandwidth limiter
[0000.216] I> No request to configure MBWT settings for any PC!
[0000.222] I> Task: Verify boot mode
[0000.225] I> Task: Alias fuses
[0000.228] I> FUSE_ALIAS: Fuse alias on production fused part is not supported.
[0000.235] I> Task: Print SKU type
[0000.238] I> SKU: Floorswept F1
[0000.241] I> Task: Recalibrate SOC vmon
[0000.245] I> vmon adc_cal_revision 4,  min_adc_fuse_rev 1
[0000.250] I> SOC VMON: min_reset 0, max_reset 0
[0000.254] I> SOC VMON ADC_SETUP: 0x00000000 to 0xc0000000
[0000.259] I> SOC VMON: Vmon re-calibration and fine tuning done
[0000.265] I> SOC VMON: min_reset 0, max_reset 0
[0000.270] I> Task: Select UFS ref clk
[0000.273] I> UFS device reference clock programmed to 26 MHz
[0000.279] I> Task: PMC AUX Clamp
[0000.282] I> Task: UPHY lane resets
[0000.285] I> Task: UPHY init
[0000.288] I> Configuring uphy in rate B
[0000.293] I> tegrabl_clk_print_clock_freq not supported
[0000.297] I> PLL3 ownership is assigned to PCIE
[0000.302] I> tegrabl_clk_print_clock_freq not supported
[0000.308] I> mphy_nv_calib_rev : 0, mphy_nv_calib_aux_val : 0
[0000.312] I> Enabling PLL with id : 3
[0000.315] I> Enabling PLL with id : 4
[0000.320] I> Skipping uphy_instance 1. Not supported yet
[0000.324] I> Skipping uphy_instance 2. Not supported yet
[0000.329] I> Task: Boot device init
[0000.332] I> Boot_device: QSPI_FLASH instance: 0
[0000.337] I> Using QSPI flash params from bct!!
[0000.341] I> PLLC is already locked
[0000.344] I> Qspi clock source : pllc_out0
[0000.348] I> QSPI Flash: Macronix 64MB at index: 5
[0000.354] I> QSPI-0l initialized successfully
[0000.357] I> Task: TSC init
[0000.360] I> Task: Load membct
[0000.363] C> RAM_CODE 0xc
[0000.365] I> Loading MEMBCT
[0000.368] I> Slot: 0
[0000.370] I> Binary[6] block-3840 (partition size: 0x60000)
[0000.375] I> Binary name: MEM-BCT-6
[0000.378] I> Size of crypto header is 8192
[0000.382] I> Size of crypto header is 8192
[0000.387] I> BCH of MEM-BCT-6 read from storage
[0000.391] I> MEM-BCT-6 header integrity check is success
[0000.396] I> Binary magic in BCH component 6 is MEM6
[0000.401] I> component binary type is 6
[0000.405] I> MEM-BCT-6 binary is read from storage
[0000.410] I> MEM-BCT-6 binary integrity check is success
[0000.414] C> RAM_CODE 0xc
[0000.419] C> RAM_CODE 0xc
[0000.420] I> Task: Load Page retirement list
[0000.423] I> DRAM ECC PRL is disabled
[0000.426] I> Task: SDRAM params override
[0000.430] I> Before update: MssEncryptGenKeys: 0x20000000, MssEncryptDistKeys: 0x00000000, McDefaultEncr: 0x00000000
[0000.441] I> Post update: MssEncryptGenKeys: 0x2000000f, MssEncryptDistKeys: 0x000f000f, McDefaultEncr: 0x80000100
[0000.451] I> Task: Save mem-bct info
[0000.455] I> Task: Carveout allocate
[0000.458] I> GR carveout will not be allocated
[0000.462] I> RCM blob carveout will not be allocated
[0000.481] I> Task: Enable WDT 5th expiry
[0000.481] I> Re-Enabling WDT
[0000.481] I> Task: I2C register
[0000.481] I> Task: Fuse Dump
[0000.482] I> opt_ucf_cluster_disable = 0x0
[0000.483] I> opt_c2c_disable = 0x0
[0000.487] I> opt_gpc_disable = 0x0
[0000.490] I> opt_tpc_disable = 0x8
[0000.493] I> opt_fbp_disable = 0x0
[0000.496] I> opt_nvdec_disable = 0x0
[0000.500] I> opt_nvenc_disable = 0x0
[0000.503] I> opt_isp_disable = 0x0
[0000.506] I> opt_mgbe_disable = 0x0
[0000.510] I> opt_fsi_disable = 0x0
[0000.513] I> fsi_ram_repair_enable = 0x1
[0000.517] I> opt_mss_disable = 0x0
[0000.520] I> OPT_IGPU_RESERVED = 0x165
[0000.524] C> BOOT_NV_INFO = 0x30f788
[0000.527] I> BOOTROM_PATCH_VERSION = 0x3
[0000.531] I> PSCROM_PATCH_VERSION = 0x7
[0000.534] I> OPT_ADC_CAL_FUSE_REV = 0x4
[0000.538] I> OPT_SW_L1MAINRST_ENABLE = 0x1
[0000.542] I> OPT_EMPD_CALIB_REV = 0x2
[0000.545] I> OPT_EMPD_CALIB = 0x784
[0000.549] I> OPT_CSDC_ENABLE = 0x0
[0000.552] I> OPT_PV_DAC_CTRL_REV = 0x3
[0000.555] I> OPT_PV_BG_CTRL_DAC_OFFSETS_REV = 0x3
[0000.560] I> opt_ECC_en = 0x1
[0000.563] I> SKU_INFO = 0xa0 (PROD)
[0000.566] C> OPT_SAMPLE_TYPE = 0x3
[0000.569] C> opt_subrevision = 0x2
[0000.572] I> opt_lot_code_0 = 0xf7891c5 (rdl)
[0000.577] I> opt_ft_rev = 0x40
[0000.580] I> opt_fuse_file_version = v55
[0000.583] I> slt_info = 0x3
[0000.586] I> cpu_speedo_calib = 0x86c
[0000.590] I> cpu_ht_iddq_calib = 0x69
[0000.593] I> soc_speedo_calib = 0x86f
[0000.597] I> soc_ht_iddq_calib = 0x63
[0000.600] I> gpu_speedo_calib = 0x85b
[0000.604] I> gpu_ht_iddq_calib = 0x14c
[0000.607] I> mss_speedo_calib = 0x85f
[0000.611] I> mss_ht_iddq_calib = 0x176c
[0000.614] I> Task: POD config init
[0000.618] I> Task: Reset FSI
[0000.620] I> Task: Update FSI scratch with fuse data
[0000.625] I> Task: Load and update platform data
[0000.629] I> Sku value zero. Using platform data in MB1 BCT
[0000.635] I> Task: Release membct carveout
[0000.639] I> Task: Pinmux init
[0000.642] I> Task: Prod config init
[0000.646] W> PROD_CONFIG: controller prod table is empty in MB1 BCT.
[0000.651] I> Task: Pad voltage init
[0000.655] I> Task: Prod init
[0000.6
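A log like the one above is easier to reason about once it is reduced to "which task was reached last" and "which lines are not plain info". The following is a minimal sketch, assuming the `[timestamp] LEVEL> message` layout seen in this log. The function name `summarize_boot_log` is hypothetical, and treating `W` and `E` as the levels worth flagging is an assumption (only `I`, `C`, and `W` appear in the excerpt). Lines that do not match the pattern, such as the truncated final line, are skipped rather than guessed at.

```python
import re

# Matches MB1-style lines such as "[0000.097] I> Task: Crypto init".
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]\s+([A-Z])>\s+(.*)$")


def summarize_boot_log(text):
    """Return (last 'Task:' entry reached, list of flagged lines).

    Flagged lines are those with level W or E (assumption: these are the
    warning/error levels; only W appears in the posted excerpt).
    """
    last_task = None
    flagged = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # blank or truncated line, e.g. "[0000.6"
        timestamp, level, message = match.groups()
        if message.startswith("Task: "):
            last_task = (timestamp, message[len("Task: "):])
        if level in ("W", "E"):
            flagged.append((timestamp, level, message))
    return last_task, flagged


# Tail of the log posted above, including the truncated last line.
sample = """\
[0000.639] I> Task: Pinmux init
[0000.642] I> Task: Prod config init
[0000.646] W> PROD_CONFIG: controller prod table is empty in MB1 BCT.
[0000.651] I> Task: Pad voltage init
[0000.655] I> Task: Prod init
[0000.6
"""

last_task, flagged = summarize_boot_log(sample)
print("last task reached:", last_task)
for entry in flagged:
    print("flagged:", entry)
```

On the posted excerpt this reports `Prod init` as the last task before the output stops, which only says where the captured log ends, not why.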

Are you sure the flash process on your board is really working?

Currently, the flashing process on this board is also abnormal. The board enters flashing (recovery) mode normally, but the flash itself fails. Please help analyze the possible causes and suggest a direction for further troubleshooting.

Here is what we have tried so far:
1: Modified the EEPROM.
2: After flashing the firmware with the module on the development board, it starts normally.
3: If this module is placed directly on the Orin carrier board, it also starts and enters the UEFI shell.
4: If this module is placed on our self-developed Thor carrier board, startup fails. How can we troubleshoot this problem? What hardware faults could cause such an issue?

Hi,

Please provide the log from the flashing process. What you posted is already a cold boot log.

The log I posted initially was captured as follows: the module was flashed successfully on the development kit, and after the system had booted there, I moved the module to my own board and captured its startup log.

Then what is the result if you put the module back into the NV devkit now? Will it boot up fine?

With the module in its current state placed back in the development kit, the system starts normally.

Please flash using your board and dump the log.

The cross-check above already indicates that your board seems to have a hardware issue.
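The same cross-check (one module, devkit versus custom board) can also be applied to the serial logs, by comparing the sequence of `Task:` lines from each capture and finding where they stop agreeing. The sketch below assumes both logs use the MB1 line layout shown earlier in this thread. The helper names `task_sequence` and `first_divergence` are hypothetical, and the two sample strings are shortened placeholders built from task names in the posted log, not the actual devkit or custom-board captures.

```python
import re

# Captures the task name from lines such as "[0000.639] I> Task: Pinmux init".
TASK_RE = re.compile(r"^\[\d+\.\d+\]\s+[A-Z]>\s+Task:\s+(.*)$")


def task_sequence(text):
    """Extract the ordered list of task names from a boot log."""
    tasks = []
    for raw in text.splitlines():
        match = TASK_RE.match(raw.strip())
        if match:
            tasks.append(match.group(1).strip())
    return tasks


def first_divergence(reference, candidate):
    """Compare two task lists.

    Returns (last common task, next task in reference, next task in candidate).
    A None in the last two positions means that log has no further tasks.
    """
    common = 0
    for ref_task, cand_task in zip(reference, candidate):
        if ref_task != cand_task:
            break
        common += 1
    last_common = reference[common - 1] if common else None
    next_ref = reference[common] if common < len(reference) else None
    next_cand = candidate[common] if common < len(candidate) else None
    return last_common, next_ref, next_cand


# Placeholder inputs for illustration only; replace with the real captures.
devkit_log = """\
[0000.639] I> Task: Pinmux init
[0000.651] I> Task: Pad voltage init
[0000.655] I> Task: Prod init
"""
custom_board_log = """\
[0000.639] I> Task: Pinmux init
[0000.651] I> Task: Pad voltage init
"""

result = first_divergence(task_sequence(devkit_log), task_sequence(custom_board_log))
print(result)
```

If the custom-board log simply ends earlier than the devkit log, the third element is `None` and the second names the first task the devkit reached that the custom board did not, which is a reasonable place to start the hardware review.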

OK, I will capture and upload the logs from the flashing process.

Hello, output.log is the terminal output captured during the flashing process, and debug.txt is the output from the debug serial port.

output.log (254.4 KB)

debug.txt (21.4 KB)

There is no obvious software error in the logs. Please review the hardware design against the design guide.

Hello, could you please tell me roughly which factors might affect system startup?

Hello, I would like to add that on my own board, an Orin module can be installed and the system boots, but the Thor module does not start.