Thor - ERROR: MCU_ERRHANDLER: SOC error pin is asserted

DRIVE OS Version: 7.2.3

Issue Description: Thor no longer boots correctly, and the display stays dark.

The log captured over minicom during boot:

NvShell>mcureset
Info: Executing cmd: mcureset, argc: 0, args:
NvShell>INFO: MCU_PLTFPWRMGR: Reseting
INFO: MCU_SYSSTATEMNGR: State update notification received- 3
INFO: MCU_PLTFPWRMGR: MPFDTI Timer is stopped
INFO: MCU_PLTFPWRMGR: VRS11 PG Monitoring disable.
INFO: NVMCU_SOCPWRCTRL: Wait for Safe Shutdown notification (20s max)
Check for VRS10…
Check for VRS10…
Check for VRS10…
INFO: NVMCU_SOCPWRCTRL: VRS10/11 BIST configuration done!
MCU_FOH: SPI : E2E_P05Check Status : 7 : 0
ERROR: MCU_ERRHANDLER: McuFoh : ReporterID - 0x810E ErrorCode - 0x3
MCU_FOH: SPI : E2E_P05Check Status : 7 : 0
MCU_FOH: SPI : E2E_P05Check Status : 7 : 0
I:RptrID - 0x810B PsCd - 0x8 - VMON Seq Record
I:RptrID - 0x810B PsCd - 0x7 - VMON Seq Match
I:RptrID - 0x810B PsCd - 0x23 - VMON Seq Ackld
INFO: PLTFPWRMGR_ETHCTRL: Ethernet peripherals de-initialized
INFO: MCU_PLTFPWRMGR: Request Eth deinitialization done !
INFO: PLTFPWRMGR_ETHCTRL: Linkup status is not active
INFO: PLTFPWRMGR_ETHCTRL: Linkup status is not active
INFO: MCU_PLTFPWRMGR: Soc TMON disable.
INFO: MCU_PLTFPWRMGR: Board TMON disable.
INFO: PLTFPWRMGR_IOHWABS: Power down sequence is complete !
INFO: PLTFPWRMGR_IOHWABS: Adding Delay of 5s for poweroff
Status 0

*************** NvShell Initialization Start******************

DRIVE-V7.2.3-P3960-AFW-RH850-U2A16-3.03.02
Inforom detected Board ID - 63960-0010-000
Compilation date: Dec 8 2025, 22:00:19
Enter ‘help’ to see the available commands.

*************** NvShell Initialized *************************
Press ‘Enter’ for NvShell prompt


INFO: BtChn_Cfg: NvMCbk - TA DfltBtChn Pri Blk, SId-12, JobRes-0
INFO: BtChn_Cfg: NvMCbk - TB DfltBtChn Pri Blk, SId-12, JobRes-0
INFO: BtChn_Cfg: NvMCbk - TA NxtBtChn Pri Blk, SId-12, JobRes-0
INFO: BtChn_Cfg: NvMCbk - TB NxtBtChn Pri Blk, SId-12, JobRes-0
INFO: BtChn_Cfg: NvMCbk - TB DfltBtChn Red Blk, SId-12, JobRes-0
INFO: BtChn_Cfg: NvMCbk - TB NxtBtChn Red Blk, SId-12, JobRes-0
INFO: BtChn_Cfg: NvMCbk - TA DfltBtChn Red Blk, SId-12, JobRes-0
INFO: BtChn_Cfg: NvMCbk - TA NxtBtChn Red Blk, SId-12, JobRes-0
INFO: MCU Version PIM- DRIVE-V7.2.3-P3960-AFW-RH850-U2A16-3.03.02
INFO: PLTFPWRMGR_IOHWABS: Identified Board from Inforom - Board - 3960, BoardId - 000, Sku - 0010
INFO: PLTFPWRMGR_IOHWABS: NumSoc: 1
INFO: PLTFPWRMGR_IOHWABS: MCU reset 0x3 triggered…
INFO: MCU_PLTFPWRMGR: Number of SoCs to be powered: 1
INFO: MCU_BOARDVMON: Loading up common ADC Configuration for boards : TS2/TS3/TS4
INFO: FanMon_IoHwAbs: FAN Tach sensor measurement started
INFO: MCU_LCMCLIENT: MCU_LCMClient initialized successfully
I:RptrID - 0x810D PsCd - 0x16 - PwrCtrl Init Ok
I:RptrID - 0x810D PsCd - 0x16 - PwrCtrl Init Ok ErrAttr - 0x01
I:RptrID - 0x810F PsCd - 0x7 - BootchainCfg Cfg Range Check Success
I:RptrID - 0x810F PsCd - 0x7 - BootchainCfg Cfg Range Check Success
I:RptrID - 0x810F PsCd - 0x2 - BootchainCfg NvM Read Success
I:RptrID - 0x810F PsCd - 0x4 - BootchainCfg NvM Plaus Check Success
MCU_FOH: Switched to IDLE_STATE
MCU_FOH: Initialization done
INFO: MCU_PLTFPWRMGR: Powering up
INFO: MCU_SYSSTATEMNGR: State update notification received- 1
I:RptrID - 0x810A PsCd - 0xF - ISTManager:IST execution status read success.
I:RptrID - 0x810A PsCd - 0xE - ISTManager:No plausibility error in NVM data.
I:RptrID - 0x810A PsCd - 0x14 - ISTManager:No range plausibility error in NVM data.
INFO: PLTFPWRMGR_IOHWABS: PMIC: Wakeup Status - 0x1, Error Status: 0x0
INFO: MCU_BOARDVMON: Voltage Monitor enabled for ADCJ0 and ADCJ1 signals
INFO: PLTFPWRMGR_IOHWABS: VDD_MV_CVM voltage (547mV)
INFO: PLTFPWRMGR_IOHWABS: VDD_HV_CVM voltage (29mV)
INFO: MCU_PLTFPWRMGR: SW Latent Test configuration: 0x0
INFO: PLTFPWRMGR_IOHWABS: KL30_PROTECT Voltage (12698mV) exceeded threshold (6689mV). Continuing…!
INFO: PLTFPWRMGR_IOHWABS: SAFE_3V3_PG Voltage (3261mV) exceeded threshold (3135mV). Continuing…!
I:RptrID - 0x810D PsCd - 0x8 - PwrCtrl Dio Chk Ok
I:RptrID - 0x810D PsCd - 0x7 - PwrCtrl Config Ok
I:RptrID - 0x810D PsCd - 0x17 - PwrCtrl PrePwrUp Ok
INFO: MCU_PLTFPWRMGR: MPFDTI Timer started with period set to - 28800 seconds
INFO: PLTFPWRMGR_IOHWABS: VBAT_SOC_FAULT is set to 1. Continuing…!
INFO: PLTFPWRMGR_IOHWABS: PWRYNK_SHDN is set to 1. Continuing…!
INFO: PLTFPWRMGR_IOHWABS: VDD_MV_CVM voltage (5004mV) exceeded threshold (4751mV). Continuing…!
INFO: PLTFPWRMGR_IOHWABS: VDD_HV_CVM voltage (12107mV) exceeded threshold (6695mV). Continuing…!
I:RptrID - 0x810B PsCd - 0x20 - VMON Config Valid
I:RptrID - 0x810B PsCd - 0x13 - VMON i2c addr load passed
I:RptrID - 0x810B PsCd - 0xA - VMON Bist Passed
Check for VRS10…
I:RptrID - 0x810D PsCd - 0x3 - PwrCtrl I2C Read Ok
I:RptrID - 0x810D PsCd - 0x6 - PwrCtrl I2C Read Crc Ok
Check for VRS10…
I:RptrID - 0x810D PsCd - 0x14 - PwrCtrl VRS10 BIST Complete
I:RptrID - 0x810D PsCd - 0x19 - PwrCtrl VRS10 BIST Ok
Check for VRS10…
Check for VRS10…
Check for VRS10…
I:RptrID - 0x810D PsCd - 0x9 - PwrCtrl Vrs10 Int Ok
Check for VRS10…
I:RptrID - 0x810D PsCd - 0xD - PwrCtrl ActShdn Ok
Check for VRS10…
I:RptrID - 0x810D PsCd - 0xF - PwrCtrl ActSlp Ok
Check for VRS10…
I:RptrID - 0x810D PsCd - 0xE - PwrCtrl Actslp Ok
INFO: WAR: NVMCU_SOCPWRCTRL: Program correct OTPs for Power sequencing
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
INFO: NvMCU_SocTMON: Soc-0 Temperature sensor initialized
I:RptrID - 0x810C PsCd - 0x17 - TMON Configuration in default mode pass
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS11-1…
Check for VRS11-2…
Check for VRS11-1…
I:RptrID - 0x810D PsCd - 0x1A - PwrCtrl VRS11 BIST Ok
Check for VRS11-2…
I:RptrID - 0x810D PsCd - 0x1A - PwrCtrl VRS11 BIST Ok
Check for VRS11-1…
I:RptrID - 0x810D PsCd - 0xA - PwrCtrl VRS11 Int Ok
Check for VRS11-2…
I:RptrID - 0x810D PsCd - 0xA - PwrCtrl VRS11 Int Ok
I:RptrID - 0x810D PsCd - 0x2 - PwrCtrl I2C Write Ok
Check for VRS11-1…
I:RptrID - 0x810D PsCd - 0x4 - PwrCtrl I2C Val Ok
Check for VRS11-1…
Check for VRS11-1…
INFO: SftyMon_IoHwAbs: toggle check of local and remote sensor successfull
Check for VRS11-1…
Check for VRS11-2…
Check for VRS11-2…
Check for VRS11-2…
Check for VRS11-2…
INFO: NVMCU_SOCPWRCTRL: Programming thermal thresholds done!
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
I:RptrID - 0x810D PsCd - 0xC - PwrCtrl Actshdn Ok
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
Check for VRS10…
I:RptrID - 0x810D PsCd - 0x10 - PwrCtrl Vrs10 NIRQ Chk Ok
INFO: NVMCU_SOCPWRCTRL: FUNC_NIRQ Toggle check Passed for VRS10 !
Check for VRS11-1…
Check for VRS11-1…
Check for VRS11-1…
I:RptrID - 0x810D PsCd - 0x11 - PwrCtrl Vrs11 NIRQ Chk Ok
INFO: NVMCU_SOCPWRCTRL: FUNC_NIRQ Toggle check Passed for VRS11-1!
Check for VRS11-2…
Check for VRS11-2…
Check for VRS11-2…
I:RptrID - 0x810D PsCd - 0x11 - PwrCtrl Vrs11 NIRQ Chk Ok
INFO: NVMCU_SOCPWRCTRL: FUNC_NIRQ Toggle check Passed for VRS11-2!
INFO: NVMCU_SOCPWRCTRL: FUNC_NIRQ continuous monitoring Enabled!
INFO: SftyMon_IoHwAbs: Board Temperature sensor initialized
INFO: NvMCU_SocTMON: toggle check of local and remote sensor successfull
I:RptrID - 0x810C PsCd - 0xB - TMON Toggle Check for Alert & Shutdown Pass
I:RptrID - 0x810B PsCd - 0x8 - VMON Seq Record
I:RptrID - 0x810B PsCd - 0x7 - VMON Seq Match
I:RptrID - 0x810B PsCd - 0x23 - VMON Seq Ackld
ERROR: MCU_ERRHANDLER: SocVMON : ReportedID - 0x810B ErrorCode - 0x15
I:RptrID - 0x810B PsCd - 0xD - VMON Plausibility pass
INFO: MCU_PLTFPWRMGR: Request NVMCU_SOCTMON_REQ_TOGGLE_CHK Service - finished
ERROR: MCU_PLTFPWRMGR: Request VMON Power-up sequence verification failed! Continue sequence

INFO: MCU_PLTFPWRMGR: Board TMON enabled.
INFO: NvMCU_SocTMON: Soc-0 Temperature sensor initialized
I:RptrID - 0x810C PsCd - 0x17 - TMON Configuration in default mode pass
INFO: NvMCU_SocTMON: Soc TMON init complete
INFO: MCU_PLTFPWRMGR: Soc TMON enabled …
INFO: NvMCU_SocTMON: Thermal Monitoring Enabled
I:RptrID - 0x810C PsCd - 0x10 - TMON All thermal events are clear
I:RptrID - 0x810C PsCd - 0x11 - TMON All alert events are clear
INFO: NvMCU_SocTMON: Thermal Alert Trigger not present anymore
I:RptrID - 0x810C PsCd - 0xF - TMON No notification present
Check for VRS10…
Check for VRS10…
Check for VRS10…
INFO: MCU_PLTFPWRMGR: Bootchain selection mode: GPIO
INFO: BtChn_Cfg: Tegra x1 Boot Chain is : A
INFO: BtChn_Cfg: Btchn Pin SOC0_BOOTCHAIN0 set to 0
INFO: BtChn_Cfg: Btchn Pin SOC0_BOOTCHAIN1 set to 0
I:RptrID - 0x810F PsCd - 0x6 - BootchainCfg Dio set success
INFO: NVMCU_SOCPWRCTRL: Tegra reset released
MCU_FOH: Switched to IDLE_STATE
MCU_FOH: MCU FOH : Start Monitoring Called
INFO: MCU_PLTFPWRMGR: Start Network power-up sequence .
INFO: PLTFPWRMGR_IOHWABS: ETH_SYS_3V3_AO voltage (3316mV) exceeded threshold (3135mV). Continuing…!
INFO: PLTFPWRMGR_IOHWABS: ETH_SYS_0V8 voltage (810mV) exceeded threshold (759mV). Continuing…!
INFO: PLTFPWRMGR_IOHWABS: ETH_SYS_3V3 voltage (3349mV) exceeded threshold (3135mV). Continuing…!
INFO: PLTFPWRMGR_IOHWABS: ETH_SYS_1V2 voltage (1212mV) exceeded threshold (1140mV). Continuing…!
INFO: PLTFPWRMGR_IOHWABS: ETH_SYS_0V9 voltage (905mV) exceeded threshold (854mV). Continuing…!
INFO: PLTFPWRMGR_IOHWABS: ETH_SYS_0V84 voltage (842mV) exceeded threshold (798mV). Continuing…!
INFO: MCU_PLTFPWRMGR: Request Power-up Ethernet Switch done !
INFO: Network config 0x3: Hyperion 8_1
INFO: RTK_SWITCH: INFO: RTK_SWITCH: INFO: RTK_SWITCH: ERROR: RTK_SWITCH: Condor1: Failed to configure switch
INFO: RTK_SWITCH: INFO: RTK_SWITCH: INFO: RTK_SWITCH: ERROR: RTK_SWITCH: Condor2: Failed to configure switch
INFO: RTK_SWITCH: INFO: RTK_SWITCH: INFO: MRVL_SWITCH: FIR ACTIVE pin disabled (DENY ALL)
INFO: MRVL_SWITCH: Fir switch 88Q5152 init complete
Initialize 88Q222X PHY with SMI address: 3 in Master Mode, Detected Q2221M Package!!!
INFO: PLTFPWRMGR_ETHCTRL: Ethernet switch and Phy initialization completed..
INFO: PLTFPWRMGR_ETHCTRL: Ethernet Link Active
INFO: PLTFPWRMGR_ETHCTRL: Linkup status is not active
INFO: MCU_PLTFPWRMGR: Request Eth initialization done !
INFO: MCU_PLTFPWRMGR: VRS11 PG Monitoring enable.
INFO: MCU_PLTFPWRMGR: Power-up sequence is complete !
INFO: CmnIf: Wdg Enabled Success!
Power on the system
INFO: MCU_ERRHANDLER: Published Power State: Power-up complete
Init Max load - 100, Average Load - 100
INFO: MCU_SYSSTATEMNGR: State update notification received- 2
INFO : MCU_ISTMGR: IST_DONE pin no monitor time of 2000 ms over.
I:RptrID - 0x810A PsCd - 0xD - ISTManager:IST DONE pin stuck at low not observed.
INFO: SftyMon_IoHwAbs: PG_VRS11 monitoring started…
MCU_FOH: Switched to FSI_MONITORING_STATE
MCU_FOH: SOC error pin is asserted
ERROR: MCU_ERRHANDLER: SOC error pin is asserted
MCU_FOH: Spi Transmit Started
MCU_FOH: ErrReport: ErrorCode-0x5801 ReporterId-0xe04c Error_Attribute-0x4000080b timestamp-0xde162c8b
MCU_FOH: ErrReport: ErrorCode-0x2807 ReporterId-0xe04c Error_Attribute-0x40000803 timestamp-0xde19d404
MCU_FOH: Periodic Status [0] 0xab [1] 0xcd [250] 0x12 [251] 0x34
ERROR: MCU_SWC_FanControl: count_NoEthFrame_TA value reached to: 101
ERROR: MCU_SWC_FanControl: count_NoEthFrame expired
ERROR: MCU_SWC_FanControl: moving to error state
MCU_FOH: ErrReport: ErrorCode-0x11 ReporterId-0x8225 Error_Attribute-0x10041 timestamp-0x5d91ba4e
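
For anyone triaging a log like this, a small filter makes the failing steps easier to spot among the repeated "Check for VRS10…" lines. The following is only a sketch: the two patterns are inferred from the lines in this log, not from any documented NVIDIA log format, and the names summarize and SAMPLE are made up for illustration.

```python
import re

# Both patterns are assumptions derived from the log lines in this thread only.
ERROR_LINE = re.compile(r"ERROR:\s*(?P<module>\w+):\s*(?P<message>.*)$")
ERR_REPORT = re.compile(
    r"ErrReport:\s*ErrorCode-(?P<code>0x[0-9a-fA-F]+)\s+"
    r"ReporterId-(?P<reporter>0x[0-9a-fA-F]+)\s+"
    r"Error_Attribute-(?P<attr>0x[0-9a-fA-F]+)\s+"
    r"timestamp-(?P<ts>0x[0-9a-fA-F]+)"
)


def summarize(log_text):
    """Return (errors, reports) extracted from an MCU console log.

    errors  -- list of (module, message) tuples for every ERROR entry
    reports -- list of dicts with the ErrReport fields parsed as integers
    """
    errors, reports = [], []
    for line in log_text.splitlines():
        line = line.strip()
        # search() rather than match(): some ERROR entries appear mid-line
        # when several messages are interleaved on the console.
        m = ERROR_LINE.search(line)
        if m:
            errors.append((m.group("module"), m.group("message")))
            continue
        m = ERR_REPORT.search(line)
        if m:
            reports.append({k: int(v, 16) for k, v in m.groupdict().items()})
    return errors, reports


# A few lines copied from the log above.
SAMPLE = """\
ERROR: MCU_ERRHANDLER: SocVMON : ReportedID - 0x810B ErrorCode - 0x15
ERROR: MCU_PLTFPWRMGR: Request VMON Power-up sequence verification failed! Continue sequence
MCU_FOH: SOC error pin is asserted
ERROR: MCU_ERRHANDLER: SOC error pin is asserted
MCU_FOH: ErrReport: ErrorCode-0x5801 ReporterId-0xe04c Error_Attribute-0x4000080b timestamp-0xde162c8b
"""

if __name__ == "__main__":
    errors, reports = summarize(SAMPLE)
    for module, message in errors:
        print(f"{module}: {message}")
    for report in reports:
        print({key: hex(value) for key, value in report.items()})
```

The script only extracts the entries; what the individual error codes and reporter IDs mean is not stated in this thread and would have to come from NVIDIA.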

Hello @carolyuu, Hello @SivaRamaKrishnaNV ,

I noticed you are quite active on this forum and knowledgeable about the NVIDIA DRIVE AGX Thor Development Platform.

We are currently blocked in our development and would highly appreciate your guidance in solving the problem.

Best regards,

Dobrin Zainea.

Dear @ulrich.koerber ,
DOS 7.2.3 is not a DevZone release. I believe you have another support channel for issues on DOS 7.2.3. Could you please check with your project manager for the right support channel?

FYI, you can access the Tegra serial console from the host using sudo minicom -D /dev/ttyACM0 to confirm whether the target boots without any issue.
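
Before opening minicom, it can help to confirm that the host actually sees the serial devices. Below is a minimal sketch that lists the /dev/ttyACM* nodes and prints a matching minicom command for each. The helper name find_acm_ports is hypothetical, and which node maps to which console (Tegra or Aurix) is an assumption to verify on your setup; only /dev/ttyACM0 for the Tegra console is mentioned here.

```python
import glob
import os
import re


def find_acm_ports(dev_dir="/dev"):
    """Return ttyACM* device paths under dev_dir, ordered by port number."""
    def port_number(path):
        m = re.search(r"ttyACM(\d+)$", path)
        return int(m.group(1)) if m else -1

    paths = glob.glob(os.path.join(dev_dir, "ttyACM*"))
    # Keep only names that end in a number, then sort numerically so that
    # ttyACM10 comes after ttyACM2.
    return sorted((p for p in paths if port_number(p) >= 0), key=port_number)


if __name__ == "__main__":
    ports = find_acm_ports()
    if not ports:
        print("No /dev/ttyACM* devices found; check the USB connection to the host.")
    for port in ports:
        print(f"sudo minicom -D {port}")
```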

Dear @SivaRamaKrishnaNV

I am working together with @ulrich.koerber and I am currently managing the project he is working on.

Could you please advise on the appropriate forum or communication channel for addressing issues of this nature?

We have encountered a series of challenges that have led us to our current situation, and we are presently blocked. Our objective is to reinitialize the THOR ECU to its factory settings/version; however, we are experiencing difficulties in completing this process.

We plan to review the provided command line on Monday. While we were able to flash the firmware after connecting via the host PC, it appears that DOS is not booting properly.

It would be very helpful if you could provide the exact sequence of steps required to restore the THOR ECU to a clean, initial state.

Best regards,
Dobrin Zainea

Do you see the boot burn successful message after flashing? Was Tegra accessible before you tried to flash? Which DOS version did it have before flashing? Was it used in a car or on office premises?

I believe you must have received the DOS 7.2.3 release via another channel. Please check with your NVIDIA representative for guidance on the correct support channel for these issues. Let me know if you run into any difficulties.

Hi @SivaRamaKrishnaNV

Here is the complete history:

1. We received the NVIDIA DRIVE THOR ECU (SKU-10 Bench Kit) as a loan from a colleague within our company.

2. We noticed the THOR came with DOS 7.0.3, on which our radar plugin could not run “as is”.

3. We then updated to DOS 7.2.3. While doing so, we noticed that the CUDA 13 libraries were missing.

4. We downgraded CUDA to v12, but many libraries were still missing; see 00_Missing_Libraries.txt.

5. We understood from our colleague that we should connect the THOR machine to the internet in order to download further libraries. We did that, but at that point it was no longer possible to log in via the user interface; we could only access the THOR kit via the command line interface, and we could not download further libraries. BLOCKED

6. We decided to restart from the beginning, so we are now trying to reset the THOR kit altogether. But we ran into some initial issues; see 01_Flash_Attempt_Issue_1.txt.

7. After a few tries, and after connecting both USB cables to the host PC, we managed to flash the firmware, but now we get the ERROR: MCU_ERRHANDLER message shown in the file 02_PostFlash_Error_1.txt.

Additional notes:

  • In parallel, we ordered and received our own NVIDIA DRIVE THOR ECU (SKU-12 Vehicle Kit), and we want to power it up at the beginning of next week. But of course we still need to bring the SKU-10 back up.

  • We are using the SKU-10 on the bench and plan to use the SKU-12 in the vehicle, but we want to start it up on the bench first.

My current difficulties:

  • We do not know how to properly reset NVIDIA DRIVE THOR ECU SKU-10.
  • We do not have a proper channel to raise these issues.

I would kindly ask you to guide me in obtaining the information needed to solve these issues. If required, I can create another thread, here or in another forum, to ask how to correctly update from DOS 7.0.3 to DOS 7.2.3.

Thank you in advance for your support.

00_Missing_Libraries.txt (8.7 KB)

01_Flash_Attempt_Issue_1.txt (4.9 KB)

02_PostFlash_Error_1.txt (12.5 KB)


Could you please check my private message?

A few general guidelines before flashing:

  1. Make sure the connections are set up as shown in the Quick Start Guide.
  2. Make sure the board is in normal mode from the Aurix console, and check that poweroff and poweron work from the Aurix console. Before trying to flash, make sure the board is on (poweron should succeed from the Aurix console) and all serial console sessions to the target are closed.

Note that DOS 7.2.3 is tied to CUDA 13, so do not attempt to downgrade the CUDA version. Make sure the sample_radar_replay you use is cross-compiled in the DOS 7.2.3 docker.

Hi @SivaRamaKrishnaNV

Thank you again for your fast replies. I will notify you as soon as I have information for the private message.

Regarding 1: Just to avoid any confusion, could you please share the Quick Start Guide?

Regarding 2: Understood, we will check that.

Regarding the note: the issue was that the initial radar sample provided with 7.0.3 did not compile or was even missing (my colleague would have to confirm this). So we got the recommendation to upgrade to DOS 7.2.3, and then the library and compilation issues started. This is why I was asking for a guide on how to perform the update correctly.

Thank you,

Dobrin

Check page 15 in https://developer.nvidia.com/downloads/drive-agx-thor-hardware-quick-start-guide.pdf

Note that in DOS 7.0.3, DW samples/tools are not supported on x86. I don’t see any issue with cross compilation.

root@7.0.3.0-0010-linux-nsr:/home/nvidia/DW-DOS7.0.3/src/sensors/radar/radar_replay# ls
CMakeFiles  Makefile  cmake_install.cmake  sample_radar_replay
root@7.0.3.0-0010-linux-nsr:/home/nvidia/DW-DOS7.0.3/src/sensors/radar/radar_replay# file sample_radar_replay
sample_radar_replay: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, not stripped
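
If the file utility is not at hand, the same architecture check can be scripted by reading the e_machine field of the ELF header. This is a rough sketch, not an NVIDIA tool: the function names elf_machine and is_aarch64 are made up for illustration, while the header offsets and machine constants come from the ELF specification.

```python
import struct

# e_machine values defined by the ELF specification.
EM_X86_64 = 62
EM_AARCH64 = 183


def elf_machine(header):
    """Return the e_machine value from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_ident and e_type.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return machine


def is_aarch64(path):
    """True if the binary at path was built for ARM aarch64."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20)) == EM_AARCH64


if __name__ == "__main__":
    import sys

    for binary in sys.argv[1:]:
        print(binary, "aarch64" if is_aarch64(binary) else "not aarch64")
```

Running it against sample_radar_replay should agree with the file output above; a binary built for the x86 host would report EM_X86_64 instead.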

I am closing this topic as you are getting support from another channel. Please file a new topic if you notice issues (or need support) with DevZone releases.