PowerOn sequence error: Drive fails to boot

Software Version
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.4.0.7363
other

Host Machine Version
native Ubuntu 18.04
other

Issue

We had been running a Pegasus unit in a vehicle for a few weeks without issue. Now, on powering the unit, the fans no longer spin. After power cycling and querying the Aurix board via serial, we see the following error:

IOHWABS_PWRUP_SEQUENCE_ERROR : PowerOn sequence encountered errors...

The poweron command produces:

aurixreset shows:

System status output:

We swapped in another Pegasus unit (keeping the same power supply, harness, sensor inputs, etc.) and that one worked fine.

What is wrong with the unit and how might we get it resolved?

Thank you,
Osman

Dear @oshawkat ,
Could you try reflashing the target to rule out SW/system-state related issues? If the issue persists, it could be a HW issue.

Thank you for your response Siva.

I have tried re-flashing the unit using the Nvidia SDK software. The flashing process reached 99% but never completed, even after an extended wait. I eventually exited the flash and manually rebooted the machine. Here are the logs: 007_flashing_post-exit.log (114.5 KB)

After rebooting, I still see the same IOHWABS_PWRUP_SEQUENCE_ERROR as before. The fans do not turn on and neither Xavier A nor Xavier B is responsive. I’ve also collected some additional log outputs from the Aurix (I’ve removed most of the repeated errors for clarity): 007_drive_post_flash_aurix.txt (3.0 KB)

Dear @oshawkat ,
Could you please delete the ~/.nvsdkm/ folder and retry flashing so that the logs are fresh? Then please share the ~/.nvsdkm/logs folder and the sdkm.log file.
How much time have you waited?

What do you mean by how much time we have waited? Do you mean between power cycles? We wait the required amount of time that has been specified in the documentation and on the forum.

Is reflashing the entire system the only way to resolve Aurix messages like this?

Dear @ChrisB ,
Could you confirm whether the symptoms are the same as in “Error encountered when using minicom to connect aurix”?

Hi @oshawkat ,
It looks like this unit still has “DRIVE-V5.1.6-E3550-EB-Aurix-With3LSS-ForHyperion-StepA-3.05.04” and hasn’t been installed with DRIVE OS 5.2.0.
Please “don’t” mix up the two units and “don’t” install DRIVE OS 5.2.0 on this unit.

For this topic, please try to flash IFW firmware (as you did in Aurix MODMGR_ERROR preventing device boot - #22 by oshawkat) to collect the information for our analysis. Thanks.

Thank you for the clarification - we will not install DRIVE OS 5.2.0 on this unit.

We flashed the updated IFW firmware and ran showvoltages:

DDPX Aurix Serial Console
E3550-B03
with TLF35584 B/C-Step
SW Version 1.29.10
DRIVE-v5.1.x-E3550-NV-Aurix-IFW-StepA-1.29.10
TC397 Step A
INPUTS:
KL30_VBAT = 12.133 V
KL30_POWER = 12.140 V
KL15_POWER = 12.221 V
AURIX:
AURIX_1V3 = 1.257 V
AURIX_3V3 = 3.313 V
AURIX_5V = 5.005 V
CAN1_5V = 5.022 V
CAN2_5V = 4.993 V
FR_5V = 5.003 V
HRNS_GPI = 0.402 V
SYSTEM:
VBAT_SYS = 0.036 V
VBATSYS_ISENSE = 0.010 A
SYS_5V = 0.000 V
SYS_0V85 = 0.001 V
SYS_0V92_1 = 0.001 V
SYS_0V92_2 = 0.001 V
SYS_1V0 = 0.001 V
SYS_1V1_2 = 0.000 V
SYS_1V2 = 0.001 V
SYS_1V5_2 = 0.001 V
SYS_1V8_1 = 0.000 V
SYS_1V8_2 = 0.001 V
SYS_2V1 = 0.000 V
SYS_2V5 = 0.001 V
SYS_3V3_1 = 0.001 V
SYS_3V3_2 = 0.001 V
CE_PREREG = 0.005 V
Tegra A:
VBAT_TEG = 12.126 V
VBATTEG_ISENSE = 0.562 A
XA_PREREG = 0.005 V
XA_5V = 5.114 V
XA_5V_SW = 0.004 V
XA_VDD_1V0 = 0.001 V
XA_VDD_1V8_AO = 0.000 V
XA_VDD_1V8_HS = 0.001 V
XA_VDD_1V8_LS = 0.001 V
XA_VDD_CPU = 0.001 V
XA_VDD_CV = 0.001 V
XA_VDD_DDR2 = 0.001 V
XA_VDD_DDRQ = 0.001 V
XA_VDD_DDR_1V1 = 0.001 V
XA_VDD_GPU = 0.000 V
XA_VDD_SOC = 0.001 V
X1_DGPU_THERM_ALERT_N = 0.003 V
VBAT_SXMA = 0.007 V
VBATSXMA_ISENSE = 0.010 A
SXMA_PREREG = 0.011 V
Tegra B:
XB_PREREG = 0.005 V
XB_5V = 5.153 V
XB_5V_SW = 0.002 V
XB_VDD_1V0 = 0.001 V
XB_VDD_1V8_AO = 0.001 V
XB_VDD_1V8_HS = 0.001 V
XB_VDD_1V8_LS = 0.001 V
XB_VDD_CPU = 0.001 V
XB_VDD_CV = 0.000 V
XB_VDD_DDR2 = 0.001 V
XB_VDD_DDRQ = 0.001 V
XB_VDD_DDR_1V1 = 0.001 V
XB_VDD_GPU = 0.001 V
XB_VDD_SOC = 0.001 V
X2_DGPU_THERM_ALERT_N = 0.004 V
VBAT_SXMB = 0.007 V
VBATSXMB_ISENSE = 0.000 A
SXMB_PREREG = 0.000 V
CVM:
P1_PREREG = 0.011 V
P1_5V = 0.004 V
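A dump of this length is easier to read once the rails are grouped by section and only the live ones are listed. The following is a minimal sketch, not an NVIDIA tool: the function names are hypothetical, and the 0.08 V cutoff is an assumption chosen for illustration rather than a documented limit for these rails.

```python
import re

# "NAME = 1.234 V" or "NAME = 0.010 A"
RAIL = re.compile(r"^\s*([A-Za-z0-9_]+)\s*=\s*([-+]?\d+(?:\.\d+)?)\s*([VA])\s*$")
# Section headers such as "INPUTS:" or "Tegra A:"
SECTION = re.compile(r"^\s*([A-Za-z0-9 ]+):\s*$")


def parse_showvoltages(text):
    """Return {section: {rail_name: (value, unit)}} from a pasted dump."""
    sections = {}
    current = None
    for line in text.splitlines():
        header = SECTION.match(line)
        if header:
            current = header.group(1).strip()
            sections[current] = {}
            continue
        rail = RAIL.match(line)
        if rail and current is not None:
            sections[current][rail.group(1)] = (float(rail.group(2)), rail.group(3))
    return sections


def live_rails(sections, threshold_v=0.08):
    """List voltage rails above threshold_v (assumed cutoff) per section."""
    return {
        section: [name for name, (value, unit) in rails.items()
                  if unit == "V" and value > threshold_v]
        for section, rails in sections.items()
    }


# A few lines copied from the dump above, for illustration only.
SAMPLE = """\
INPUTS:
KL30_VBAT = 12.133 V
AURIX:
AURIX_3V3 = 3.313 V
SYSTEM:
VBAT_SYS = 0.036 V
SYS_5V = 0.000 V
Tegra A:
VBAT_TEG = 12.126 V
VBATTEG_ISENSE = 0.562 A
XA_5V = 5.114 V
XA_VDD_CPU = 0.001 V
"""

if __name__ == "__main__":
    for section, names in live_rails(parse_showvoltages(SAMPLE)).items():
        print(f"{section}: {', '.join(names) or 'no rail above threshold'}")
```

Run against the full dump, a summary like this makes it quick to see which sections have no powered rails at all.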

To follow up on prior questions, this is the log from attempting a clean installation of NVIDIA DRIVE™ Software 10.0 (prior to flashing the Aurix): SDKM_logs_2021-02-26_12-00-11.zip (1.5 MB)

And a picture of the indicator lights:

I’m not sure if this platform allows video uploads, but one of the lights is blinking, so I have included that here:

From boot onward, it looks like the DS6, DS7, and DS4 lights are constant and the DS2 light is flashing at 1/5 Hz.

Please let me know if there is any additional information we can provide or debug procedures to run.

The message “AURIX port for board e3550-t194 not found! Please check if board is connected.” in sdkm-2021-02-25-15-00-08.log (in the zip file) was also seen in the topic below, where it was caused by running on a virtual machine. Please try installing with sdkmanager on native Ubuntu 18.04. Thanks.

We can try again with a native Ubuntu 18.04 machine.

To accelerate the debug process, here is some background info on this unit:

According to our team, this log shows that the system was not even turned on.
Could you issue a “poweron” command and then capture the “showvoltages” output? Thanks.

Hi Vick. We’ve re-run the command after issuing the poweron command:

Shell>poweron
Waiting 5V to be in valid range…
XA_5V is in range, value 5116 mv,after continuous 10 samples at 1 ms polling rae
XB_5V is in range, value 5155 mv,after continuous 10 samples at 1 ms polling rae
P1_5V is in range, value 4 mv,after continuous 10 samples at 1 ms polling rate
Time used 10 ms
low threshold is 80 mV, sampling time 10 ms
System_PowerOn: Reading PG status: Timeout Error!
XA_5V_PG/XB_5V_PG/P1_5V_PG
Command Fail
Shell>showvoltages
INPUTS:
KL30_VBAT = 12.148 V
KL30_POWER = 12.148 V
KL15_POWER = 12.228 V
AURIX:
AURIX_1V3 = 1.256 V
AURIX_3V3 = 3.313 V
AURIX_5V = 5.003 V
CAN1_5V = 5.024 V
CAN2_5V = 4.993 V
FR_5V = 5.003 V
HRNS_GPI = 0.405 V
SYSTEM:
VBAT_SYS = 0.036 V
VBATSYS_ISENSE = 0.010 A
SYS_5V = 0.002 V
SYS_0V85 = 0.001 V
SYS_0V92_1 = 0.000 V
SYS_0V92_2 = 0.000 V
SYS_1V0 = 0.001 V
SYS_1V1_2 = 0.001 V
SYS_1V2 = 0.001 V
SYS_1V5_2 = 0.001 V
SYS_1V8_1 = 0.001 V
SYS_1V8_2 = 0.001 V
SYS_2V1 = 0.001 V
SYS_2V5 = 0.001 V
SYS_3V3_1 = 0.001 V
SYS_3V3_2 = 0.001 V
CE_PREREG = 0.005 V
Tegra A:
VBAT_TEG = 12.126 V
VBATTEG_ISENSE = 0.552 A
XA_PREREG = 0.005 V
XA_5V = 5.114 V
XA_5V_SW = 0.004 V
XA_VDD_1V0 = 0.001 V
XA_VDD_1V8_AO = 0.000 V
XA_VDD_1V8_HS = 0.001 V
XA_VDD_1V8_LS = 0.001 V
XA_VDD_CPU = 0.001 V
XA_VDD_CV = 0.001 V
XA_VDD_DDR2 = 0.001 V
XA_VDD_DDRQ = 0.001 V
XA_VDD_DDR_1V1 = 0.000 V
XA_VDD_GPU = 0.000 V
XA_VDD_SOC = 0.001 V
X1_DGPU_THERM_ALERT_N = 0.004 V
VBAT_SXMA = 0.014 V
VBATSXMA_ISENSE = 0.010 A
SXMA_PREREG = 0.000 V
Tegra B:
XB_PREREG = 0.005 V
XB_5V = 5.155 V
XB_5V_SW = 0.004 V
XB_VDD_1V0 = 0.001 V
XB_VDD_1V8_AO = 0.000 V
XB_VDD_1V8_HS = 0.001 V
XB_VDD_1V8_LS = 0.000 V
XB_VDD_CPU = 0.000 V
XB_VDD_CV = 0.001 V
XB_VDD_DDR2 = 0.001 V
XB_VDD_DDRQ = 0.001 V
XB_VDD_DDR_1V1 = 0.000 V
XB_VDD_GPU = 0.001 V
XB_VDD_SOC = 0.001 V
X2_DGPU_THERM_ALERT_N = 0.003 V
VBAT_SXMB = 0.007 V
VBATSXMB_ISENSE = 0.010 A
SXMB_PREREG = 0.005 V
CVM:
P1_PREREG = 0.011 V
P1_5V = 0.004 V
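The poweron output above reports all three 5 V rails as "in range", although P1_5V reads only 4 mV. As a rough cross-check, the reported values can be compared against a nominal 5 V window. This is a sketch under assumptions: the ±10% tolerance and the helper name are hypothetical, and the firmware's own range criteria are not stated in this thread.

```python
import re

# Matches lines such as "XA_5V is in range, value 5116 mv,after continuous ..."
IN_RANGE = re.compile(r"^(\w+) is in range, value (\d+) mv")


def check_5v_rails(log, nominal_mv=5000, tolerance=0.10):
    """Return {rail: True/False} for whether each value lies within
    nominal_mv +/- tolerance. The tolerance is an assumed value."""
    low = nominal_mv * (1 - tolerance)
    high = nominal_mv * (1 + tolerance)
    results = {}
    for line in log.splitlines():
        match = IN_RANGE.match(line.strip())
        if match:
            results[match.group(1)] = low <= int(match.group(2)) <= high
    return results


# Lines copied verbatim from the poweron output above.
LOG = """\
XA_5V is in range, value 5116 mv,after continuous 10 samples at 1 ms polling rae
XB_5V is in range, value 5155 mv,after continuous 10 samples at 1 ms polling rae
P1_5V is in range, value 4 mv,after continuous 10 samples at 1 ms polling rate
"""

if __name__ == "__main__":
    for rail, ok in check_5v_rails(LOG).items():
        print(f"{rail}: {'near 5 V' if ok else 'not near 5 V'}")
```

With this window, XA_5V and XB_5V pass and P1_5V does not. Whether P1_5V is expected to be powered on this board is not established here, so the result is only a pointer for further analysis.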

Have you had a chance to check whether hard power cycling the unit helps with this issue? Thanks.

For this topic, please also submit a bug via the path below and let us know the bug #.

https://developer.nvidia.com/ → “My account” (https://developer.nvidia.com/user) → “My Bugs” → “Submit a New Bug” (https://developer.nvidia.com/nvidia_bug/add).

Afterwards, you can contact your NVIDIA representative about this issue.
Thanks!

Thank you for your response, Vick. We have tried hard power cycling the unit multiple times but see the same results.

I have submitted a bug ticket for this issue: #3268116

Thank you, @oshawkat !

Hi @oshawkat
The serial number of the system posted above (Serial Number: 1611420000405) is not correct. It should be in a format like EXXXX-BXX-SXXXX.
Can you please update the bug with the correct serial number or post it here?
Thanks!

Hello. Below is an image of the serial and part numbers from the bottom of the unit. I’m not seeing anything in the format you specified but can take another look in the morning.

image

Please look at the location indicated by the white arrow. You will find the S/N there.