DRIVE AGX frequently fails to power on

Hello,

The Xaviers on my DRIVE AGX frequently fail to power on.

  • Hardware Platform: DRIVE AGX Xavier™ Developer Kit (S/N: E3550-B03-S1597)
  • Software Version: DRIVE Software 9

I checked the status of the 9 LEDs that are visible from the corner, as described in the DRIVE AGX Developer Kit Mechanical & Installation Guide (PDF), Figure 2-3.
Only the following LEDs are lit, so it seems that the Xaviers are not powered on.

  • DS6: Turns ON when AURIX_3V3 / AURIX_1V3 are both powered
  • DS7: Turns ON when KL30_POWER / AURIX_5V are both powered
  • DS4: Turns ON when AURIX_3V3 is powered
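
The LED readings above can be turned into a list of rails that are confirmed to be powered. The following is a minimal sketch in Python that uses only the three LED descriptions quoted in this post; the other six LEDs are not described here, so they are left out, and the names LED_RAILS and rails_confirmed_on are hypothetical.

```python
# LED meanings copied from the list above. The remaining six LEDs are
# not described in this post, so they are intentionally not included.
LED_RAILS = {
    "DS4": {"AURIX_3V3"},
    "DS6": {"AURIX_3V3", "AURIX_1V3"},
    "DS7": {"KL30_POWER", "AURIX_5V"},
}


def rails_confirmed_on(lit_leds):
    """Return the sorted power rails implied to be up by the lit LEDs."""
    rails = set()
    for led in lit_leds:
        # Unknown LED names contribute nothing instead of raising.
        rails |= LED_RAILS.get(led, set())
    return sorted(rails)


print(rails_confirmed_on(["DS6", "DS7", "DS4"]))
# ['AURIX_1V3', 'AURIX_3V3', 'AURIX_5V', 'KL30_POWER']
```

Only AURIX and KL30 rails appear in the result, which matches the observation that the Xaviers themselves do not seem to be powered.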

The AURIX command log captured when the problem occurred is attached.
(Commands: version, status, poweron, aurixreset, and so on.)

Does anyone know what causes this problem and how to fix it?
Any information would greatly help.

Thanks a lot in advance.

Attachment: aurix_command.log (4.4 KB)

Hi @atsutaka,

According to the snippet below from your attached aurix_command.log, does it mean that after the problem occurred you could still power on successfully with the “poweron” command? Why did “Thermal Shutdown Triggered” happen at the end of the snippet? Did you physically switch off the power?

shell> poweron
Info: Executing cmd: poweron, argc: 0, args:
Power-on from shell command .
shell> Power On sequence Triggered.
Configured BootStraps for X1 as QSPI
Configured BootStraps for X2 as QSPI
Temperature sensor initialized
Init SJA1105 sucessfully
Initial 88Q2112
Initial 88Q2112 A0 silicon slave address 01
Initial 88Q2112 A0 silicon slave address 02
Initial 88Q2112 A0 silicon slave address 03
Initial 88Q2112 A0 silicon slave address 04
Initial 88Q2112 A0 silicon slave address 05
Initial 88Q2112 A0 silicon slave address 06
Initial 88Q2112 A0 silicon slave address 07
Initial 88e6321_1
Enable VLAN
Initial 88e6321_1 port 1 in SGMII Mode.
Initial 88e6321_2
Reset port 3,4 Rxc delay line
Initial channel J11 A: speed 100, HSD Role: Master
Initial channel J21 A: speed 100, HSD Role: Master
Initial channel J21 B: speed 100, HSD Role: Master
Initial channel J14 A: speed 100, HSD Role: Master
Initial channel J14 B: speed 1000, HSD Role: Master
Initial channel J12 A: speed 100, HSD Role: Master
Initial channel J12 B: speed 100, HSD Role: Master
Initial channel J9, J10, J11-B: speed 100, HSD Role: Master
Config switch to use VLAN to reduce interference
Command Executed
Thermal Shutdown Triggered
Board Thermal Alert = HIGH
Board Thermal Shutdown = HIGH
Xavier A Thermal Alert = LOW
Xavier A Thermal Shutdown = LOW
Xavier B Thermal Alert = LOW
Xavier B Thermal Shutdown = LOW
CVM Thermal Alert = N/A
CVM Thermal Shutdown = N/A
We’re going to power off the system due to Thermal Shutdown

shell>
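
To see at a glance which sensor asserted the shutdown in a log like the one above, the thermal status lines can be parsed with a short script. This is a minimal sketch in Python, assuming the "<source> Thermal <Alert|Shutdown> = <level>" line format seen in this snippet; the function names are hypothetical and are not part of any NVIDIA tool.

```python
import re

# Matches status lines such as "Board Thermal Shutdown = HIGH".
# The format is assumed from the log snippet in this thread.
STATUS_RE = re.compile(
    r"^(?P<source>.+?) Thermal (?P<kind>Alert|Shutdown) = (?P<level>HIGH|LOW|N/A)$"
)


def thermal_flags(log_text):
    """Map (source, kind) to its reported level for every status line."""
    flags = {}
    for line in log_text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            flags[(match["source"], match["kind"])] = match["level"]
    return flags


def asserted_sources(log_text, kind="Shutdown"):
    """List the sources whose Alert or Shutdown flag is HIGH."""
    return [
        source
        for (source, flag_kind), level in thermal_flags(log_text).items()
        if flag_kind == kind and level == "HIGH"
    ]


SAMPLE = """\
Thermal Shutdown Triggered
Board Thermal Alert = HIGH
Board Thermal Shutdown = HIGH
Xavier A Thermal Alert = LOW
Xavier A Thermal Shutdown = LOW
Xavier B Thermal Alert = LOW
Xavier B Thermal Shutdown = LOW
CVM Thermal Alert = N/A
CVM Thermal Shutdown = N/A
"""

print(asserted_sources(SAMPLE))           # ['Board']
print(asserted_sources(SAMPLE, "Alert"))  # ['Board']
```

For this log, only the board-level flags are HIGH, while both Xavier flags are LOW.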

Could you check whether installing DRIVE Software 10.0 helps with this problem?

We only issued the “poweron” command. We did not turn off the mechanical switch.
The AURIX console then says “We’re going to power off the system due to Thermal Shutdown”.
We want to know why it prints “Thermal Shutdown Triggered” and how to avoid it.

We still have to use “DRIVE Software 9”.
Is it possible to update only the aurix firmware?

Please refer to the documentation for it. Thanks!

https://docs.nvidia.com/drive/active/5.1.0.2L/nvvib_docs/index.html#page/DRIVE_OS_Linux_SDK_Development_Guide%2FFlashing%2520Basics%2Fflashing_setup.html%23

We will try it!

We have updated the firmware as follows.
Before:
Flashed FW version is 3.02.07.00
SW Version: DRIVE-V5.1.0-E3550-EB-Aurix-ForHyperion-3.02.07
After:
Flashed FW version is 3.02.07.00
SW Version: DRIVE-V5.1.0-E3550-EB-Aurix-With3LSS-3.02.07
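
The two SW Version strings above differ only in one field, which is easy to miss by eye. The following Python sketch splits each string and reports what changed. It assumes the layout "<prefix>-<variant>-<number>", which is inferred only from the two strings in this post and is not a documented naming scheme; the function name is hypothetical.

```python
def split_sw_version(sw_version):
    """Split an AURIX SW Version string into (variant, number).

    Assumes the last two dash-separated fields are the build variant
    and the version number, as in the two strings quoted above.
    """
    _prefix, variant, number = sw_version.rsplit("-", 2)
    return variant, number


before = "DRIVE-V5.1.0-E3550-EB-Aurix-ForHyperion-3.02.07"
after = "DRIVE-V5.1.0-E3550-EB-Aurix-With3LSS-3.02.07"

variant_before, number_before = split_sw_version(before)
variant_after, number_after = split_sw_version(after)

print(variant_before, "->", variant_after)  # ForHyperion -> With3LSS
print(number_before == number_after)        # True
```

Under that assumption, the update changed the build variant (ForHyperion to With3LSS) while the version number stayed at 3.02.07.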

But the problem is not resolved.
What should we do next?

I need your help to clarify some things:

Did you mean it doesn’t always fail to boot? What’s the failure rate?

Has it ever worked since you received the system? What happened between the system working and not working? Was the system ever in a vehicle?

Did you mean it doesn’t always fail to boot? What’s the failure rate?

It fails about once in every three boots.
It seems to occur most often on a cold boot, the first time the KL30_VBAT connector is connected to +12V.

Has it ever worked since you received the system?

Yes.

What happened in between the system working and not working?

If the boot fails, we try the following:

  1. Disconnect KL30_VBAT
  2. Wait 30 seconds to 1 minute.
  3. Connect KL30_VBAT to +12V again.

We repeat steps 1-3 until it starts up. Sometimes this takes more than 10 attempts!
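
As a rough sanity check on these numbers, the sketch below computes how likely a long run of failed boots would be if every attempt failed independently. It reads "about once in every three boots" as a failure probability of 1/3 per attempt, which is an assumption, and the independence of attempts is also an assumption made only for the calculation.

```python
def prob_consecutive_failures(p_fail, attempts):
    """Probability that `attempts` boots in a row all fail,
    assuming each attempt fails independently with probability p_fail."""
    return p_fail ** attempts


def expected_attempts_until_boot(p_fail):
    """Mean number of attempts until the first successful boot
    (geometric distribution, same independence assumption)."""
    return 1.0 / (1.0 - p_fail)


P_FAIL = 1 / 3  # assumed from "about once in every three boots"

print(f"{prob_consecutive_failures(P_FAIL, 10):.2e}")  # 1.69e-05
print(expected_attempts_until_boot(P_FAIL))            # about 1.5
```

Under these assumptions, ten failures in a row would be very rare, so needing more than 10 retries suggests the failures are not independent. That would be consistent with the observation that the problem mostly shows up on a cold boot.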

Was the system ever in a vehicle?

Yes.

Has the system had the boot issue since the day you received it?

Is the “showvoltages” command available in your AURIX console? If yes, could you run it and share the output? If not, please flash the IFW firmware and then get the status. Thanks!

Probably, yes.

Is the “showvoltages” command available in your AURIX console? If yes, could you run it and share the output? If not, please flash the IFW firmware and then get the status. Thanks!

We will check.

We tried the “showvoltages” command, but it was not found, as shown below.

shell> version
Info: Executing cmd: version, argc: 0, args: 
SW Version: DRIVE-V5.1.0-E3550-EB-Aurix-With3LSS-3.02.07
Compilation date: Apr  2 2019, 14:24:45
Command Executed
shell> showvoltages
Info: Executing cmd: showvoltages, argc: 0, args: 
Error: Unknown command
Invalid Command
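
When checking several commands against a captured console session, the "Unknown command" reply can be detected automatically. This is a minimal Python sketch that works on saved transcript text only and assumes the "shell> <command>" prompt format and the "Error: Unknown command" reply shown above. The function name is hypothetical, and the sketch does not talk to the AURIX console itself.

```python
def command_supported(transcript, command):
    """Return True/False depending on whether the console accepted
    `command`, or None if the command does not appear in the transcript."""
    lines = transcript.splitlines()
    for index, line in enumerate(lines):
        if line.strip() != f"shell> {command}":
            continue
        # Scan the reply lines up to the next prompt.
        for reply in lines[index + 1:]:
            if reply.startswith("shell>"):
                break
            if "Unknown command" in reply:
                return False
        return True
    return None


TRANSCRIPT = """\
shell> version
Info: Executing cmd: version, argc: 0, args:
SW Version: DRIVE-V5.1.0-E3550-EB-Aurix-With3LSS-3.02.07
Compilation date: Apr  2 2019, 14:24:45
Command Executed
shell> showvoltages
Info: Executing cmd: showvoltages, argc: 0, args:
Error: Unknown command
Invalid Command
"""

print(command_supported(TRANSCRIPT, "version"))       # True
print(command_supported(TRANSCRIPT, "showvoltages"))  # False
```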

How can we flash the IFW firmware?

Hi @amano03bnt,

  • Please follow “Identifying the AURIX Step Version” to find out your AURIX chip version (Step A or B).

  • Then run the matching command below on the target to flash the IFW firmware.

#for Step A
$ nv_aurix_update -i /lib/firmware/DRIVE-V5.1.0-E3550-NV-Aurix-IFW-StepA-1.29.00.hex -u /lib/firmware/DRIVE-V5.1.0-E3550-NV-Aurix-UPDATE-StepA-1.29.00.hex -f 1

#for Step B
$ nv_aurix_update -i /lib/firmware/DRIVE-V5.1.0-E3550-NV-Aurix-IFW-StepB-1.36.00.hex -u /lib/firmware/DRIVE-V5.1.0-E3550-NV-Aurix-UPDATE-StepB-1.36.00.hex -f 1

  • Run the “showvoltages” command in the AURIX console:
    showvoltages
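
The choice between the two flashing commands above can be sketched as a small helper that builds the argument list for a given step. This Python sketch only assembles the command from the paths, versions, and flags quoted in this post and does not execute anything; the helper name ifw_update_command is hypothetical.

```python
# Firmware versions per AURIX step, taken from the two commands above.
FIRMWARE_VERSION = {"A": "1.29.00", "B": "1.36.00"}

FIRMWARE_BASE = "/lib/firmware/DRIVE-V5.1.0-E3550-NV-Aurix"


def ifw_update_command(step):
    """Build the nv_aurix_update argument list for AURIX Step A or B."""
    step = step.upper()
    if step not in FIRMWARE_VERSION:
        raise ValueError(f"unknown AURIX step: {step!r} (expected 'A' or 'B')")
    version = FIRMWARE_VERSION[step]
    return [
        "nv_aurix_update",
        "-i", f"{FIRMWARE_BASE}-IFW-Step{step}-{version}.hex",
        "-u", f"{FIRMWARE_BASE}-UPDATE-Step{step}-{version}.hex",
        "-f", "1",
    ]


print(" ".join(ifw_update_command("B")))
```

Building the list first makes it easy to review the exact command before running it on the target.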

Does the issue also happen when the system is used on a bench (set up by following “Setting Up the E3550 Platform” with the “Power supply, power adapter and US power cord”)?

Thank you. I’ll try it, probably next week.

Please also check whether the solution for the erratum in “DRIVE AGX Developer Kit Hardware Errata” (https://developer.nvidia.com/DRIVE/secure/docs/DE-08933-001_v06.pdf) helps.

@amano03bnt, any update on this issue?