DRIVE AGX Xavier not booting up

Please provide the following info (check/uncheck the boxes after creating this topic):
Software Version
DRIVE OS Linux 5.2.6
DRIVE OS Linux 5.2.6 and DriveWorks 4.0
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.9.1.10844
2.1.0.11660
other

Host Machine Version
native Ubuntu 18.04
other

When I tried to access the board on Monday, after the weekend, the DRIVE would no longer boot. This is the current state of the system:

Led status:
off on on off on on off blinking off

Fans status:
All off
While trying to restart Aurix manually:
NvShell>aurixreset
Info: Executing cmd: aurixreset, argc: 0, args:
NvShell>INFO: PSM_ModMgr: Powering off
Stop task Nvptp_Task_88E6321_2
Stop task Nvptp_Task_88E6321_1
Stop Nvptp_Task_SJA1105Q
INFO: PSM_EthInit: Ethernet peripherals de-initialized

*************** NvShell Initialization Start******************
DRIVE-V5.2.6-E3550-AFW-Aurix-With3LSS-StepA-4.04.01
Compilation date: Apr 6 2021, 00:00:08
Enter ‘help’ to see the available commands.

*************** NvShell Initialized *************************
Press ‘Enter’ for NvShell prompt


INFO: PSM_ModMgr: Powering up
INFO: PSM_IoHwAbs: Power On sequence Triggered.
INFO: PSM_IoHwAbs: Waiting for Voltages to drop below safe margin of 80mV -
INFO: SftyMon_tmon: Temperature sensor initialized
INFO: PSM_PwrCtrl: PSM_PwrCtrl_SysPwrOn_PMIC Reading PG status Timeout Error!
INFO: PSM_PwrCtrl: X2_XB_VDD_SOC_XB_VDD_DDR2_XB_VDD_DDRQ_PG
ERROR: PSM_ModMgr: Critical error reported with error ID: 0x82
ERROR: PSM_ModMgr: MODMGR_ERROR
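
For anyone triaging a similar log, a small filter can pull the failure lines out of a saved Aurix console capture. This is only a minimal sketch, assuming the session was saved to a file (for example with minicom's -C capture option). The function name aurix_errors and the file name aurix.log are hypothetical, and the patterns come only from the messages shown above.

```shell
# Print the failure lines (with line numbers) from a saved Aurix console log.
# Assumption: the log was captured to a plain-text file beforehand.
aurix_errors() {
    # ERROR: lines, plus the power-good timeout that the firmware logs at INFO level.
    grep -nE 'ERROR:|Timeout Error' "$1"
}

# Example (hypothetical file name):
#   aurix_errors aurix.log
```

The power-good timeout is matched separately because it is logged at INFO level and would be missed by a filter on ERROR alone.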

Sdkmanager error:
15:33:13 INFO: Flash Xavier A+B in parallel - flash: BOARD SKU : 940-63550-2000-100
15:33:13 INFO: Flash Xavier A+B in parallel - flash: Executing bind clean cmd make -f Makefile.bind PCT=linux BOARD=e3550b03-t194a clean
15:33:13 INFO: Flash Xavier A+B in parallel - flash: Executing bind cmd make -f Makefile.bind PCT=linux BOARD=e3550b03-t194a
15:33:17 INFO: Flash Xavier A+B in parallel - flash: Executing bind clean cmd make -f Makefile.bind PCT=linux BOARD=e3550b03-t194b clean
15:33:17 INFO: Flash Xavier A+B in parallel - flash: Executing bind cmd make -f Makefile.bind PCT=linux BOARD=e3550b03-t194b
15:33:20 INFO: Flash Xavier A+B in parallel - flash: Bind partitions done!
15:33:20 INFO: Flash Xavier A+B in parallel - flash: Pre flash script found! Calling scripts/linux_pre_flash.sh
15:33:29 INFO: Flash Xavier A+B in parallel - flash: Running flash command: sudo -E /home/ridgerun/nvidia/nvidia_sdk/DRIVE_OS_5.2.6_SDK_Linux_OSWithSamples_DRIVE_AGX_XAVIER/DRIVEOS/drive-t186ref-foundation//tools/host/flashtools/bootburn_t19x/bootburn.sh -b e3550b03-t194 -B qspi -x /dev/ttyUSB3
15:33:33 INFO: Flash Xavier A+B in parallel - flash: [bootburn]: [executeShellCommand(156)] : {'cmd': ['lsusb', '-d', '0955:'], 'output': '', 'returncode': 1}
15:33:33 ERROR: Flash Xavier A+B in parallel - flash: command terminated with error
15:33:33 SUMMARY: Flash Xavier A+B in parallel - flash: First Error: Installation failed.

(Because neither Xavier is available in /dev/)

lsusb is not showing the nvidia devices as usual (even if you set both Xavier A and B in recovery mode):
ridgerun@ridgerun-Latitude-E5430-non-vPro:~$ lsusb
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 005: ID 08ff:2810 AuthenTec, Inc. AES2810
Bus 001 Device 004: ID 0c45:6449 Microdia
Bus 001 Device 003: ID 413c:8197 Dell Computer Corp.
Bus 001 Device 012: ID 0403:6011 Future Technology Devices International, Ltd FT4232H Quad HS USB-UART/FIFO IC
Bus 001 Device 011: ID 0403:6011 Future Technology Devices International, Ltd FT4232H Quad HS USB-UART/FIFO IC
Bus 001 Device 010: ID 0403:6011 Future Technology Devices International, Ltd FT4232H Quad HS USB-UART/FIFO IC
Bus 001 Device 009: ID 0403:6011 Future Technology Devices International, Ltd FT4232H Quad HS USB-UART/FIFO IC
Bus 001 Device 008: ID 0424:2807 Standard Microsystems Corp.
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
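
A quick way to repeat this check without reading the whole listing is to filter the lsusb output for the NVIDIA vendor ID (0955, the same ID that bootburn queries in the log above). The following is a minimal sketch; the helper name has_nvidia_usb is hypothetical.

```shell
# Succeed if lsusb-style text on stdin lists a device with vendor ID 0955.
has_nvidia_usb() {
    grep -qi 'ID 0955:'
}

# Example on the host:
#   lsusb | has_nvidia_usb && echo "NVIDIA device found" || echo "no NVIDIA device"
```

With the listing above as input, the check fails, which matches the empty output bootburn received.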

I reflashed the Aurix firmware successfully (version DRIVE-V5.2.6-E3550-AFW-Aurix-With3LSS-StepA-4.04.01.hex) using memtool on a Windows 11 machine. This did not fix the boot failure on either Xavier.

I also tried to flash these files in order to be able to run the command showvoltages in my Aurix shell:
DRIVE-V5.2.x-E3550-NV-Aurix-IFW-StepA-1.29.16.hex
DRIVE-V5.2.x-E3550-NV-Aurix-UPDATE-StepA-1.29.16.hex

But the flash process failed:

At this point, I’m stuck with development on this platform. Has anybody experienced this issue? I have already checked the other related threads on this forum.

Were there any changes made to the system (e.g. connection, update) recently or before the boot failure occurred?

Also, please provide the following info of the system:

  • 940 number of the system including the S/N
  • Date the system was purchased.
  • System used in a vehicle?
  • If yes, please share power supply and GNDing diagram

Hi @VickNV

The only “change” was flashing the board with the SDK Manager, but that process finished successfully and I was able to restart the board multiple times afterwards without any issue.

Here is the requested information (limited due to RidgeRun internal policies):

  • 940 number of the system including the S/N
    → 940-63550-2000-100 AU

  • Date the system was purchased.
    → NDA

  • System used in a vehicle?
    → No

  • If yes, please share power supply and GNDing diagram
    → Doesn’t apply due to the previous answer.

Thanks in advance for your help.

Could you explain why you needed to flash the board with the SDK Manager? Was it due to specific issues, or were you upgrading DRIVE OS? If it was an upgrade, which version were you using before switching to 5.2.6?

Also, please ensure you’re using a USB 2.0 A-to-A cable, and try reseating the USB cables. After that, perform a cold reboot of the system.

We are getting familiar with the system and the basics, such as flashing, because our customer wants to create a custom OS. Before starting any task, we always check the basic functionality of the platforms we work on, such as overall stability. The board has no modifications at the SW or HW level; everything is stock.

The system was already running the same OS version (5.2.6) before it was flashed; this was a fresh reflash of that version.

We’re using the USB cable that came with the board. I think it’s OK, since there were no problems the first time I flashed the board, and I was also able to reflash the Aurix firmware over the same cable with no errors.

I tried multiple cold reboots with no effect. I also waited an hour before one of the reboots to be sure the system started from a stable state.

Could you make sure that the MCU Programming Switch has been switched back to the RUN position? Have you tried power cycling your host system and connecting to a different USB 3.0 port? Please also check whether /dev/ttyUSB* appears after these steps.

Yes, the switch was set back to the RUN position on every attempt.

Yes, I tried rebooting the host system and connecting to a different USB port.

I still have access to all the USB serial ports, since Aurix still works. Here is the output:

ridgerun@ridgerun-Latitude-E5430-non-vPro:~$ ls /dev/
autofs           log           pts       tty3       ttyS1     ttyUSB6
block            loop0         random    tty30      ttyS10    ttyUSB7
bsg              loop1         rfkill    tty31      ttyS11    ttyUSB8
btrfs-control    loop10        rtc       tty32      ttyS12    ttyUSB9
bus              loop11        rtc0      tty33      ttyS13    udmabuf
cdrom            loop12        sda       tty34      ttyS14    uhid
cdrw             loop13        sda1      tty35      ttyS15    uinput
char             loop14        sda2      tty36      ttyS16    urandom
console          loop15        sda5      tty37      ttyS17    userio
core             loop16        serial    tty38      ttyS18    v4l
cpu              loop17        sg0       tty39      ttyS19    vcs
cpu_dma_latency  loop18        sg1       tty4       ttyS2     vcs1
cuse             loop19        shm       tty40      ttyS20    vcs2
disk             loop2         snapshot  tty41      ttyS21    vcs3
dri              loop20        snd       tty42      ttyS22    vcs4
drm_dp_aux0      loop21        sr0       tty43      ttyS23    vcs5
drm_dp_aux1      loop22        stderr    tty44      ttyS24    vcs6
drm_dp_aux2      loop23        stdin     tty45      ttyS25    vcsa
dvd              loop24        stdout    tty46      ttyS26    vcsa1
dvdrw            loop25        tty       tty47      ttyS27    vcsa2
ecryptfs         loop26        tty0      tty48      ttyS28    vcsa3
fb0              loop27        tty1      tty49      ttyS29    vcsa4
fd               loop3         tty10     tty5       ttyS3     vcsa5
freefall         loop4         tty11     tty50      ttyS30    vcsa6
full             loop5         tty12     tty51      ttyS31    vcsu
fuse             loop6         tty13     tty52      ttyS4     vcsu1
gpiochip0        loop7         tty14     tty53      ttyS5     vcsu2
hpet             loop8         tty15     tty54      ttyS6     vcsu3
hugepages        loop9         tty16     tty55      ttyS7     vcsu4
hwrng            loop-control  tty17     tty56      ttyS8     vcsu5
i2c-0            mapper        tty18     tty57      ttyS9     vcsu6
i2c-1            mcelog        tty19     tty58      ttyUSB0   vfio
i2c-2            media0        tty2      tty59      ttyUSB1   vga_arbiter
i2c-3            mei0          tty20     tty6       ttyUSB10  vhci
i2c-4            mem           tty21     tty60      ttyUSB11  vhost-net
i2c-5            mqueue        tty22     tty61      ttyUSB12  vhost-vsock
i2c-6            net           tty23     tty62      ttyUSB13  video0
i2c-7            null          tty24     tty63      ttyUSB14  video1
i2c-8            nvram         tty25     tty7       ttyUSB15  zero
initctl          port          tty26     tty8       ttyUSB2   zfs
input            ppp           tty27     tty9       ttyUSB3
kmsg             psaux         tty28     ttyprintk  ttyUSB4
kvm              ptmx          tty29     ttyS0      ttyUSB5

Only one USB (the DRIVE) is connected to the host computer

/dev/ttyUSB* looks normal.
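
As a cross-check, the earlier lsusb output lists four FT4232H quad USB-UART bridges, so 16 serial nodes (ttyUSB0 to ttyUSB15) is the count the listing above shows. A minimal sketch for counting them follows; the function name count_ttyusb is hypothetical, and the directory argument exists only so the function can be pointed somewhere other than /dev.

```shell
# Count the ttyUSB nodes in a directory (defaults to /dev).
count_ttyusb() {
    local dir n f
    dir="${1:-/dev}"
    n=0
    for f in "$dir"/ttyUSB*; do
        # The glob stays literal when nothing matches, so test for existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example on the host:
#   count_ttyusb
```

A count lower than expected would point at the cable, hub, or host port rather than at the Xaviers themselves.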

Please list the current nvidia devices with the following command:

$ lsusb -d 0955:

Sure, here it is:

ridgerun@ridgerun-Latitude-E5430-non-vPro:~$ lsusb -d 0955:
ridgerun@ridgerun-Latitude-E5430-non-vPro:~$ 

It’s normal not to see anything with lsusb -d 0955: if the system isn’t in recovery mode.

Could you provide the log from the Xavier A after power cycling the system? You can use the following command to access the console:

sudo minicom -D /dev/ttyUSB2

While trying to reset the board(Aurix port /dev/ttyUSB3):

NvShell>tegrareset x1
Info: Executing cmd: tegrareset, argc: 1, args: x1 
NvShell>INFO: PSM_ModMgr: Reseting 
INFO: PSM_PwrCtrl: COM Exp card is not detected
INFO: PSM_PwrCtrl: COM Exp is not released from reset
INFO: PSM_PwrCtrl: Tegra x1 Boot Chain: A
INFO: PSM_ModMgr: Command Executed 

NvShell>tegrareset x2
Info: Executing cmd: tegrareset, argc: 1, args: x2
NvShell>INFO: PSM_ModMgr: Reseting
INFO: PSM_ModMgr: Safe shutdown notification received
INFO: PSM_PwrCtrl: COM Exp card is not detected
INFO: PSM_PwrCtrl: COM Exp is not released from reset
INFO: PSM_PwrCtrl: Tegra x2 Boot Chain: A
INFO: PSM_ModMgr: Command Executed
                                                                                                                                                                                  
NvShell>

The output from /dev/ttyUSB2 is:

Welcome to minicom 2.7.1

OPTIONS: I18n 
Compiled on Aug 13 2017, 15:25:34.
Port /dev/ttyUSB2, 16:27:23

Press CTRL-A Z for help on special keys

So there is no response at all from that port, which should be logging something.

After setting the system to recovery mode with the following:

NvShell> tegrarecovery x1 on 
... 
NvShell> tegrareset x1

Are you able to list the NVIDIA device?

$ lsusb -d 0955:

No, here is the output of the commands:

NvShell>tegrarecovery x1 on
Info: Executing cmd: tegrarecovery, argc: 2, args: x1 on
Command Executed
NvShell>tegrareset x1
Info: Executing cmd: tegrareset, argc: 1, args: x1
NvShell>INFO: PSM_ModMgr: Reseting
INFO: PSM_PwrCtrl: COM Exp card is not detected
INFO: PSM_PwrCtrl: COM Exp is not released from reset
INFO: PSM_PwrCtrl: Tegra x1 Boot Chain: A
INFO: PSM_ModMgr: Command Executed
ridgerun@ridgerun-Latitude-E5430-non-vPro:~$ lsusb -d 0955:
ridgerun@ridgerun-Latitude-E5430-non-vPro:~$
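
Because USB enumeration can lag a little behind the reset, it may be worth polling for a few seconds instead of running lsusb once. The following is a minimal sketch of such a retry loop; the helper name wait_for_output and the attempt count and delay in the example are hypothetical choices, not values from NVIDIA documentation.

```shell
# Run a command repeatedly until it prints something or the attempts run out.
# $1 = number of attempts, $2 = delay between attempts in seconds,
# remaining arguments = the command to run.
wait_for_output() {
    local attempts delay i out
    attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # lsusb exits non-zero when nothing matches, so ignore the status.
        out="$("$@" 2>/dev/null || true)"
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example on the host, right after tegrarecovery and tegrareset in NvShell:
#   wait_for_output 10 1 lsusb -d 0955: || echo "no NVIDIA device enumerated"
```

If the loop still times out, the result is the same as the single lsusb call above, but it rules out a simple timing issue.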

Had you successfully obtained output from both /dev/ttyUSB2 and /dev/ttyUSB6 before encountering this issue?

Could you power cycle your target system and attempt to retrieve the output from the XB through /dev/ttyUSB6? Could you try another USB cable?

Yes, before the issue I had full access to those ports. Now the DRIVE doesn’t even turn on any fan. I only have access to the Aurix console as you saw before.

I just power cycled the DRIVE and still have no response from ports USB2 or USB6.

Could you please also provide the following details? I’ll pass all the information you have provided on to our team and get back to you.

  • Serial Number:
  • Sales Order Number:
  • What software is currently running on the system?
  • What changed between the system working and not working?
  • Is there any visible physical damage to the system?
  • Serial Number:

940-63550-2000-100 AU, S/N: 1613020000263
T194-A: REV.A02 BR:03
T194-B: REV.A02 BR:03

E3550: 699-63550-001-501 K1, S/N: 1611920000743

CIM: 699-63553-0000-300 F2, S/N: 1610920000458

  • Sales Order Number:
    Confidential due to RidgeRun policies

  • What software is currently running on the system?
    5.2.6

  • What changed between the system working and not working?
    Nothing. The system sat in the office over the weekend, when no one used it at all, and it was already in this state on Monday.

  • Is there any visible physical damage to the system?
    No

Did you hotplug the cable to different USB ports on the host?
Have you tried another cable? Is the USB cable plugged into the lower or the upper USB port on the target?

Yes, I tried another USB cable. The original one also works fine: I have access to the Aurix console, and I was able to reflash the Aurix firmware with no issues, as I mentioned in earlier messages.

The USB cable is plugged into the lower USB port, called “Debug”

Did you also try a different USB port on the host system? On the target system, there is only one USB cable connected, right?