Aurix MODMGR_ERROR preventing device boot

Software Version
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.4.0.7363
other

Host Machine Version
native Ubuntu 18.04
other

Issue

We have a Pegasus unit that was working fine for a few weeks, but recently the Xaviers stopped responding. We could not communicate via Ethernet, serial, or external monitors. The Aurix was also unresponsive over serial.

I tried re-flashing the unit using NVIDIA SDK Manager. The flash reached 99% but did not progress further. Eventually a dialog popped up saying “Installation of 'Flash Xavier A+B in parallel' is taking longer than expected. Do you want to continue installing this package?” After selecting Continue and waiting a while longer, I exited the process. Logs here: 010_flashing.log (104.6 KB)

After flashing, I was able to connect to the Aurix and saw a stream of the following error:
ERROR: PSM_ModMgr: MODMGR_ERROR

I have also attached a log of Aurix command responses to help diagnose the issue: 010_drive_post_flash_aurix.txt (6.4 KB)

Can you elucidate what may be wrong with the unit and how we may be able to repair it?

Thank you,
Osman

Marking as a duplicate of PowerOn sequence error: Drive fails to boot. Please continue the discussion in the old thread.

Thanks for looking at this. Can we re-open this ticket? It is not a duplicate of the other thread: this is a different Pegasus unit with different error symptoms.

Hi @oshawkat ,

I cannot tell whether the installation succeeded. Could you share the log package so we can check? Thanks.
/home/oshawkat/Downloads/SDKM_logs_2021-02-22_13-35-41.zip

Thank you for the quick response. The full zip can be found here: SDKM_logs_2021-02-22_13-35-41.zip (877.9 KB)

I’m also about to try re-flashing the unit after deleting the ~/.nvsdkm/ folder (per the recommendation on the other thread). I will share the new log files as soon as they’re available.

I still cannot find the flashing messages in it. A successful flash logs lines like the following:

Running flash command: sudo -E /home/vyu/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_E3550/DRIVEOS/drive-t186ref-foundation//tools/host/flashtools/bootburn_t19x/bootburn.sh -b e3550b03-t194 -B qspi -x /dev/ttyUSB3 --updtcfga gos1-fs:dirname:/home/vyu/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_E3550/DRIVEOS/drive-t186ref-linux/targetfs_a --updtcfgb gos1-fs:dirname:/home/vyu/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_E3550/DRIVEOS/drive-t186ref-linux/targetfs_b -w
...
Flashing successful for given configuration!

If you want to reinstall cleanly, please run the command below before launching sdkmanager to install. Thanks.

$ rm -rf ~/.nvsdkm ~/Downloads/nvsdkm_logs ~/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS*

I reflashed the unit after running the command you recommended for a clean reinstall. The flash stalled at 99% again, and I let it run for 2+ hours (flashing other units typically takes ~30 min). I did not see the two adp related popups that we typically see during the flashing process. After exiting the flash process, I manually power cycled the unit (waiting 1 minute between power down and reboot) and saw the same original error message when I connected to the Aurix via serial. Neither Xavier had any serial output.

Please let me know if there is any additional information that may help narrow down the issue.

Logs: SDKM_logs_2021-02-23_14-36-28.zip (1.5 MB)

Your log shows that the NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP process was stuck at the “Aurix port /dev/ttyUSB3 found. Fetching board revision from InfoROM …” message for about 1 hour and 40 minutes.

12:47:36.457 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: Aurix port /dev/ttyUSB3 found. Fetching board revision from InfoROM …
14:12:13.038 - info: install timeout for ‘Flash Xavier A+B in parallel’, [Continue]
14:28:57.722 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX - download is paused
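
For anyone checking their own SDK Manager log for the same kind of stall, here is a minimal Python sketch that reports long gaps between consecutive log lines. It assumes every line starts with an `HH:MM:SS.mmm` timestamp followed by ` - `, as in the excerpt above; the names `SAMPLE_LOG`, `parse_time`, and `long_gaps` are illustrative only, and the sketch does not handle logs that cross midnight.

```python
from datetime import datetime, timedelta

# The three lines quoted above, used as sample input.
SAMPLE_LOG = """\
12:47:36.457 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: Aurix port /dev/ttyUSB3 found. Fetching board revision from InfoROM …
14:12:13.038 - info: install timeout for ‘Flash Xavier A+B in parallel’, [Continue]
14:28:57.722 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX - download is paused
"""


def parse_time(line):
    """Parse the leading HH:MM:SS.mmm timestamp of a log line."""
    stamp = line.split(" - ", 1)[0].strip()
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def long_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (gap, line_before_gap) for each pause of at least `threshold`."""
    stamped = [(parse_time(line), line) for line in lines if line.strip()]
    gaps = []
    for (t0, before), (t1, _) in zip(stamped, stamped[1:]):
        if t1 - t0 >= threshold:
            gaps.append((t1 - t0, before))
    return gaps


if __name__ == "__main__":
    for gap, before in long_gaps(SAMPLE_LOG.splitlines()):
        print(f"{gap} of silence after: {before[:80]}")
```

The 5-minute threshold is an arbitrary choice for illustration; the line printed before each long gap is the step the process was waiting on.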

Normally, the “Fetching board revision from InfoROM” message is followed by messages like those below (this took 7 seconds in my log):

2020-03-04 22:33:36.053 - info: Aurix port /dev/ttyUSB3 found. Fetching board revision from InfoROM …
2020-03-04 22:33:43.506 - info: BOARD SKU : 940-63550-2200-011
2020-03-04 22:33:44.232 - info: Board PCB revision is b03
2020-03-04 22:33:44.233 - info: Executing bind cmd make -f Makefile.bind PCT=linux BOARD=e3550b03-t194a PCT_VARIANT=dev_nonrt
2020-03-04 22:33:50.155 - info: Executing bind cmd make -f Makefile.bind PCT=linux BOARD=e3550b03-t194b PCT_VARIANT=dev_nonrt
2020-03-04 22:33:55.074 - info: Bind partitions done!
2020-03-04 22:33:55.074 - info: Initializing SDK/PDK flasher for configuration
2020-03-04 22:33:55.074 - info: Board: e3550-t194ab
2020-03-04 22:33:55.074 - info: Board type: ES
2020-03-04 22:33:55.074 - info: Hypervisor config: linux
2020-03-04 22:33:55.074 - info:
2020-03-04 22:33:55.074 - info: Processor found in ttyUSB3
2020-03-04 22:34:03.239 - info: BOARD SKU : 940-63550-2200-011
2020-03-04 22:34:03.980 - info: Board PCB revision is b03
2020-03-04 22:34:03.981 - info: Running flash command: sudo -E /home/vyu/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_E3550/DRIVEOS/drive-t186ref-foundation//tools/host/flashtools/bootburn_t19x/bootburn.sh -b e3550b03-t194 -B qspi -x /dev/ttyUSB3 --updtcfga gos1-fs:dirname:/home/vyu/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_E3550/DRIVEOS/drive-t186ref-linux/targetfs_a --updtcfgb gos1-fs:dirname:/home/vyu/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_E3550/DRIVEOS/drive-t186ref-linux/targetfs_b -w

Have you or any colleagues ever successfully installed any release with sdkmanager before? Could you also check with them? Are you using a USB 2.0 Type A to Type A cable? Could you try another cable?

Please also refer to Setting Up the E3550 Platform for the USB 2.0 Type A to Type A cable connection. Thanks.

Thanks for looking into this Vicky. Yes, I’ve been able to successfully flash multiple units in the past, including this particular unit (until it started to malfunction). We are using the USB 2.0 Type A to Type A cable included in the dev kit and plugging it into the Debug port (same cable used to successfully flash other Pegasus units).

Which version was working fine previously?
Have you ever flashed Aurix?
Could you double-check whether the MCU Programming Switch (mentioned in the document below) is correctly in the “RUN” position?

“20.Set the MCU Programming Switch to the RUN position and power cycle the platform.” in Flashing AURIX.

Thanks.

What have you done to the two units recently? Have they been set up on cars?

Hi Vicky. Yes, the two units had been set up on low-speed vehicles. The one with the PowerOn error was running for a few months, while the one referenced in this thread was more recent. Both had been flashed with NVIDIA DRIVE Software 10.0.

I haven’t flashed the Aurix explicitly except as part of the larger flashing process. No additional or custom software was being run on it.

I checked the MCU Programming Switch during our initial debugging, and on both units it was confirmed to be in the RUN position. I checked again and it is still in RUN.

I checked the “010_drive_post_flash_aurix.txt” file you provided in your first post and found a suspicious message.

INFO: PSM_ModMgr: PSM_PwrCtrl_SysPowerOn Reading PG status Timeout Error!

I may need your help to run “showvoltages” in the Aurix console to check for any power issue.
Is the “showvoltages” command available in your Aurix console? If yes, could you run it and share the output? If not, please refer to the post below to flash the IFW firmware and then get the status.

Thanks.

Hello Vick,

The showvoltages command is not available on our Aurix. We are unable to flash the IFW firmware because neither Xavier A nor Xavier B is running.

Is there another method to run the command or flash the Aurix? Are there other debug steps we can take to help diagnose the issue?

Please refer to Flashing AURIX to flash with memtool on a Windows host. Thanks.

Important: In any case, do NOT click erase button to erase the UCB area. Once UCB is erased, the chip is dead.

Thank you Vick. I have flashed the Aurix from the two hex loads you specified. I am using Memtool 4.9 (from the Flashing Aurix instructions) so was unable to flash the UCB. The Aurix is no longer showing the ERROR: PSM_ModMgr: MODMGR_ERROR error from before.

Below is the output of showvoltages:

INPUTS:
  KL30_VBAT = 14.101 V
  KL30_POWER = 14.101 V
  KL15_POWER = 13.983 V
AURIX:
  AURIX_1V3 = 1.253 V
  AURIX_3V3 = 3.300 V
  AURIX_5V = 4.996 V
  CAN1_5V = 5.009 V
  CAN2_5V = 4.993 V
  FR_5V = 5.011 V
  HRNS_GPI = 0.398 V
SYSTEM:
  VBAT_SYS = 13.940 V
  VBATSYS_ISENSE = 1.847 A
  SYS_5V = 4.999 V
  SYS_0V85 = 0.837 V
  SYS_0V92_1 = 0.969 V
  SYS_0V92_2 = 0.001 V
  SYS_1V0 = 0.981 V
  SYS_1V1_2 = 0.001 V
  SYS_1V2 = 0.167 V
  SYS_1V5_2 = 0.000 V
  SYS_1V8_1 = 0.416 V
  SYS_1V8_2 = 0.369 V
  SYS_2V1 = 0.001 V
  SYS_2V5 = 2.528 V
  SYS_3V3_1 = 0.522 V
  SYS_3V3_2 = 0.042 V
  CE_PREREG = 0.005 V
Tegra A:
  VBAT_TEG = 14.064 V
  VBATTEG_ISENSE = 1.385 A
  XA_PREREG = 13.889 V
  XA_5V = 5.114 V
  XA_5V_SW = 5.112 V
  XA_VDD_1V0 = 1.001 V
  XA_VDD_1V8_AO = 1.797 V
  XA_VDD_1V8_HS = 1.792 V
  XA_VDD_1V8_LS = 1.807 V
  XA_VDD_CPU = 0.002 V
  XA_VDD_CV = 0.001 V
  XA_VDD_DDR2 = 1.109 V
  XA_VDD_DDRQ = 0.606 V
  XA_VDD_DDR_1V1 = 1.157 V
  XA_VDD_GPU = 0.000 V
  XA_VDD_SOC = 0.848 V
  X1_DGPU_THERM_ALERT_N = 1.816 V
  VBAT_SXMA = 0.014 V
  VBATSXMA_ISENSE = 0.000 A
  SXMA_PREREG = 0.011 V
Tegra B:
  XB_PREREG = 13.955 V
  XB_5V = 5.114 V
  XB_5V_SW = 5.112 V
  XB_VDD_1V0 = 1.001 V
  XB_VDD_1V8_AO = 1.802 V
  XB_VDD_1V8_HS = 1.799 V
  XB_VDD_1V8_LS = 1.813 V
  XB_VDD_CPU = 0.003 V
  XB_VDD_CV = 0.000 V
  XB_VDD_DDR2 = 1.115 V
  XB_VDD_DDRQ = 0.606 V
  XB_VDD_DDR_1V1 = 1.158 V
  XB_VDD_GPU = 0.002 V
  XB_VDD_SOC = 0.859 V
  X2_DGPU_THERM_ALERT_N = 1.810 V
  VBAT_SXMB = 0.007 V
  VBATSXMB_ISENSE = 0.000 A
  SXMB_PREREG = 0.017 V
CVM:
  P1_PREREG = 14.009 V
  P1_5V = 5.096 V

This log also includes the output of some additional status commands (e.g. status, pgoodstatus) in case you find it helpful: 010_post_aurix_flash.txt (7.1 KB)
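
To make the showvoltages output above easier to scan, here is a rough Python sketch that flags rails whose reading is far from the voltage suggested by the rail name. It rests on an assumption that is not confirmed anywhere in this thread: that a name token such as 1V8, 0V85, or 5V encodes the nominal voltage. The function name `flag_rails` and the 10% tolerance are made up for illustration. A flagged rail is only a hint, since some rails may legitimately be off in the current power state.

```python
import re

# Matches lines such as "  SYS_1V8_1 = 0.416 V"; current readings ("... A") are skipped.
READING = re.compile(r"^\s*(\w+)\s*=\s*(-?\d+(?:\.\d+)?)\s*V\s*$")
# Assumption: a name token like 1V8, 0V85 or 5V encodes the nominal voltage.
NOMINAL = re.compile(r"(?:^|_)(\d+)V(\d*)(?:_|$)")

# A few lines copied from the output above, used as sample input.
SAMPLE = """\
  SYS_5V = 4.999 V
  SYS_0V85 = 0.837 V
  SYS_0V92_1 = 0.969 V
  SYS_1V2 = 0.167 V
  SYS_1V8_1 = 0.416 V
  SYS_2V5 = 2.528 V
  VBATSYS_ISENSE = 1.847 A
  XA_VDD_CPU = 0.002 V
"""


def flag_rails(text, tolerance=0.10):
    """Return (name, nominal, measured) for rails outside the tolerance band."""
    flagged = []
    for line in text.splitlines():
        reading = READING.match(line)
        if not reading:
            continue  # section header, blank line, or a current reading
        name, measured = reading.group(1), float(reading.group(2))
        nominal_match = NOMINAL.search(name)
        if not nominal_match:
            continue  # name carries no voltage token, e.g. XA_VDD_CPU
        whole, frac = nominal_match.groups()
        nominal = float(f"{whole}.{frac or '0'}")
        if abs(measured - nominal) > tolerance * nominal:
            flagged.append((name, nominal, measured))
    return flagged


if __name__ == "__main__":
    for name, nominal, measured in flag_rails(SAMPLE):
        print(f"{name}: expected about {nominal} V, measured {measured} V")
```

Rails without a voltage token in the name (for example XA_VDD_CPU) are skipped rather than guessed at, so they still need a manual look.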

I am attempting a clean flash of the Xaviers now and will update with the results when available.

The clean flash of the Xaviers failed, this time with a different error message. Please find the logs attached: SDKM_logs_2021-02-26_01-37-54.zip (1.5 MB)

Hi @oshawkat ,

Thanks for providing all the information. We will check them and get back to you soon.

From the log, I noticed that you previously installed DRIVE OS 5.2.0 on the system.
May I know whether that installation succeeded and whether you were able to access Xavier A/B and Aurix via UART at that time? Thanks.

I had installed DRIVE OS 5.2.0 on another unit in the past, but we lost the ability to connect to external monitors, so I reverted to NVIDIA DRIVE™ Software 10.0 / DRIVE OS 5.1.6.

I will try installing DRIVE OS 5.2.0 on this unit to see if it resolves the issue.