Xavier AGX doesn't boot anymore

Please provide the following info (check/uncheck the boxes after creating this topic):
Software Version
DRIVE OS Linux 5.2.6
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.6.1.8175
1.6.0.8170
other

Host Machine Version
native Ubuntu 18.04
other

Hello,

This morning when I tried to turn on the Xavier board, it seemed stuck: the fans ran at low speed and there was no display. Looking at the Aurix console, I saw this error:

The board has been powered off.                         
Low Voltage detected on KL30_Power                      
Forced Power On after 30sec timeout                     
System_PowerOn: Reading PG status: Timeout Error!
                                                 X1_XA_VDD_SOC_XA_VDD_DDR2_XA_VG
IOHWABS_PWRUP_SEQUENCE_ERROR : PowerOn sequence encountered errors... 
IOHWABS_PWRUP_SEQUENCE_ERROR : PowerOn sequence encountered errors... 
IOHWABS_PWRUP_SEQUENCE_ERROR : PowerOn sequence encountered errors... 
<repeats forever>

I then powered down and rechecked all connections. After a restart, Aurix seemed OK, but Tegra A failed to boot:

[0000.022] I> MB1 (prd-version: s_1.6.0.0-t194-41334769-fac5b753)
[0000.028] I> Boot-mode: Coldboot                       
[0000.031] I> Chip revision : A02 
[0000.034] I> Bootrom patch version : 15 (correctly patched)
[0000.039] I> ATE fuse revision : 0x200
[0000.042] I> Ram repair fuse : 0x0
[0000.045] I> Ram Code : 0x1
[0000.048] I> rst_source : 0x0
[0000.051] I> rst_level : 0x0
[0000.054] I> Boot-device: QSPI
[0000.057] I> Qspi flash params source = brbct
[0000.061] I> Qspi using bpmp-dma
[0000.064] I> Qspi clock source : pllp
[0000.067] W> DEVICE_PROD: device prod is not initialized.                      
[0000.072] I> QSPI Flash Size = 64 MB                                           
[0000.076] I> Qspi initialized successfully                                     
[0000.079] I> Active Boot chain : 0                                             
[0000.083] I> Boot-device: QSPI                                                 
[0000.085] I> Qspi flash params source = brbct                                  
[0000.091] E> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.        
[0000.097] E> MB1_PLATFORM_CONFIG: Failed to initialize device prod.            
[0000.105] I> Temperature = 21000                                               
[0000.108] W> Skipping boost for clk: AON_CPU_NIC                               
[0000.112] W> Skipping boost for clk: CAN1                                      
[0000.116] W> Skipping boost for clk: CAN2                                      
[0000.120] I> Boot-device: QSPI                                                 
[0000.123] I> Qspi flash params source = mb1bct                                 
[0000.127] I> Qspi using bpmp-dma                                               
[0000.130] I> Qspi clock source : pllc_out0                                     
[0000.133] W> DEVICE_PROD: device prod is not initialized.                      
[0000.138] I> Qspi reinitialized                                                
[0000.144] I> Qspi flash params source = mb1bct                                 
[0000.149] I> ECC region[0]: Start:0x0, End:0x0                                 
[0000.153] I> ECC region[1]: Start:0x0, End:0x0                                 
[0000.157] I> ECC region[2]: Start:0x0, End:0x0                                 
[0000.161] I> ECC region[3]: Start:0x0, End:0x0                                 
[0000.165] I> ECC region[4]: Start:0x0, End:0x0                                 
[0000.169] I> Non-ECC region[0]: Start:0x80000000, End:0x880000000              
[0000.175] I> Non-ECC region[1]: Start:0x0, End:0x0                             
[0000.179] I> Non-ECC region[2]: Start:0x0, End:0x0                             
[0000.184] I> Non-ECC region[3]: Start:0x0, End:0x0                             
[0000.188] I> Non-ECC region[4]: Start:0x0, End:0x0                             
[0000.194] W> DEVICE_PROD: device prod is not initialized.                      
[0000.200] W> DEVICE_PROD: device prod is not initialized.                      
[0000.206] W> MB1_PLATFORM_CONFIG: Rail ID 7 not found in pmic rail config tabl.
[0000.213] E> FAILED: MEMIO rail config                                         
[0000.225] I> scrub mode: full dram                                             
[0000.230] I> Boot-device: QSPI                                                 
[0000.233] I> Qspi flash params source = mb1bct                                 
[0000.290] E> MB1_PLATFORM_CONFIG: Bootrom mmio i2c table is empty in MB1 BCT.  
[0000.299] W> MB1_PLATFORM_CONFIG: Rail ID 8 not found in pmic rail config tabl.
[0001.514] E> WP1.5 ACK pending                                                 
[0001.516] E> Error: 0                                                          
[0001.519] E> Task 79 failed (err: 0x32320006)                                  
[0001.523] E> Top caller module: CPUINIT, error module: CPUINIT, reason: 0x06, 0
[0001.531] I> MB1(s_1.6.0.0-t194-41334769-fac5b753) BIT boot status dump :      
01111111111111111111101111111111111111111110111111111011110011111111100101111010
[0001.560] I> Reset to recovery mode  

Tegra B booted OK, though. I then tried to reflash the board with fresh Drive 10.0 software, but it encountered a flash error during install.
nvsdkm.tar.xz (1.8 MB)

Note: this is the same board that had issues before.

Hi @gordon1zrra,

I found the “file downloaded with wrong checksum.” error message in it, the same as in Sdkmanager downgrade to Drive 10.0 fails - #3 by SivaRamaKrishnaNV.
How did you solve it last time? Or can you remove /home/gordon/storage/nvidia_download/drive-t186ref-foundation-oss-src.run and let sdkmanager download it again?

Hi Vick,

I uninstalled Drive 10.0 rev1 and rev2 from sdkmanager, then changed the Target image folder to the symlinked ~/nvidia_sdk. I didn’t delete any files from the nvidia_download folder.

Can you try to remove /home/gordon/storage/nvidia_download/drive-t186ref-foundation-oss-src.run and then install again? Let’s see if it can solve “file downloaded with wrong checksum.” error.

Hi,
I tried removing the file but it didn’t re-download the file. I manually downloaded the file and copied it to the install dir, and it seemed the crc error disappeared, but the flash error still occurs.
nvsdkm_3.tar.xz (2.4 MB)
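
For anyone hitting the same “file downloaded with wrong checksum.” message after copying a file in by hand, here is a minimal sketch for checking a downloaded file against a known digest. It is only an illustration: the thread does not say which hash algorithm SDK Manager uses or where it stores the expected value, so the SHA-256 default and the `expected_hex` argument are assumptions, and the expected value has to come from a source you trust.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use.

    The algorithm is an assumption; pass whichever one the expected
    value was actually produced with (e.g. "md5", "sha1", "sha256").
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Calling `matches(path_to_run_file, expected_hex)` on the manually downloaded `.run` file would tell you whether the copy itself is intact before you retry the install.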

I saw the below messages in your logs.

11:02:12.516 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: command options used = -b e3550b03-t194 -B qspi -x /dev/ttyUSB3 --updtcfga gos1-fs:dirname:/home/gordon/storage/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DRIVE_AGX_XAVIER/DRIVEOS/drive-t186ref-linux/targetfs_a --updtcfgb gos1-fs:dirname:/home/gordon/storage/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DRIVE_AGX_XAVIER/DRIVEOS/drive-t186ref-linux/targetfs_b -w
11:02:12.516 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX:
11:02:12.516 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: ------------ Stack Trace ------------
11:02:12.516 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: stack frame 0 - 329 AbnormalTermination /home/gordon/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DRIVE_AGX_XAVIER/DRIVEOS/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_lib.sh
11:02:12.516 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: stack frame 1 - 48 CheckUSBServiceInit /home/gordon/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DRIVE_AGX_XAVIER/DRIVEOS/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_adb.sh
11:02:12.516 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: stack frame 2 - 2942 FlashImages /home/gordon/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DRIVE_AGX_XAVIER/DRIVEOS/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_lib.sh
11:02:12.516 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: stack frame 3 - 404 source /home/gordon/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DRIVE_AGX_XAVIER/DRIVEOS/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_active.sh
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: stack frame 4 - 1270 main /home/gordon/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DRIVE_AGX_XAVIER/DRIVEOS/drive-t186ref-foundation//tools/host/flashtools/bootburn_t19x/bootburn.sh
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: -------------------------------------
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX:
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: Tool OutPut:
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: Bus 001 Device 104: ID 0955:7100 NVidia Corp. Tegra Device
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: Bus 001 Device 103: ID 0955:7019 NVidia Corp.
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: Tool OutPut to stderr:
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: Bus 001 Device 104: ID 0955:7100 NVidia Corp. Tegra Device
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: Bus 001 Device 103: ID 0955:7019 NVidia Corp.
11:02:12.517 - info: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: error-adb-timeout
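
The “Tool OutPut” lines in the log above are in `lsusb` listing format. As a rough sketch, entries with the NVIDIA vendor ID shown there (0955) can be pulled out of such text like this. The function name and regular expression are my own illustration and not part of bootburn or SDK Manager, and the sketch does not try to interpret what each product ID means, since the log does not say.

```python
import re

# One lsusb-style entry: "Bus 001 Device 104: ID 0955:7100 <description>"
LSUSB_LINE = re.compile(
    r"Bus (?P<bus>\d+) Device (?P<dev>\d+): "
    r"ID (?P<vid>[0-9a-fA-F]{4}):(?P<pid>[0-9a-fA-F]{4})\s*(?P<desc>.*)"
)


def nvidia_devices(lines, vendor_id="0955"):
    """Return (vid:pid, description) pairs for entries with the given vendor ID.

    Works on raw lsusb output or on log lines that embed it after a prefix.
    """
    found = []
    for line in lines:
        m = LSUSB_LINE.search(line)
        if m and m.group("vid").lower() == vendor_id:
            usb_id = m.group("vid").lower() + ":" + m.group("pid").lower()
            found.append((usb_id, m.group("desc").strip()))
    return found
```

An empty result on the host would suggest the board is not enumerating over USB at all, which is a different situation from the adb timeout above, where both devices were listed.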

Please make sure you’re using a USB 2.0 A-to-A cable instead of 3.0.
If you still have problems, please refer to the solutions in the two topics below.

Hi,

I was using a USB A to C adapter with the USB 2.0 cable; maybe that was the problem? Unfortunately, I was trying different things and deleted the ~/.nvsdkm directory, and now I think I’m getting an error during the patch stage.
nvsdkm_4.tar.xz (737.9 KB)

Please use a USB 2.0 A-to-A cable that was verified okay.

Now it’s the same as the other post (Sdkmanager downgrade to Drive 10.0 fails - #5 by VickNV).

Hi,

I was using something like this with the USB A-to-A cable:


From now on I’ll plug USB A directly to my laptop.

To fix the patch error, I had to uninstall Drive 10.0 from SDK Manager. On the next flash attempt it rebuilt the OS images, which seems to have fixed the patching error. After that the flash burn worked OK and I now have a working board.

I tried a few power cycles and didn’t see the original boot or flash errors, so I guess the board is acting normally again. Time will tell.

Thanks for the help, once again sdkmanager issues made this way harder than it should have been.

Good to hear you solved the problem. Thanks.

Hi again,

After letting the board sit powered off overnight, this morning it got the boot error again. Tegra A doesn’t boot; Tegra B is OK. The Aurix console did not show any abnormal messages.

[0000.022] I> MB1 (prd-version: s_1.6.0.0-t194-41334769-fac5b753)
[0000.028] I> Boot-mode: Coldboot
[0000.031] I> Chip revision : A02 
[0000.034] I> Bootrom patch version : 15 (correctly patched)
[0000.039] I> ATE fuse revision : 0x200
[0000.042] I> Ram repair fuse : 0x0
[0000.045] I> Ram Code : 0x1
[0000.048] I> rst_source : 0x0
[0000.051] I> rst_level : 0x0
[0000.054] I> Boot-device: QSPI
[0000.057] I> Qspi flash params source = brbct
[0000.061] I> Qspi using bpmp-dma
[0000.064] I> Qspi clock source : pllp
[0000.067] W> DEVICE_PROD: device prod is not initialized.                      
[0000.072] I> QSPI Flash Size = 64 MB                                           
[0000.076] I> Qspi initialized successfully                                     
[0000.079] I> Active Boot chain : 0                                             
[0000.083] I> Boot-device: QSPI                                                 
[0000.085] I> Qspi flash params source = brbct                                  
[0000.091] E> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.        
[0000.097] E> MB1_PLATFORM_CONFIG: Failed to initialize device prod.            
[0000.105] I> Temperature = 15500                                               
[0000.108] W> Skipping boost for clk: AON_CPU_NIC                               
[0000.112] W> Skipping boost for clk: CAN1                                      
[0000.116] W> Skipping boost for clk: CAN2                                      
[0000.120] I> Boot-device: QSPI                                                 
[0000.123] I> Qspi flash params source = mb1bct                                 
[0000.127] I> Qspi using bpmp-dma                                               
[0000.130] I> Qspi clock source : pllc_out0                                     
[0000.133] W> DEVICE_PROD: device prod is not initialized.                      
[0000.138] I> Qspi reinitialized                                                
[0000.144] I> Qspi flash params source = mb1bct                                 
[0000.149] I> ECC region[0]: Start:0x0, End:0x0                                 
[0000.153] I> ECC region[1]: Start:0x0, End:0x0                                 
[0000.157] I> ECC region[2]: Start:0x0, End:0x0                                 
[0000.161] I> ECC region[3]: Start:0x0, End:0x0                                 
[0000.165] I> ECC region[4]: Start:0x0, End:0x0                                 
[0000.169] I> Non-ECC region[0]: Start:0x80000000, End:0x880000000              
[0000.175] I> Non-ECC region[1]: Start:0x0, End:0x0                             
[0000.179] I> Non-ECC region[2]: Start:0x0, End:0x0                             
[0000.184] I> Non-ECC region[3]: Start:0x0, End:0x0                             
[0000.188] I> Non-ECC region[4]: Start:0x0, End:0x0                             
[0000.194] W> DEVICE_PROD: device prod is not initialized.                      
[0000.200] W> DEVICE_PROD: device prod is not initialized.                      
[0000.206] W> MB1_PLATFORM_CONFIG: Rail ID 7 not found in pmic rail config tabl.
[0000.213] E> FAILED: MEMIO rail config                                         
[0000.225] I> scrub mode: full dram                                             
[0000.230] I> Boot-device: QSPI                                                 
[0000.233] I> Qspi flash params source = mb1bct                                 
[0000.290] E> MB1_PLATFORM_CONFIG: Bootrom mmio i2c table is empty in MB1 BCT.  
[0000.299] W> MB1_PLATFORM_CONFIG: Rail ID 8 not found in pmic rail config tabl.
[0001.513] E> WP1.5 ACK pending                                                 
[0001.516] E> Error: 0                                                          
[0001.518] E> Task 79 failed (err: 0x32320006)                                  
[0001.522] E> Top caller module: CPUINIT, error module: CPUINIT, reason: 0x06, 0
[0001.530] I> MB1(s_1.6.0.0-t194-41334769-fac5b753) BIT boot status dump :      
01111111111111111111101111111111111111111110111111111011110011111111100101111010
[0001.559] I> Reset to recovery mode                                            

Is there any way I can run a flash test to check the integrity of the non-volatile memory? Do Tegra A, Tegra B, and Aurix have separate flash storage, or is it shared?

Please also share boot messages of Xavier B.

Was there any HDMI monitor connected before booting?

Hi,

Here is the boot log from Tegra B:
tegra_b_boot_good.txt (106.7 KB)

An HDMI monitor was connected to both Tegra A and Tegra B prior to booting.

It looks the same as “Xavier A not responding and not able to flash”, but how it was fixed there is unknown.

Was it really flashed successfully? Can we check the logs? Thanks.

Hi,

Here are logs.
nvsdkm5.tar.xz (1.5 MB)

I did try a few power cycles yesterday; it was definitely working and booting into Ubuntu.

Yeah, it looks like it flashed successfully.

  • NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP@DDPX: Installed - 00:42:09 -start: 10:04:02 GMT-0700 (Pacific Daylight Time) - end: 10:46:11 GMT-0700 (Pacific Daylight Time).

Please help check your Aurix firmware version. Thanks.

Looks consistent with before

shell> version                                                                  
Info: Executing cmd: version, argc: 0, args:                                    
SW Version: DRIVE-V5.1.6-E3550-EB-Aurix-With3LSS-ForHyperion-StepA-3.05.04      
Compilation date: Jun 25 2019, 14:25:36                                         
Command Executed                                                                
shell> 

Hi,

After reading “Xavier A not responding and not able to flash”, I tried power cycling a few times. After about the fifth or sixth attempt, Tegra A booted. Power cycles after that seem to always work.

Here is the log of a successful Tegra A boot.
tegra_a_boot_good.txt (99.3 KB)

The only difference I see is at the end of the QSPI boot messages. In the failing case there is

[0001.514] E> WP1.5 ACK pending                                                 
[0001.516] E> Error: 0                                                          
[0001.519] E> Task 79 failed (err: 0x32320006)  

and in good case it just prints

MB1 done
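
For comparing a failing and a good boot log like this without eyeballing them, here is a minimal sketch that strips the `[0001.514]`-style timestamps and trailing padding before diffing, so lines that differ only in timing drop out. The function names are made up for illustration, and the logs are assumed to be plain text captured from the serial console, one message per line.

```python
import difflib
import re

# Matches the "[0001.514] " timestamp prefix at the start of an MB1 log line.
TIMESTAMP = re.compile(r"^\[\d+\.\d+\]\s*")


def normalize(lines):
    """Drop timestamps and trailing padding so only message text is compared."""
    return [TIMESTAMP.sub("", line).rstrip() for line in lines]


def diff_boot_logs(bad_lines, good_lines):
    """Return unified-diff lines between two timestamp-stripped boot logs.

    n=0 suppresses context, so only lines that actually differ are listed.
    """
    return list(difflib.unified_diff(
        normalize(bad_lines), normalize(good_lines),
        fromfile="fail", tofile="good", lineterm="", n=0))


def diff_boot_log_files(bad_path, good_path):
    """Convenience wrapper that reads both logs from disk."""
    with open(bad_path, errors="replace") as bad, open(good_path, errors="replace") as good:
        return diff_boot_logs(bad.read().splitlines(), good.read().splitlines())
```

Lines carrying measured values, such as `Temperature = 21000` versus `Temperature = 15500` in the two failing logs above, will still show up as differences, so the output needs a quick manual read rather than being treated as a pass/fail check.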