My Xavier suddenly died?

My Jetson was running on my desk, just doing a few evolutionary training runs, when I started noticing that random standard Linux commands were causing segmentation faults. I tried to reboot, but the Xavier never came back to life (no USB, no video output, and it wasn’t showing up on the network).

The serial debug interface was still working, and it looks like I2C broke and the Jetson is now stuck in a boot loop?

xavier_debug.log (34.8 KB)

...
[0000.409] E> I2C: slave not found in slaves.
[0000.410] E> I2C: Could not write 0 bytes to slave: 0x00ae with repeat start true.
[0000.411] E> I2C_DEV: Failed to send register address 0x00000000.
[0000.412] E> I2C_DEV: Could not read 256 registers of size 1 from slave 0xae at 0x00000000 via instance 0.
[0000.413] E> eeprom: Failed to read I2C slave device
[0000.416] I> Failed to read CVB eeprom data @ AE
[0000.421] I> Retrying CVB eeprom read @ AC ...
[0000.460] E> HeaderMagic Invalid
[0000.460] I> (unprintable bytes): execution failed
[0000.461] I> (unprintable bytes): execution failed
[0000.461] E> Top caller module: LOADER, error module: LOADER, reason: 0x02, aux_info: 0x00
[0000.462] I> AB warm reset
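To pick the error lines out of a long capture like the attached one, something along these lines could help. It is only a minimal sketch: the regular expression is an assumption based on the lines pasted above (timestamp in brackets, an optional `E`/`I`/`W` severity letter, then `>`), and `parse_line`, `errors`, and `SAMPLE` are hypothetical names for illustration.

```python
import re

# Assumed line shape, taken from the excerpt above: "[0000.410] E> message".
# The severity letter can be missing ("[0000.398]  > ...") and the opening
# bracket may be lost when copying, so both are treated as optional.
LINE_RE = re.compile(r"^\[?(\d{4}\.\d{3})\]\s+([EIW]?)>\s?(.*)$")


def parse_line(line):
    """Return (seconds, severity, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp, level, message = match.groups()
    return float(timestamp), level or "-", message


def errors(log_text):
    """Collect only the lines logged with severity E."""
    parsed = (parse_line(line) for line in log_text.splitlines())
    return [entry for entry in parsed if entry and entry[1] == "E"]


# A few lines copied from the log above.
SAMPLE = """\
[0000.413] E> eeprom: Failed to read I2C slave device
[0000.416] I> Failed to read CVB eeprom data @ AE
[0000.460] E> HeaderMagic Invalid
[0000.462] I> AB warm reset
"""

if __name__ == "__main__":
    for seconds, level, message in errors(SAMPLE):
        print(f"{seconds:8.3f}s {level} {message}")
```

Feeding it the full text of xavier_debug.log instead of `SAMPLE` would list every `E>` line in boot order.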

Any ideas what I can do about it?

Thanks

Are you able to reflash your board in such situation?

Unfortunately, the Xavier is not detected by the SDK Manager anymore. My understanding is that the Xavier AGX can only be flashed through the SDK Manager and not from an SD card?

Is there a way to reflash the boot loader through the JTAG interface on the carrier board?

After many reboots the debug log looks slightly different:

[0000.352] I> Active Boot chain : 0
[0000.397] I> Relocating BR-BCT
[0000.398]  > DEVICE_PROD: device prod is not initialized.
[0000.423] E> I2C: slave not found in slaves.
[0000.424] E> I2C: Could not write 0 bytes to slave: 0x00ae with repeat start true.
[0000.425] E> I2C_DEV: Failed to send register address 0x00000000.
[0000.426] E> I2C_DEV: Could not read 256 registers of size 1 from slave 0xae at 0x00000000 via instance 0.
[0000.427] E> eeprom: Failed to read I2C slave device
[0000.430] I> Failed to read CVB eeprom data @ AE
[0000.434] I> Retrying CVB eeprom read @ AC ...
[0000.463] E> cpubl: digest on binary did not match!!
[0000.463] I> (unprintable bytes): execution failed
[0000.463] I> (unprintable bytes): execution failed
[0000.464] E> Top caller module: LOADER, error module: LOADER, reason: 0x18, aux_info: 0x00
[0000.465] I> AB warm reset
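To make the difference between the two captures easier to see, here is a rough sketch that pulls out the `E>` message printed right before the "Top caller module" line, together with the reported reason value. The function name `loader_failure` and the two sample strings are made up for illustration; the lines themselves are copied from the logs above, and the sketch does not try to interpret what the reason values mean.

```python
import re

# Pattern for the summary line seen in both logs, e.g.
# "Top caller module: LOADER, error module: LOADER, reason: 0x02, aux_info: 0x00"
REASON_RE = re.compile(
    r"error module: (\w+), reason: (0x[0-9a-fA-F]+), aux_info: (0x[0-9a-fA-F]+)"
)


def loader_failure(log_text):
    """Return the last E> message before the summary line plus the reported codes."""
    last_error = None
    for line in log_text.splitlines():
        if "E> " not in line:
            continue
        message = line.split("E> ", 1)[1].strip()
        match = REASON_RE.search(message)
        if match:
            return {
                "cause": last_error,
                "module": match.group(1),
                "reason": int(match.group(2), 16),
                "aux_info": int(match.group(3), 16),
            }
        last_error = message
    return None


# Lines copied from the first and the later boot log.
FIRST_BOOT = """\
[0000.460] E> HeaderMagic Invalid
[0000.461] E> Top caller module: LOADER, error module: LOADER, reason: 0x02, aux_info: 0x00
"""
LATER_BOOT = """\
[0000.463] E> cpubl: digest on binary did not match!!
[0000.464] E> Top caller module: LOADER, error module: LOADER, reason: 0x18, aux_info: 0x00
"""

if __name__ == "__main__":
    for name, text in (("first", FIRST_BOOT), ("later", LATER_BOOT)):
        print(name, loader_failure(text))
```

In both cases the I2C/EEPROM errors come first and are unchanged; only the final loader error differs between the two boots.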

Thanks!

Hi

Just want to confirm. Is this your first time flashing your board?
Do you know how to put board into recovery mode?

I just want to point out that “xavier is not detected by the host PC anymore” is a fatal problem which may indicate a hardware defect.

But I need to make sure your board is really in recovery mode first. If it is not, then of course your host won’t detect it.

Yes, I have flashed the board many times before, but this time I cannot get into RCM mode. I am using the following procedure from the User Guide:

Put Developer Kit into Force Recovery Mode

The developer kit must be in Force Recovery Mode (RCM) to enable the installer to transfer system software to the Jetson module.

  1. Connect the developer kit as described above. It must be powered off.

  2. Press and hold down the Force Recovery button.

  3. Press and hold down the Power button.

  4. Release both buttons

Thanks

And lsusb is not able to detect the board?

Do you have another AGX Xavier board you can use to cross-check? For example, unplug the module from the affected device and put it on another known-good carrier board.

I don’t have a spare device that I can use to cross check but a few days ago when I booted the device up something was detected by my host machine:

lsusb
Bus 001 Device 013: ID 0403:6011 Future Technology Devices International, Ltd FT4232H Quad HS USB-UART/FIFO IC
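A quick way to check the `lsusb` output for the module itself could look like the sketch below. It assumes that a Jetson in recovery mode enumerates under NVIDIA's USB vendor ID `0955`. The `0403:6011` entry above is an FTDI USB-UART bridge, so it is most likely the debug serial port rather than the module. `parse_lsusb`, `nvidia_devices`, and `read_lsusb` are hypothetical helper names.

```python
import re
import subprocess

NVIDIA_VENDOR_ID = "0955"  # assumption: NVIDIA's USB vendor ID, as used in recovery mode

# Matches the usual lsusb line: "Bus 001 Device 013: ID 0403:6011 <description>"
LSUSB_RE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$"
)


def parse_lsusb(output):
    """Turn lsusb text into a list of dicts with vendor, product and name."""
    devices = []
    for line in output.splitlines():
        match = LSUSB_RE.match(line.strip())
        if match:
            devices.append(
                {
                    "vendor": match.group(3).lower(),
                    "product": match.group(4).lower(),
                    "name": match.group(5),
                }
            )
    return devices


def nvidia_devices(output):
    """Keep only devices that report the NVIDIA vendor ID."""
    return [d for d in parse_lsusb(output) if d["vendor"] == NVIDIA_VENDOR_ID]


def read_lsusb():
    """Run lsusb on the host and return its text output (Linux host assumed)."""
    return subprocess.run(
        ["lsusb"], capture_output=True, text=True, check=True
    ).stdout


# The line posted above.
SAMPLE = (
    "Bus 001 Device 013: ID 0403:6011 Future Technology Devices "
    "International, Ltd FT4232H Quad HS USB-UART/FIFO IC"
)

if __name__ == "__main__":
    print("NVIDIA devices in sample:", nvidia_devices(SAMPLE))
```

On the host, `nvidia_devices(read_lsusb())` returning an empty list after the button sequence would suggest the module did not enumerate in recovery mode.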

The SDK Manager detected “some” NVIDIA device but wasn’t sure what it was, though you had the option to manually select the Xavier (which I did).

Unfortunately, flashing the device did not succeed. It started the process and did something, but ended with an error.

Here are the last lines from the debug interface:

[0000.095] I> Boot-device: SDMMC (instance: 3)
[0000.111] I> sdmmc DDR50 mode
[0000.115] I> Boot chain mechanism: A/B
[0000.118] I> Current Boot-Chain Slot: 0
[0000.122] E> Error, current slot 0 fails: sr_bl: 0x4d14ef1, sr_br: 0x0
[0000.128] I> Reset to recovery mode


[0059.568] I> sdmmc bdev is already initialized
[0059.594] I> Found 20 partitions in SDMMC_BOOT (instance 3)
[0059.607] W> Cannot find any partition table for 00010003
[0059.608]  > PARTITION_MANAGER: Failed to publish partition.
[0059.609] I> Recovery boot_type: 0
[0059.609] I> Entering 3p server
[0059.609] I> USB configuration success
[0059.889] I> Populate storage info
[0059.933] I> Erasing device 0: 3
[0061.054] I> Writing device 0: 3.
[0061.165] I> Found 20 partitions in SDMMC_BOOT (instance 3)
[0061.166] I> Erasing device 1: 3
[0061.660] I> Writing device 1: 3.
[0061.668] I> Writing device 1: 3.
[0061.682] W> Cannot find any partition table for 00010003
[0061.682] E> NV3P_SERVER: Failed to initialize partition table from GPT.

The entire log:
xavier_flash_to_emmc.log (21.8 KB)

The SDK Manager shows the following:

I don’t know what else I could do. It seems like the eMMC is corrupted, maybe?

If you flashed with the same host and the same JetPack version before on this AGX and it worked then but fails now, then it sounds like a hardware problem to me.
