Jetson AGX Xavier module unable to boot after using for some time

One Xavier module (16 GB RAM, 30 W) has been in use for half a year, and the following failures occurred:

  1. In May, the module would fail and could not restart the system after about 1 hour of operation. After being powered off for 1 hour, it would work for another hour before failing again.

  2. The failures continued into June, after which the Xavier module could not boot the system at all. The serial port printed the following:

[0000.083] I> MB1 (prd-version: 1.1.0.0-t194-41334769-514a1108)
[0000.089] I> Boot-mode: L0 coldboot
[0000.092] I> Chip revision : A02
[0000.095] I> Bootrom patch version : 7 (incorrectly patched)
[0000.100] I> ATE fuse revision : 0x200
[0000.104] I> Ram repair fuse : 0x0
[0000.107] I> Ram Code : 0x0
[0000.109] I> rst_source : 0x0
[0000.112] I> rst_level : 0x0
[0000.115] E> Failed to verify PMC high threshold fault occurence. Fault reg: 0x2
[0000.122] E> Task 7 failed (err: 0x77770118)
[0000.126] E> Top caller module: 馃馃馃馃馃馃馃馃 error module: 馃馃馃馃馃馃馃馃 reason: 0x18, aux_info: 0x01
[0000.138] I> MB1(1.1.0.0-t194-41334769-514a1108) BIT boot status dump :
1111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[0000.168] I> Reset to recovery mode

[0000.052] I> MB1 (prd-version: 1.1.0.0-t194-41334769-514a1108)
[0000.058] I> Boot-mode: L1 coldboot
[0000.061] I> Chip revision : A02
[0000.064] I> Bootrom patch version : 7 (incorrectly patched)
[0000.069] I> ATE fuse revision : 0x200
[0000.073] I> Ram repair fuse : 0x0
[0000.076] I> Ram Code : 0x0
[0000.078] I> rst_source : 0xa
[0000.081] I> rst_level : 0x1
[0000.084] E> Failed to verify PMC high threshold fault occurence. Fault reg: 0x2
[0000.091] E> Task 7 failed (err: 0x77770118)
[0000.095] E> Top caller module: 馃馃馃馃馃馃馃馃 error module: 馃馃馃馃馃馃馃馃 reason: 0x18, aux_info: 0x01
[0000.107] I> MB1(1.1.0.0-t194-41334769-514a1108) BIT boot status dump :
1111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[0000.137] I> Reset to recovery mode

  3. The Xavier module was mounted on the carrier board of the developer kit for flashing, but flashing could not be completed under any circumstances. Other Xavier modules flash properly.

The Xavier module can enter recovery mode, and the PC recognizes the NVIDIA device normally. However, the module restarts continuously once flashing begins. The following figure shows the output in the Ubuntu terminal:

Has anyone run into this situation? Could NVIDIA's engineers help check what the fault is and what caused it?
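
For anyone comparing the two MB1 logs above, a short script makes the differences easier to see. This is only a minimal sketch, not an NVIDIA tool: the function names `parse_mb1_log` and `diff_fields` are hypothetical, and the sample strings are abbreviated copies of the log lines posted above.

```python
import re

# Matches lines such as "[0000.109] I> rst_source : 0x0"
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]\s+([IEW])>\s+(.*)$")


def parse_mb1_log(text):
    """Split an MB1 UART log into key/value info fields and error messages."""
    fields, errors = {}, []
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines without a timestamp, e.g. the BIT status bitstring
        _, level, msg = m.groups()
        if level == "E":
            errors.append(msg)
        elif ":" in msg:
            key, _, value = msg.partition(":")
            fields[key.strip()] = value.strip()
    return fields, errors


def diff_fields(a, b):
    """Return only the fields whose values differ between two parsed logs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated excerpts of the two boot attempts posted above.
BOOT_1 = """\
[0000.089] I> Boot-mode: L0 coldboot
[0000.109] I> rst_source : 0x0
[0000.112] I> rst_level : 0x0
[0000.115] E> Failed to verify PMC high threshold fault occurence. Fault reg: 0x2
[0000.122] E> Task 7 failed (err: 0x77770118)
"""

BOOT_2 = """\
[0000.058] I> Boot-mode: L1 coldboot
[0000.078] I> rst_source : 0xa
[0000.081] I> rst_level : 0x1
[0000.084] E> Failed to verify PMC high threshold fault occurence. Fault reg: 0x2
[0000.091] E> Task 7 failed (err: 0x77770118)
"""

if __name__ == "__main__":
    f1, e1 = parse_mb1_log(BOOT_1)
    f2, e2 = parse_mb1_log(BOOT_2)
    print("same errors:", e1 == e2)
    for key, (old, new) in diff_fields(f1, f2).items():
        print(f"{key}: {old} -> {new}")
```

On these excerpts the error lines are identical in both attempts; only Boot-mode, rst_source and rst_level change between the first and second boot.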

Hi,

The failures continued into June, after which the Xavier module could not boot the system at all. The serial port printed the following:

Is this the log during “flash” or a log during “boot up/reboot”?

It can happen in either case.

Hi,

Do you mean your system is going to recovery mode when you press the power button?

According to your log posted above, it is entering recovery mode.

If that is your case, could you move this module to another carrier board and see whether you still hit this issue?
If yes, then this device has a hardware defect.

Hi,

When I pressed the power button, the system repeatedly printed this log and did not enter recovery mode. I know this because my PC did not recognize the device over USB.

When the module enters recovery mode normally, the serial port prints nothing.

The behavior is the same on other carrier boards. Replacing the Xavier module with another one works properly.

What flash error do you see on the UART? From what you describe, the previous log is from boot-up, right?

  1. When I boot the system, the UART prints the previous log.

  2. In recovery mode, the UART prints nothing at first. When I type sudo ./flash.sh jetson-xavier mmcblk0p1 in the terminal, the UART prints the previous log again, and the later log is the one printed by the Ubuntu terminal.
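
Before starting a flash, it can help to confirm that the host really sees the module in recovery mode. The following is a minimal sketch under stated assumptions: it only parses `lsusb` text for NVIDIA's USB vendor ID (0955), the helper name `find_nvidia_devices` is hypothetical, and the sample output (including the product ID shown) is illustrative rather than captured from this module.

```python
import re

NVIDIA_VENDOR_ID = "0955"  # NVIDIA's USB vendor ID

# Matches the "ID vvvv:pppp description" part of an lsusb line.
LSUSB_RE = re.compile(r"ID\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$")


def find_nvidia_devices(lsusb_output):
    """Return (product_id, description) for every NVIDIA device listed."""
    found = []
    for line in lsusb_output.splitlines():
        m = LSUSB_RE.search(line)
        if m and m.group(1).lower() == NVIDIA_VENDOR_ID:
            found.append((m.group(2).lower(), m.group(3).strip()))
    return found


# Illustrative sample only; real output depends on the host and the module.
SAMPLE = """\
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 015: ID 0955:7019 NVidia Corp.
"""

if __name__ == "__main__":
    devices = find_nvidia_devices(SAMPLE)
    if devices:
        for product_id, description in devices:
            print(f"NVIDIA device found: product {product_id} {description}")
    else:
        print("No NVIDIA device visible; the module is probably not in recovery mode")
```

On a real host, the text could come from `subprocess.run(["lsusb"], capture_output=True, text=True).stdout` instead of the sample string. If the device disappears from the list as soon as flash.sh starts, that matches the restart behavior described above.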

OK. Then I guess this device has some defect. Please try to RMA it.