One Xavier module (16GB RAM, 30W) has been in use for half a year, and the following failures have occurred:
1. In May, after about 1 hour of operation the module could no longer restart the system. After being powered off for 1 hour it would work for another hour, and then the failure occurred again.
2. The failure persisted until June, by which point the Xavier module could not boot the system at all. The serial console printed the following:
[0000.083] I> MB1 (prd-version: 1.1.0.0-t194-41334769-514a1108)
[0000.089] I> Boot-mode: L0 coldboot
[0000.092] I> Chip revision : A02
[0000.095] I> Bootrom patch version : 7 (incorrectly patched)
[0000.100] I> ATE fuse revision : 0x200
[0000.104] I> Ram repair fuse : 0x0
[0000.107] I> Ram Code : 0x0
[0000.109] I> rst_source : 0x0
[0000.112] I> rst_level : 0x0
[0000.115] E> Failed to verify PMC high threshold fault occurence. Fault reg: 0x2
[0000.122] E> Task 7 failed (err: 0x77770118)
[0000.126] E> Top caller module: 馃馃馃馃馃馃馃馃 error module: 馃馃馃馃馃馃馃馃 reason: 0x18, aux_info: 0x01
[0000.138] I> MB1(1.1.0.0-t194-41334769-514a1108) BIT boot status dump :
1111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[0000.168] I> Reset to recovery mode
[0000.052] I> MB1 (prd-version: 1.1.0.0-t194-41334769-514a1108)
[0000.058] I> Boot-mode: L1 coldboot
[0000.061] I> Chip revision : A02
[0000.064] I> Bootrom patch version : 7 (incorrectly patched)
[0000.069] I> ATE fuse revision : 0x200
[0000.073] I> Ram repair fuse : 0x0
[0000.076] I> Ram Code : 0x0
[0000.078] I> rst_source : 0xa
[0000.081] I> rst_level : 0x1
[0000.084] E> Failed to verify PMC high threshold fault occurence. Fault reg: 0x2
[0000.091] E> Task 7 failed (err: 0x77770118)
[0000.095] E> Top caller module: 馃馃馃馃馃馃馃馃 error module: 馃馃馃馃馃馃馃馃 reason: 0x18, aux_info: 0x01
[0000.107] I> MB1(1.1.0.0-t194-41334769-514a1108) BIT boot status dump :
1111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[0000.137] I> Reset to recovery mode
3. The Xavier module was mounted on the carrier board of the developer kit for flashing, but flashing could not be completed under any circumstances. Other Xavier modules flash properly.
The Xavier module can enter recovery mode, and the PC recognizes the NVIDIA device normally. However, once flashing starts, the module keeps restarting. The following figure shows the output in the Ubuntu terminal:
Has anyone ever run into this situation? Could the NVIDIA engineers help check what the fault is and what caused it?
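In case it helps anyone compare the two boot attempts in the serial log above, here is a minimal Python sketch that splits a capture into boot attempts, collects the `E>` lines, and lists the `key : value` fields that differ between attempts. The function names (`split_boots`, `summarize`, `changed_fields`) are hypothetical helpers made up for this illustration, not part of any NVIDIA tool, and the line format is assumed only from the log pasted here. `SAMPLE` is an excerpt of that log.

```python
import re

# Assumed line shape, taken from the log above: "[0000.115] E> message"
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]\s+([A-Z])>\s+(.*)$")

SAMPLE = """\
[0000.083] I> MB1 (prd-version: 1.1.0.0-t194-41334769-514a1108)
[0000.089] I> Boot-mode: L0 coldboot
[0000.109] I> rst_source : 0x0
[0000.112] I> rst_level : 0x0
[0000.115] E> Failed to verify PMC high threshold fault occurence. Fault reg: 0x2
[0000.122] E> Task 7 failed (err: 0x77770118)
[0000.168] I> Reset to recovery mode
[0000.052] I> MB1 (prd-version: 1.1.0.0-t194-41334769-514a1108)
[0000.058] I> Boot-mode: L1 coldboot
[0000.078] I> rst_source : 0xa
[0000.081] I> rst_level : 0x1
[0000.084] E> Failed to verify PMC high threshold fault occurence. Fault reg: 0x2
[0000.091] E> Task 7 failed (err: 0x77770118)
[0000.137] I> Reset to recovery mode
"""


def split_boots(log_text):
    """Group log lines into boot attempts; each starts at an 'MB1 (prd-version' banner."""
    boots = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skips lines without a timestamp, e.g. the BIT status bit string
        stamp, level, msg = match.groups()
        if msg.startswith("MB1 (prd-version"):
            boots.append([])
        if boots:
            boots[-1].append((float(stamp), level, msg))
    return boots


def summarize(boot):
    """Return (fields, errors) for one boot attempt."""
    fields, errors = {}, []
    for _, level, msg in boot:
        if level == "E":
            errors.append(msg)
        elif not msg.startswith("MB1"):
            key, _, value = msg.partition(":")
            if value.strip():
                fields[key.strip()] = value.strip()
    return fields, errors


def changed_fields(first, second):
    """Map each field whose value differs to its (first, second) pair."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k)) for k in keys if first.get(k) != second.get(k)}


if __name__ == "__main__":
    attempts = [summarize(b) for b in split_boots(SAMPLE)]
    for index, (_, errors) in enumerate(attempts, start=1):
        print(f"boot {index} errors:")
        for msg in errors:
            print("   ", msg)
    if len(attempts) >= 2:
        print("fields that differ:", changed_fields(attempts[0][0], attempts[1][0]))
```

This only reorganizes what the console already printed. It does not interpret the fault register or the error code.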