Xavier fails to boot and cannot log in to the desktop

The Xavier module is mounted on our carrier board. After power-on, the system fails to boot, never reaches the desktop login, and restarts in a loop. After we updated the image file, the same problem occurred again after a period of time.
The picture above shows the output on the Xavier debug interface during startup. During a normal startup, Xavier enters the system desktop after printing “nvidia-desktop login”. When startup fails, Xavier prints additional messages after the login prompt, such as “stall on CPU” and “watchdog”.
The system then restarts automatically after a certain period of time and prints the same error messages again.
Attached is the log printed when Xavier fails to boot.
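
With 50 modules to check, it may help to classify captured debug-console logs automatically instead of reading each one. The following is only a minimal sketch: it assumes each boot was saved as plain text, and it matches only the strings quoted above ("nvidia-desktop login", "stall on CPU", "watchdog"). The function name and the status labels are hypothetical, and the exact kernel wording may differ between releases.

```python
# Strings taken from the post above; real kernel messages may be worded differently.
LOGIN_MARKER = "nvidia-desktop login"
FAILURE_MARKERS = ("stall on cpu", "watchdog")


def classify_boot_log(text):
    """Classify one captured debug-console log.

    Returns (status, hits), where status is one of:
      "no-login-prompt"    - the login marker never appeared
      "failed-after-login" - a failure marker appeared after the login marker
      "ok"                 - login marker seen, no failure marker after it
    and hits lists the failure markers found after the login marker.
    """
    lowered = text.lower()
    login_at = lowered.find(LOGIN_MARKER)
    if login_at == -1:
        return "no-login-prompt", []
    # Only look at what was printed after the login prompt.
    tail = lowered[login_at:]
    hits = [marker for marker in FAILURE_MARKERS if marker in tail]
    return ("failed-after-login" if hits else "ok"), hits
```

Running this over one log file per module would give a quick list of which modules show the stall or watchdog messages after the login prompt.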

How many AGX modules do you have on your side? Does every one of them hit this issue with the same setup?

If you just unplug the module and put it on the devkit, do you still hit this issue?

There are 50 modules. The carrier board includes a system management MCU. When the MCU detects that Xavier has failed to start, it power-cycles the module. Some modules always show the above error, however, and never boot successfully even after multiple power cycles.
We also tried cross-testing by swapping carrier boards between modules that start normally and modules that start abnormally. The modules with abnormal startup also failed on the other carrier boards.

Could you put those problematic modules back on a devkit, flash them with sdkmanager, and tell us whether they still hit the issue?

We currently have no devkit available. I will reply as soon as possible after the test.

The problematic modules have been tested on a devkit, and no problems have been found so far.

Then this looks like a hardware design issue.

Please review the hardware design first.

