I noticed that the system restarts at random, and there are no abnormalities in the debug log at the time of the reboot.
orin-master login: nvidia
Password: 
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.10.120-rt70-tegra aarch64)

* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

Expanded Security Maintenance for Applications is not enabled.

18 updates can be applied immediately.
To see these additional updates run: apt list --upgradable

60 additional security updates can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm


The list of available updates is more than a week old.
To check for new updates run: sudo apt update
Last login: Wed Oct 25 11:54:13 CST 2023 from 10.27.87.242 on pts/0
nvidia@orin-master:~$ 
nvidia@orin-master:~$ 
nvidia@orin-master:~$ 
nvidia@orin-master:~$ start
-bash: start: command not found
nvidia@orin-master:~$ ^@ÿâ
[0000.062] I> MB1 (version: 1.2.0.0-t234-54845784-562369e5)
[0000.067] I> t234-A01-0-Silicon (0x12347) Prod
[0000.071] I> Boot-mode : Coldboot
[0000.075] I> Entry timestamp: 0x00000000
[0000.078] I> last_boot_error: 0x0
[0000.082] I> BR-BCT: preprod_dev_sign: 0
[0000.085] I> rst_source: 0x2, rst_level: 0x1
[0000.089] I> Task: SE error check
[0000.093] I> Task: Bootchain select WAR set
[0000.097] I> Task: Enable SLCG
[0000.099] I> Task: CRC check
[0000.102] I> Skip FUSE records CRC check as records_integrity fuse is not burned
[0000.109] I> Task: Initialize MB2 params
[0000.114] I> MB2-params @ 0x40060000
[0000.117] I> Task: Crypto init
[0000.120] I> Task: Perform MB1 KAT tests
[0000.124] I> Task: NVRNG health check
[0000.127] I> NVRNG: Health check success
[0000.131] I> Task: MSS Bandwidth limiter settings for iGPU clients
[0000.137] I> Task: Enabling and initialization of Bandwidth limiter
[0000.143] I> No request to configure MBWT settings for any PC!
[0000.149] I> Task: Secure debug controls
[0000.153] I> Task: strap war set
[0000.156] I> Task: Initialize SOC Therm
[0000.160] I> Task: Program NV master stream id
[0000.164] I> Task: Verify boot mode
[0000.170] I> Task: Alias fuses
[0000.173] W> FUSE_ALIAS: Fuse alias on production fused part is not supported.
I’d suggest checking whether this can also be observed on a DevKit.
If it cannot be consistently reproduced, then there’s little we can do.
Or see if you can observe something after reboot:
I mean, try different combinations of modules and carrier boards to see whether it only happens on specific devices.
Or do something like this to enable more debug logging in the kernel:
If it cannot be consistently reproduced, and no abnormal log can be observed either, then there is really nothing we can do.
Sorry, perhaps I did not make that quite clear.
The problem is not limited to one specific machine.
On site, all 30 devices experience the issue, and it can be reproduced consistently, but no abnormal logs can be observed.
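Since the only trace of each restart in the capture above is the MB1 banner that follows it, one way to compare the 30 devices is to pull the reset-related fields (`Boot-mode`, `last_boot_error`, `rst_source`, `rst_level`) out of every captured serial log and tally them. Below is a minimal sketch, assuming the logs are saved as text files in the same format as the capture above. The function names `clean`, `extract_boots` and `summarize` are hypothetical helpers, not part of any NVIDIA tool, and the script only extracts the raw values without interpreting what they mean.

```python
import re
import sys
from collections import Counter

# Every cold or warm boot in the serial capture starts with this MB1 banner.
MB1_BANNER = "I> MB1 (version:"

# Fields of interest, as they appear in the MB1 output of the capture above.
FIELD_PATTERNS = {
    "boot_mode": re.compile(r"Boot-mode\s*:\s*(\w+)"),
    "last_boot_error": re.compile(r"last_boot_error:\s*(0x[0-9a-fA-F]+)"),
    "rst_source": re.compile(r"rst_source:\s*(0x[0-9a-fA-F]+)"),
    "rst_level": re.compile(r"rst_level:\s*(0x[0-9a-fA-F]+)"),
}


def clean(line):
    """Drop literal '^M' markers, real carriage returns and the newline."""
    return line.replace("^M", "").replace("\r", "").rstrip("\n")


def extract_boots(lines):
    """Return one dict of reset fields per MB1 banner found in the log."""
    boots = []
    current = None
    for raw in lines:
        line = clean(raw)
        if MB1_BANNER in line:
            # A new boot begins; start a fresh record with empty fields.
            current = dict.fromkeys(FIELD_PATTERNS)
            boots.append(current)
            continue
        if current is None:
            continue  # still in the login session before the first reboot
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(line)
            if match and current[name] is None:
                current[name] = match.group(1)
    return boots


def summarize(boots):
    """Count how often each (rst_source, rst_level) pair occurs."""
    return Counter((b["rst_source"], b["rst_level"]) for b in boots)


if __name__ == "__main__":
    # Usage (file names are placeholders): python reset_fields.py dev01.log dev02.log
    for path in sys.argv[1:]:
        # Serial captures often contain stray bytes, so decode leniently.
        with open(path, encoding="utf-8", errors="replace") as handle:
            boots = extract_boots(handle)
        print(path, len(boots), "boot(s)", dict(summarize(boots)))
```

If every device reports the same `rst_source` and `rst_level` pair after each unexpected restart, that is a concrete data point to attach to the report even when the kernel log itself shows nothing abnormal.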