I am having issues with an AGX freezing intermittently. The front LED turns off, but the unit can be restarted with the power button. It is a little hard to troubleshoot, which is why I am here. None of the logs I have checked report anything particularly odd, except that they seem to be badly terminated, ending in a long run of “^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@”. I am unable to find anything out of order after rebooting. CPU/GPU temperatures never go far above 60C, which I think should be okay.
Now, I am unsure what to do to troubleshoot this issue. I’ll be happy to supply logs, configs, or anything else needed to get to the bottom of this. Thank you.
Not a clean setup. A few things are plugged in, and a full ROS2 suite is running inside a docker image on the Jetson when it freezes. I cannot find any logs from inside the software that indicate an error, which is why I came here. I checked /var/log/syslog and the other log files in the same folder. Plugging a micro USB cable into the Jetson and then triggering the freeze produces no output.
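For anyone checking their own logs for the same symptom: the “^@” characters are how pagers and editors display NUL bytes (0x00). A run of them at the end of a log is usually read as a sign that the file was cut off by an unclean shutdown rather than written that way by a program, though that interpretation is an assumption here. Below is a minimal Python sketch that counts the trailing NUL bytes and shows the last real lines before them. The function names are made up for illustration.

```python
from __future__ import annotations

from pathlib import Path


def trailing_nul_count(data: bytes) -> int:
    """Return how many NUL bytes sit at the very end of the data."""
    return len(data) - len(data.rstrip(b"\x00"))


def last_text_before_nuls(data: bytes, max_lines: int = 5) -> list[str]:
    """Return up to max_lines of text that precede the trailing NUL run."""
    stripped = data.rstrip(b"\x00")
    lines = stripped.decode("utf-8", errors="replace").splitlines()
    return lines[-max_lines:]


def report(path: str) -> str:
    """Summarise one log file (hypothetical helper, not a system tool)."""
    data = Path(path).read_bytes()
    tail = "\n".join(last_text_before_nuls(data))
    return f"{path}: {trailing_nul_count(data)} trailing NUL bytes\n{tail}"


# Example (reading /var/log/syslog may need root):
# print(report("/var/log/syslog"))
```

The last lines before the NUL run are the most recent entries that actually reached the disk, so they are the ones worth reading first.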
Unable to test with another jetson at the moment, but I will be able to soon. Perhaps I can get back to you with more details then. I am happy to receive further tips on how to troubleshoot this otherwise.
edit: Serial console just printed this right before crashing: [ 315.185774] nr_pdflush_threads exported in /proc is scheduled for removal
What gives?
full ROS2 suite running inside a docker image on the Jetson while freezing
We have no idea about this. Maybe you can try a clean setup with a similar application running first and see whether you can reproduce the issue.
syslog may not be able to capture the error when the system crashes or freezes. If this is a devkit, use the UART console to monitor the log. Also, if the CPU hangs for more than a few minutes, the watchdog timer should reboot the device.
Please share the full log instead of a partial log. Your error log seems to indicate that it takes about 5 min to reproduce the issue. Is it always this fast to reproduce?
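For reference, the number in square brackets at the start of a kernel console line is the time in seconds since boot, which is where the "5 min" estimate comes from. A small Python sketch of that arithmetic follows, using the line quoted above. The regular expression and function name are illustrative assumptions, not part of any Jetson tool.

```python
from __future__ import annotations

import re

# Matches lines such as "[ 315.185774] message text".
KLOG_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_kernel_line(line: str) -> tuple[float, str] | None:
    """Split a kernel console line into (seconds since boot, message)."""
    match = KLOG_RE.match(line)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


line = "[ 315.185774] nr_pdflush_threads exported in /proc is scheduled for removal"
parsed = parse_kernel_line(line)
if parsed is not None:
    seconds, message = parsed
    # 315.185774 s / 60 is roughly 5.25 minutes of uptime.
    print(f"{seconds / 60:.2f} min after boot: {message}")
```

Comparing the timestamp of the last console line across several freezes gives a rough picture of how much the time to failure varies.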
Yes, I will not ask you to debug my software. I would much appreciate advice on how to approach an issue like this.
Full log here. (zerobin.net). The final log bit I pasted is more or less all I get except for boot stuff. Issue can reliably be triggered, but the precise time to failure varies.
It did reboot one time, but most of the time the front LED goes black and the Jetson does not power back on by itself.
Can we get a more detailed description of the board's status when the error happens?
You mention the “front LED goes black”. Are you saying the power indicator of the AGX goes off when this error happens?
If that is the case, it indicates that the power is down. In such a case, it is likely to be a hardware issue such as unstable power.
If this were a software problem, the most common behavior would be for the log to spew a panic message and for the device to then reboot.
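One rough way to tell the two cases apart is to capture the UART console to a file on a host machine and search it for typical kernel panic messages. The Python sketch below does that search. The marker list is an assumption based on common Linux kernel messages and is not exhaustive, and the function name is made up for illustration.

```python
from __future__ import annotations

# Assumed markers: common Linux kernel crash messages, not a complete list.
PANIC_MARKERS = ("Kernel panic", "Oops", "Unable to handle kernel")


def find_panic_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for every line containing a panic marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(marker.lower() in lowered for marker in PANIC_MARKERS):
            hits.append((number, line))
    return hits


captured = (
    "[    1.000000] boot ok\n"
    "[  315.185774] nr_pdflush_threads exported in /proc is scheduled for removal\n"
)
# An empty result means the capture simply stops without a panic message,
# which fits a power loss better than a software crash.
print(find_panic_lines(captured))
```

An empty result is only a hint, since a hard hang can also leave no trace on the console.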