Custom carrier board always crashes

Background: custom carrier board, JetPack 4.3.
Question: I don’t see any error message about the crash in syslog. Is there any way to capture the crash information?
Crash log:

opt@opt-desktop:~$ sudo last
[sudo] password for opt:
opt pts/1 192.168.1.143 Tue Mar 2 16:30 still logged in
opt :0 :0 Tue Mar 2 16:06 still logged in
reboot system boot 4.9.140-tegra Tue Mar 2 16:06 still running
opt pts/3 192.168.1.201 Tue Mar 2 15:46 - crash (00:20)
opt pts/1 192.168.1.201 Tue Mar 2 14:58 - 15:46 (00:47)
opt :0 :0 Tue Mar 2 14:52 - crash (01:13)
reboot system boot 4.9.140-tegra Tue Mar 2 14:52 still running
opt :0 :0 Tue Mar 2 13:55 - crash (00:57)
reboot system boot 4.9.140-tegra Tue Mar 2 13:55 still running
opt :0 :0 Tue Mar 2 12:46 - crash (01:08)
reboot system boot 4.9.140-tegra Tue Mar 2 12:46 still running
opt :0 :0 Tue Mar 2 09:58 - crash (02:48)
reboot system boot 4.9.140-tegra Tue Mar 2 09:58 still running
opt :0 :0 Mon Mar 1 17:02 - crash (16:56)
reboot system boot 4.9.140-tegra Mon Mar 1 17:01 still running
opt :0 :0 Mon Mar 1 14:09 - crash (02:52)
reboot system boot 4.9.140-tegra Mon Mar 1 14:09 still running
opt :0 :0 Mon Mar 1 12:00 - crash (02:08)
reboot system boot 4.9.140-tegra Mon Mar 1 12:00 still running
opt :0 :0 Mon Mar 1 11:18 - crash (00:41)
reboot system boot 4.9.140-tegra Mon Mar 1 11:18 still running
opt :0 :0 Mon Mar 1 09:24 - crash (01:54)
reboot system boot 4.9.140-tegra Mon Mar 1 09:24 still running
opt :0 :0 Mon Mar 1 09:13 - crash (00:11)
reboot system boot 4.9.140-tegra Mon Mar 1 09:13 still running

wtmp begins Mon Mar 1 09:13:23 2021
opt@opt-desktop:~$
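A minimal sketch for summarizing output like the above, assuming the plain "last" format shown here. "last" prints "crash" when a login session has no logout or shutdown record before the next boot, which is what a hard hang or a power loss leaves behind. The function name crash_sessions is hypothetical. It lists each such session with its start time and how long it lasted.

```shell
# crash_sessions (hypothetical helper): read `last` output on stdin and print
# every login session that ended in "crash", with start time and duration,
# followed by the total number of such sessions.
crash_sessions() {
  awk '
    / - crash / {
      # Locate the standalone "-" separating the start time from the end status.
      dash = 0
      for (i = 1; i <= NF; i++) if ($i == "-") dash = i
      if (dash < 5) next
      # The last field is the duration in parentheses, e.g. (00:20).
      dur = $NF
      gsub(/[()]/, "", dur)
      # The four fields before "-" are weekday, month, day and HH:MM.
      printf "%s %s %s %s lasted %s\n", $(dash-4), $(dash-3), $(dash-2), $(dash-1), dur
      n++
    }
    END { printf "crashed sessions: %d\n", n }
  '
}
```

For example, run it as: sudo last | crash_sessions. Reboot lines and sessions that ended with a normal logout are ignored.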

System log: syslog (1.2 MB)

Please check under /var/crash for eventual crash reports.

Thank you very much, but /var/crash is empty. Is there another way?

Hi wm18822827507,
check the kernel log with this command to find more information:
dmesg | grep error
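A sketch of a slightly wider filter, assuming a Linux shell. "grep error" is case-sensitive, and dmesg only holds messages from the current boot, so after the board reboots the messages from the crashed boot are no longer there. The hypothetical helper kernel_problems matches case-insensitively and adds a few extra keywords; the keyword list is an illustrative choice, not an exhaustive one.

```shell
# kernel_problems (hypothetical helper): filter kernel log text on stdin for
# common failure keywords, case-insensitively, so "Error" and "ERROR" lines
# are kept as well. Exits 0 even when nothing matches.
kernel_problems() {
  grep -iE 'error|fail|panic|oops|watchdog|call trace' || true
}
```

Use dmesg | kernel_problems for the current boot, or journalctl -k -b -1 | kernel_problems for the previous (crashed) boot. The second form only works if journald keeps its logs on disk, for example with Storage=persistent in /etc/systemd/journald.conf, or an existing /var/log/journal directory with the default Storage=auto.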

I have this problem too on my Jetson Nano B01. It has crashed many times:

reboot   system boot  4.9.140-tegra    Tue Mar 16 11:30   still running
lamth      pts/4        10.8.0.22        Tue Mar 16 09:13 - crash  (02:17)
lamth      pts/3        10.68.4.54       Tue Mar 16 08:45 - crash  (02:44)
lamth      pts/2        10.68.4.85       Tue Mar 16 08:40 - crash  (02:49)
lamth      pts/1        10.68.4.85       Tue Mar 16 08:40 - crash  (02:49)
lamth      pts/0        10.68.4.85       Tue Mar 16 08:39 - crash  (02:50)
lamth      pts/0        10.8.0.22        Mon Mar 15 20:55 - 21:21  (00:25)

That is what I got from dmesg | grep error:

[ 8122.728693] pcieport 0000:00:02.0: AER: Corrected error received: id=0018
[ 8122.739599] pcieport 0000:00:02.0:   device [10de:0faf] error status/mask=00000001/00002000
[ 8332.651691] pcieport 0000:00:02.0: AER: Corrected error received: id=0018
[ 8332.662563] pcieport 0000:00:02.0:   device [10de:0faf] error status/mask=00000001/00002000
[ 8365.415856] pcieport 0000:00:02.0: AER: Corrected error received: id=0018
[ 8365.449780] pcieport 0000:00:02.0:   device [10de:0faf] error status/mask=00000001/00002000
[ 8837.338450] pcieport 0000:00:02.0: AER: Corrected error received: id=0018
[ 8837.373300] pcieport 0000:00:02.0:   device [10de:0faf] error status/mask=00000001/00002000
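The two hex values in each "error status/mask=" line appear to be the PCIe AER correctable error status and mask registers, as printed by the kernel's AER driver. A small decoder can turn them into names. This sketch takes the bit names from the PCI_ERR_COR_* constants in the Linux header include/uapi/linux/pci_regs.h. The function name decode_aer_cor_status is hypothetical.

```shell
# decode_aer_cor_status (hypothetical helper): print the name of every bit
# set in a PCIe AER correctable error status or mask value such as 00000001.
# Bit positions follow PCI_ERR_COR_* in include/uapi/linux/pci_regs.h.
decode_aer_cor_status() {
  v=$(( 0x$1 ))
  [ $(( v & 0x0001 )) -ne 0 ] && echo "Receiver Error"
  [ $(( v & 0x0040 )) -ne 0 ] && echo "Bad TLP"
  [ $(( v & 0x0080 )) -ne 0 ] && echo "Bad DLLP"
  [ $(( v & 0x0100 )) -ne 0 ] && echo "REPLAY_NUM Rollover"
  [ $(( v & 0x1000 )) -ne 0 ] && echo "Replay Timer Timeout"
  [ $(( v & 0x2000 )) -ne 0 ] && echo "Advisory Non-Fatal Error"
  [ $(( v & 0x4000 )) -ne 0 ] && echo "Corrected Internal Error"
  [ $(( v & 0x8000 )) -ne 0 ] && echo "Header Log Overflow"
  return 0
}
```

For the values above, decode_aer_cor_status 00000001 prints "Receiver Error". decode_aer_cor_status 00002000 prints "Advisory Non-Fatal Error", meaning that error type is masked. "Corrected" means the hardware recovered from the error without software intervention.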

What does it mean? In my case, that error also causes another error: "Read-only file system".
Please help.
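One way to confirm the "Read-only file system" symptom is sketched below. It assumes the root filesystem is ext4. An ext4 filesystem mounted with errors=remount-ro switches itself to read-only after detecting an error, and /proc/mounts then shows "ro" among its options. The hypothetical helper root_mount_mode reads that format and prints ro or rw for "/".

```shell
# root_mount_mode (hypothetical helper): read /proc/mounts-format lines on
# stdin and print "ro" or "rw" for the root filesystem "/". If "/" appears
# more than once, the last entry is the one in effect, so the last match wins.
root_mount_mode() {
  awk '
    $2 == "/" {
      mode = "rw"
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
    }
    END { if (mode != "") print mode }
  '
}
```

Run it as root_mount_mode < /proc/mounts. If it prints ro, dmesg | grep -i 'EXT4-fs' may show why the filesystem was remounted.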

I don’t think you two have the same error. Please file a new topic for your issue.


If your board has a UART console, connect to it and monitor the log until the error happens.

If it is a software problem, the error will show up on the UART.
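Host-side timestamps on the UART log make it easier to tell how long the board ran and when the last message arrived before the crash. Below is a minimal sketch for capturing the log that way on a Linux host. The function name timestamp_lines, the device /dev/ttyUSB0 (a typical USB-serial adapter name) and the 115200 baud rate are assumptions; adjust them to your setup.

```shell
# timestamp_lines (hypothetical helper): prefix each line read from stdin with
# the host's wall-clock time. Carriage returns sent by the serial console are
# stripped from line ends.
timestamp_lines() {
  cr=$(printf '\r')
  # The "|| [ -n "$line" ]" part also keeps a final line that has no newline.
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
  done
}

# Example (device name and baud rate are assumptions):
#   stty -F /dev/ttyUSB0 115200 raw -echo
#   cat /dev/ttyUSB0 | timestamp_lines | tee -a serial.log
```

Because the timestamps come from the host, they keep running even if the board loses power. The last timestamped line therefore marks roughly when the board went silent.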


Serial console log: serial_2021-03-23_13-42-41.log (232.5 KB)

How long was it between the start of monitoring and the error?

What application was running on the board during that time?

This looks more like a sudden power loss on the board. Does the same issue happen on the devkit?

1. About 1 hour.
2. I focused on the second question and have done a lot of tests. If I execute the following commands, the machine crashes after about 1 hour:

opt@opt-desktop:~$ sudo nvpmodel -m 0
opt@opt-desktop:~$ sudo /usr/bin/jetson_clocks

3. I can’t test on a devkit.

Please also monitor the tegrastats output and check whether the temperature becomes high when the error happens.
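Below is a sketch for checking temperatures in a saved tegrastats log. It assumes temperatures are printed as NAME@VALUEC tokens (for example CPU@45.5C), as in typical JetPack 4.x tegrastats output. tegrastats_temps is a hypothetical helper name.

```shell
# tegrastats_temps (hypothetical helper): read tegrastats lines on stdin and
# print one "<sensor> <degrees C>" pair per temperature reading, e.g. "CPU 45.5".
# Frequency fields such as "0%@921" are not matched because they lack the
# trailing "C" and have "%" rather than a sensor name before the "@".
tegrastats_temps() {
  grep -oE '[A-Za-z0-9_]+@-?[0-9.]+C' | sed -E 's/@(-?[0-9.]+)C$/ \1/'
}
```

While the test runs, log with sudo tegrastats --interval 1000 | tee -a tegrastats.log. After the reboot, tail -n 5 tegrastats.log | tegrastats_temps shows the last readings before the crash. If power drops suddenly, the last few seconds of the log may not have reached the disk.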

When it crashes, the temperature is normal.

Please check whether the same application also causes a crash on a devkit. If you don’t have a devkit, please try to get one.

OK, I’ll get one.