Thank you in advance for any help.
Brief description of the situation:
We have a number of Jetson TX2 devices deployed in the field, and some of them are unstable: they reboot roughly every hour.
Some facts we’ve learnt so far:
The reboots follow a fairly predictable cycle (every 1 h 15 min or 1 h 45 min). We have cron jobs running at 15-minute intervals, and both reboot intervals are whole multiples of 15 minutes.
The reboots seem to be related to I/O activity on an external drive we use for storage; if we disable that feature, the reboots stop.
Some devices work fine even with the external drive I/O, while others are heavily affected. All devices are in the same facility, within a 200 m range.
They operate at sub-zero (°C) temperatures most of the time.
The reboots appear to be abrupt: there is no shutdown sequence in the syslog. When we perform planned reboots, the shutdown sequence is logged, but when a node reboots unexpectedly there is no sign of an orderly shutdown.
Also, normal manually triggered reboots take longer than the unexpected ones (possibly because the shutdown process is skipped).
There are two power supplies, one 12 V and one 24 V (the latter powers other devices). Both are rated at 120 W, which seems adequate.
The problem seems to be controllable from software: it depends on whether or not the file operations on the external hard drive are enabled.
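To check more rigorously whether the reboot intervals line up with the 15-minute cron schedule, a small helper along these lines could be used. This is only a sketch: it assumes the boot timestamps have already been collected (for example from `journalctl --list-boots` if persistent journaling is enabled, or from `last -x reboot`), and the 2-minute tolerance is an arbitrary, hypothetical choice.

```python
from datetime import datetime, timedelta

# Period of the cron jobs described above.
CRON_PERIOD = timedelta(minutes=15)


def reboot_intervals(boot_times):
    """Return the gaps between consecutive boot timestamps (sorted first)."""
    times = sorted(boot_times)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def aligned_with_cron(interval, period=CRON_PERIOD, tolerance=timedelta(minutes=2)):
    """True if `interval` is within `tolerance` of a whole multiple of `period`.

    The tolerance is a hypothetical value to absorb boot time and log delays.
    """
    remainder = interval % period  # timedelta % timedelta -> timedelta
    return remainder <= tolerance or period - remainder <= tolerance
```

If most intervals between unexpected boots come back as aligned, that would support (though not prove) a link between a cron job and the reboots.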
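Separating clean reboots from abrupt ones in the syslog can also be automated. The sketch below walks through log lines and, for each boot after the first one seen, reports whether a shutdown marker appeared since the previous boot. The marker strings are assumptions (the arm64 kernel's "Booting Linux on physical CPU" banner, systemd's "Reached target Shutdown" and rsyslogd's "exiting on signal 15"); they should be adjusted to whatever the nodes' logs actually contain.

```python
# Hypothetical markers: adjust them to match the real logs on the nodes.
BOOT_MARKER = "Booting Linux on physical CPU"
SHUTDOWN_MARKERS = ("Reached target Shutdown", "exiting on signal 15")


def classify_boots(lines, boot_marker=BOOT_MARKER, shutdown_markers=SHUTDOWN_MARKERS):
    """Return a list of (line_number, clean) for each boot after the first.

    A boot counts as 'clean' if at least one shutdown marker was logged
    since the previous boot. The first boot in the input is skipped because
    the log (possibly rotated) gives no context about what preceded it.
    """
    results = []
    seen_boot = False
    shutdown_seen = False
    for number, line in enumerate(lines, start=1):
        if boot_marker in line:
            if seen_boot:
                results.append((number, shutdown_seen))
            seen_boot = True
            shutdown_seen = False
        elif any(marker in line for marker in shutdown_markers):
            shutdown_seen = True
    return results


if __name__ == "__main__":
    import sys

    # Usage: python3 classify_boots.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as log:
        for number, clean in classify_boots(log):
            print(number, "clean" if clean else "UNCLEAN")
```

Lines reported as `UNCLEAN` would correspond to the sudden reboots described above.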
$ head -1 /etc/nv_tegra_release
R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t186ref, EABI: aarch64, DATE: Tue Dec 10 07:03:07 UTC 2019
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Errors in syslogs that caught our attention on affected nodes:
Any help would be greatly appreciated.