Random reboots seem to be quite the standard on the AGX.
I have noticed that keeping a ping to the gateway (192.168.0.1 or similar) running in the background can keep the AGX up for several months at a time. As far as I can see, there is a bug in either the network drivers or the networking hardware, perhaps even related to power management. I’ve tried to find the cause but only found this workaround. What made me suspect the network was intermittent network pauses: in every case, a reboot followed shortly after the network issues. It’s as if some interrupts go unnoticed, and the ping nudges the network stack into processing the pending packets.
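For anyone who wants to try the same workaround, here is a minimal sketch of a keepalive loop in Python. It is not what I run myself (a plain background ping does the same job); the function names, the one-second interval and the logging are assumptions for illustration, and the `-c` and `-W` flags are the ones from the Linux iputils `ping`.

```python
import subprocess
import time


def build_ping_command(gateway, count=1, timeout_s=2):
    """Return the argv for one ping run (Linux iputils flags)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), gateway]


def ping_once(gateway):
    """Send one echo request; return True if the gateway answered."""
    try:
        result = subprocess.run(
            build_ping_command(gateway),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def keepalive(gateway="192.168.0.1", interval_s=1.0, max_pings=None, ping=ping_once):
    """Ping the gateway forever (or max_pings times); return the failure count."""
    failures = 0
    sent = 0
    while max_pings is None or sent < max_pings:
        if not ping(gateway):
            failures += 1
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            # A logged miss gives a timestamp to compare against a later reboot.
            print(f"{stamp} no reply from {gateway}", flush=True)
        sent += 1
        time.sleep(interval_s)
    return failures
```

Calling `keepalive()` with the defaults loops until it is killed, so it would have to be started in the background or from a service. The logged misses are only there to check whether the network pauses really do precede the reboots.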
There have been many reports of reboots and I have read a lot of them, but nobody seems to know what causes them or how to solve them. I guess I have learned to live with it. (I won’t be using any AGX in production anywhere here due to this issue.) For desktop and development use it’s fine, since a reboot is fast. Perhaps a future iteration with a newer processor will be rock stable, who knows. I have had two Jetson AGX systems in testing for a full year now and both have the same issue. One runs stock Ubuntu 18.04.5 LTS, the other runs Debian Sid.
[ 95.531425]  el1_irq+0xe8/0x194
[ 95.531467]  nf_conntrack_in+0x100/0x940 [nf_conntrack]
[ 95.531480]  ipv4_conntrack_in+0x30/0x40 [nf_conntrack_ipv4]
This shows clearly that the problem is networking IRQ related, and I have not read any sensible explanation or fix anywhere in these forums. I checked the changelogs for the NVIDIA kernels but can’t find anything that appears to address it.
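To compare traces across several reboots, it can help to tally which kernel modules show up in the call frames. The sketch below parses frames in the `symbol+offset/size [module]` form quoted above; the regular expression and the helper names are my own assumptions, and real dmesg output has other frame formats that this does not handle.

```python
import re
from collections import Counter

# Matches lines such as "[ 95.531467]  nf_conntrack_in+0x100/0x940 [nf_conntrack]".
FRAME_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)

# The three frames quoted in this post.
TRACE = [
    "[ 95.531425]  el1_irq+0xe8/0x194",
    "[ 95.531467]  nf_conntrack_in+0x100/0x940 [nf_conntrack]",
    "[ 95.531480]  ipv4_conntrack_in+0x30/0x40 [nf_conntrack_ipv4]",
]


def parse_frame(line):
    """Parse one call-trace line; return None if it is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": float(match.group("ts")),
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for built-in kernel code
    }


def modules_in_trace(lines):
    """Count frames per module; built-in frames go under '(core kernel)'."""
    counts = Counter()
    for line in lines:
        frame = parse_frame(line)
        if frame is not None:
            counts[frame["module"] or "(core kernel)"] += 1
    return counts
```

Running `modules_in_trace(TRACE)` on the three lines above counts one frame each for `nf_conntrack`, `nf_conntrack_ipv4` and the core kernel. Feeding it the saved dmesg output from several crashes would show whether the conntrack modules appear every time.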
The only thing I recently saw was that they disable reboots on hung tasks in newer kernels. That sounds more like a quick way to reduce reboots than a fix for the cause (forgive my cynicism).
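To see how a given system is configured, the relevant settings can be read from `/proc/sys`. This is a sketch under an assumption: I am guessing that the change refers to the standard Linux sysctls `kernel.hung_task_panic`, `kernel.hung_task_timeout_secs` and `kernel.panic`, which I have not confirmed against the NVIDIA changelogs. The helper names are made up for illustration.

```python
from pathlib import Path

# Standard Linux sysctls; the hung_task ones exist only if the kernel
# was built with hung task detection.
SYSCTLS = (
    "kernel/hung_task_panic",         # 1 = panic when a hung task is detected
    "kernel/hung_task_timeout_secs",  # seconds before a task counts as hung, 0 = off
    "kernel/panic",                   # seconds until reboot after a panic, 0 = no reboot
)


def read_sysctl(name, root="/proc/sys"):
    """Return the value of one sysctl as a string, or None if unreadable."""
    try:
        return (Path(root) / name).read_text().strip()
    except OSError:
        return None


def hung_task_settings(root="/proc/sys"):
    """Return a dict of the settings above, keyed in dotted sysctl notation."""
    return {name.replace("/", "."): read_sysctl(name, root) for name in SYSCTLS}
```

Printing `hung_task_settings()` on both an older and a newer kernel would show whether the defaults actually changed. A value of `None` means the file is not present on that kernel.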
I hope the upcoming newer kernel will have a fix. The 5.x kernels are not yet available for testing, but I hope to see them soon.