This hanging/shutoff/disconnect phenomenon has happened with every release of JetPack that I have installed.
It sometimes occurs under heavy load, but more often occurs when the Jetsons are in an idle state.
There is no power fluctuation in the buildings, or on other devices, when this occurs on the Jetsons.
The Raspberry Pis I have on the same networks have never disconnected or randomly shut down, and they are usually connected to the same outlet and router as the Jetsons.
I’ve seen this issue multiple times on the forums, but it is always closed without a real answer, or the commenters treat it as a one-off occurrence.
It is not possible for me to plug into the serial ports to capture any logs, as these devices are all located remotely from me.
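Since serial capture is not an option for remote units, one possible workaround is a small on-device heartbeat logger whose last record survives a sudden shutdown. The following is only a minimal sketch, not an NVIDIA tool: the function names, the record fields (`wall_time`, `uptime_s`, `temps_c`), and the log path are hypothetical, and it assumes the standard Linux `/proc/uptime` and `/sys/class/thermal` interfaces are present on the device.

```python
import glob
import json
import os
import time


def read_uptime_seconds(proc_uptime="/proc/uptime"):
    """Return seconds since boot, as reported by the kernel."""
    with open(proc_uptime) as f:
        return float(f.read().split()[0])


def read_thermal_zones(thermal_root="/sys/class/thermal"):
    """Return {zone_type: degrees_C} for every readable thermal zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(thermal_root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                # sysfs reports thermal zone temperatures in millidegrees Celsius
                temps[name] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return temps


def write_heartbeat(log_path, proc_uptime="/proc/uptime",
                    thermal_root="/sys/class/thermal", now=None):
    """Append one JSON record and force it to storage before returning."""
    record = {
        "wall_time": time.time() if now is None else now,
        "uptime_s": read_uptime_seconds(proc_uptime),
        "temps_c": read_thermal_zones(thermal_root),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush so the record survives power loss
    return record


def run(log_path, interval_s=10.0, iterations=None):
    """Write a heartbeat every interval_s seconds (forever if iterations is None)."""
    count = 0
    while iterations is None or count < iterations:
        write_heartbeat(log_path)
        count += 1
        time.sleep(interval_s)
```

Calling something like `run("/var/log/jetson_heartbeat.jsonl")` from a service started at boot would leave a timestamped trail up to the moment of failure. The path and the 10-second interval are illustrative choices only.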
I’ve only had this issue with developer kits, so it could be a fundamental design issue with the carrier boards. However, since it occurs across both the AGX and the NX, I suspect it is actually a long-overlooked kernel bug from NVIDIA.
I’ve been working with the Jetsons since the Xavier was first released (JetPack 4.1.1), and I upgrade whenever you release new versions. I first started noticing this issue in the JetPack 4.4 Developer Preview.
I’m not sure if it started in that release, or if that’s just when I first noticed it.
I actually have two devices in separate locations that randomly shut down today during an active SSH session.
Jetson AGX 1: JetPack 5.1 (22 days since last issue)
Jetson AGX 2: JetPack 4.6.1 (56 days since last issue) ← reset twice today
They are both still off. I need to go and manually reset them in the morning, and I can post the dmesg output then.
I’m not sure what specific steps you can take to reproduce it, but here’s a general picture of what I’m doing when it occurs:
SSH into the device
DeepStream PeopleNet running with DeepSORT on 7 cameras
No Monitor attached to device
Pins 5 & 6 of the 8-pin header (J508) are shorted for auto power-on
Jetson connected to network via ethernet
Thermal (as reported by jtop/tegrastats):
Then, after some length of time (potentially weeks), one of four things will happen:
It will reboot and successfully restart. (extremely rare) ← happened once earlier today
It will reboot and go into emergency mode, requiring a manual power cycle to recover. (most common) ← probably what happened today (I won't know for sure until the morning)
It will become unresponsive and hang for an undetermined amount of time (minutes to hours), but after some time become responsive again. (less common)
It will disconnect from the network and not be able to find it again, requiring a reboot to reconnect. (less common)
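These outcomes should leave different traces in a device-local log, which may help tell them apart after the fact. The sketch below assumes such a log exists as a list of periodic records with hypothetical `wall_time` (epoch seconds) and `uptime_s` (seconds since boot) fields. The function name, the 10-second interval, and the tolerance factor are illustrative assumptions, not part of any Jetson tooling.

```python
def classify_gaps(records, interval_s=10.0, tolerance=3.0):
    """Return (gap_start, gap_end, label) for each suspicious pair of records.

    "reboot": uptime went backwards, so the kernel restarted in between.
    "hang":   heartbeats stopped for far longer than the interval, but
              uptime kept counting, so the system stalled without rebooting.
    """
    events = []
    for prev, cur in zip(records, records[1:]):
        wall_gap = cur["wall_time"] - prev["wall_time"]
        if cur["uptime_s"] < prev["uptime_s"]:
            events.append((prev["wall_time"], cur["wall_time"], "reboot"))
        elif wall_gap > interval_s * tolerance:
            events.append((prev["wall_time"], cur["wall_time"], "hang"))
    return events


# Illustrative data only, not taken from a real device.
sample = [
    {"wall_time": 0.0, "uptime_s": 100.0},
    {"wall_time": 10.0, "uptime_s": 110.0},
    {"wall_time": 400.0, "uptime_s": 500.0},  # long silence, uptime continued
    {"wall_time": 410.0, "uptime_s": 510.0},
    {"wall_time": 900.0, "uptime_s": 20.0},   # uptime reset
]
print(classify_gaps(sample))
```

A period in which the device was unreachable over SSH but the log shows no gap at all would point toward the network-only case. A reboot into emergency mode would probably appear as a "reboot" event whose gap lasts until the manual power cycle. Wall-clock jumps (for example, time sync corrections) could produce false "hang" entries, so this is a rough triage aid at best.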