Jetson AGX and NX Xavier power and network reliability issues

Hello,

All of my Jetson devices have had weird issues with power and/or network connections.

I have multiple Jetson Xavier Developer Kits; some are NX and others are AGX.

They all have the latest version of JetPack installed.

Occasionally they will randomly, without warning, shut down or disconnect themselves from the network, requiring a manual power cycle to get working again.

I’ve noticed that they can go up to 147 days without this problem occurring, but when it does occur, it happens again multiple times within a short interval.

At first, I suspected it was my software, the power supply, a bad network switch, or just a bad Jetson.

However, over the years I’ve accumulated many Jetsons and put them in various locations, with different power and network connections. Unfortunately, every single one of them has had this issue.

I have found no discernible pattern and no way to force this to occur; it just happens.

I’m not sure what logs to open up and post on here, but if you point me to them, I’ll post them and hopefully this can get resolved without sending in any hardware.

To further clarify:

This hanging/shutoff/disconnect phenomenon has happened with every release of JetPack that I have installed.

This sometimes occurs under heavy load, but more often occurs when the Jetsons are in an idle state.

There is no power fluctuation in the buildings, or in other devices, when this occurs on the Jetsons.

The Raspberry Pis I have on the same networks have never disconnected or randomly shut down, and are usually connected to the same outlet and router as the Jetsons.

I’ve seen this issue multiple times on the forums, but it is always closed without a real answer, or the commenters treat it as a one-off occurrence.

It is not possible for me to plug into the serial ports to capture any logs, as these devices are all located remotely from me.

I’ve only had this issue with developer kits, so it could be a fundamental design issue with the carrier boards. But considering that it occurs across both the AGX and the NX, I suspect it is actually a long-overlooked kernel bug from NVIDIA.

Hi,
For further investigation, we would need the dmesg log or a way to replicate the issue. Please share the steps so that we can set up and reproduce it locally.

And do you use JetPack 4.6.3 or 5.1?

I’ve been working with the Jetsons since the Xavier was first released (JetPack 4.1.1), and I upgrade when you guys release new versions, but I first started noticing this issue in JetPack 4.4 Developer Preview.

I’m not sure if it started in that release, or if that’s when I first noticed it.

I actually have two devices in separate locations that randomly shut down today during an active SSH session.

Jetson AGX 1: JetPack 5.1 (22 days since last issue)

Jetson AGX 2: JetPack 4.6.1 (56 days since last issue) ← reset twice today

They are both still off. I need to go manually reset them in the morning; I can post the dmesg then.
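One caveat on the dmesg: it reads the kernel ring buffer, which starts over at every boot, so after a manual reset it will only show the new boot. Below is a minimal sketch of how the previous boot's kernel messages could be gathered instead. It assumes journald is keeping logs persistently on the device, which may not be the case on every image. The function names and the `jetson_logs` output directory are made up for illustration.

```python
import subprocess
from pathlib import Path


def previous_boot_commands():
    """Commands that report on the boot *before* the current one."""
    return [
        # Kernel messages from the previous boot (empty without a persistent journal).
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        # All boots journald has records for.
        ["journalctl", "--list-boots", "--no-pager"],
        # Reboot and shutdown records from wtmp.
        ["last", "-x", "reboot", "shutdown"],
    ]


def log_name(cmd):
    """Turn a command into a file name, e.g. journalctl_-k.txt."""
    return "_".join(cmd).replace("/", "_") + ".txt"


def collect(out_dir="jetson_logs"):
    """Run each command and save its output; hypothetical helper, not an NVIDIA tool."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for cmd in previous_boot_commands():
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,  # works on Python 3.6 as well
                errors="replace",
                timeout=60,
            )
            text = result.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            text = "could not run {}: {}\n".format(" ".join(cmd), exc)
        (out / log_name(cmd)).write_text(text)
    return out
```

Calling `collect()` on the device once it is back up would leave three text files that can be attached here. Reading the system journal may need sudo or membership in a group that is allowed to read it.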

I’m not sure what specific steps you can take to reproduce it, but here’s a general representation of what I’m doing when it occurs.

Software:

  1. ssh into device

  2. tmux

  3. DeepStream PeopleNet running with DeepSORT on 7 cameras

  4. SQL

Hardware:

  1. No monitor attached to device

  2. Pins 5 & 6 are shorted for Auto-Power-On [8-pin header (J508)]

  3. Jetson connected to network via Ethernet

Thermal (as reported by jtop/tegrastats):

55-70 °C

Then, after some length of time (potentially weeks), one of four things will happen:

  1. It will reboot and successfully restart. (extremely rare) ← happened once earlier today

  2. It will reboot and go into emergency mode, requiring a manual power cycle to work. (most common) ← probably what happened today (still unsure until the morning)

  3. It will become unresponsive and hang for an undetermined amount of time (minutes to hours), but after some time become responsive again. (less common)

  4. It will disconnect from the network and not be able to find it, requiring a reboot to find/reconnect to the network. (less common)
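Since these four outcomes are hard to tell apart remotely, here is a rough sketch of a local heartbeat log that could separate them after the fact. Everything in it is an assumption for illustration: the `heartbeat.log` path, the 60-second interval, the `eth0` interface name, and the function names. It also relies on wall-clock timestamps, which can jump after a reboot until the clock is synced.

```python
import os
import time
from pathlib import Path

HEARTBEAT_FILE = Path("heartbeat.log")  # hypothetical location
INTERVAL_S = 60                         # hypothetical heartbeat period


def read_boot_id():
    """The kernel generates a new boot_id on every boot."""
    return Path("/proc/sys/kernel/random/boot_id").read_text().strip()


def link_is_up(iface="eth0"):
    """True if the interface reports operstate 'up' (iface name is an assumption)."""
    try:
        state = Path("/sys/class/net/{}/operstate".format(iface)).read_text()
    except OSError:
        return False
    return state.strip() == "up"


def write_heartbeat(path=HEARTBEAT_FILE):
    """Append one 'timestamp boot_id link' line and force it to disk."""
    with open(str(path), "a") as f:
        f.write("{:.0f} {} {}\n".format(time.time(), read_boot_id(), int(link_is_up())))
        f.flush()
        os.fsync(f.fileno())


def classify_gaps(records, interval=INTERVAL_S):
    """records: list of (timestamp, boot_id, link_up) tuples in time order."""
    events = []
    for prev, cur in zip(records, records[1:]):
        gap = cur[0] - prev[0]
        if cur[1] != prev[1]:
            # New boot: a short gap suggests a clean restart, a long one downtime.
            kind = "reboot_after_downtime" if gap > 2 * interval else "reboot"
            events.append((cur[0], kind))
        elif gap > 2 * interval:
            # Same boot but heartbeats were missed: the system stalled.
            events.append((cur[0], "hang"))
        elif prev[2] and not cur[2]:
            events.append((cur[0], "link_down"))
    return events
```

`write_heartbeat()` would need to run once per interval, for example from cron or a systemd timer. A `reboot` event would match outcome 1, `hang` outcome 3, and `link_down` outcome 4. Outcome 2 would show up as `reboot_after_downtime`, which this sketch cannot distinguish from the board simply being powered off for a while.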