Hey, I’m relatively new to Jetson / Linux and am experiencing random crashes that I can’t wrap my head around. I’ll do my best to describe the situation, but let me know if you need any more information:
Jetson: Orin NX 8GB (Seeed Studio J4011)
Connected device: USB 2.0 Camera (AR0234 sensor)
I am using the pre-installed JetPack 5.1.1 (L4T 35.3.1)
I am using it to run object detection on the camera’s video feed with TensorRT (YOLOv8 model) in Python. At the same time I’m running a Python OPC UA server and a Flask web server that shows a web page with live results of the object detection. I can provide more details on specifics if necessary, but I do not believe my software is directly responsible for the crashes. Nothing in the scripts’ logs indicates a problem when the crash happens, and the software seems to keep running for a short period after what I think might be the cause of the crash (see the logs below).
The processing leaves the CPU cores at around 50-70% usage and the GPU between 40-80%. Memory usage is around 4-5 GB, with no swap used and plenty of free space on the disk. The web server needs around 10 Mbps to stream video to another machine. Temperatures are between 70-80 °C. Attached are two screenshots of jtop/htop (through SSH) taken when the crash occurs:
As you can see, the processing is not exactly light, but at the same time nothing there indicates to me that it’s crashing due to over-working it.
I’ve attached the relevant parts of the syslogs for the two crashes from the screenshots above. It looks to me like Linux is (re)booting, but I’m not sure what that means, what’s causing it, or whether this is the actual issue. There’s also something weird about the timestamps: both times they jump back in time at the “systemd-modules-load: Inserted module ‘nvmap’” entry. I write the current timestamp onto the video feed shown by the web server, and the timestamp of the last frame received is closer to the timestamps before the jump. In log 1 the last “cpufreq” entry is ~10 seconds before the last frame (15:03:34), and all the entries after it are ~30 seconds before the last frame. In log 2 the last “cpufreq” entry is ~40 seconds before the last frame (18:31:24), but the entries below it are a whole ~12 minutes before.
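To pin down exactly where the log goes back in time, a small script can scan the syslog for entries that are older than the entry before them. This is only a minimal sketch: it assumes the classic `Mon DD HH:MM:SS` syslog prefix with English month names, and because that format carries no year, the `year` parameter is a placeholder. The function name `find_backward_jumps` is made up for illustration.

```python
import re
from datetime import datetime

# Classic syslog prefix, e.g. "Jun  5 15:03:34 hostname process: message".
STAMP_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")


def find_backward_jumps(lines, year=2023):
    """Return (line_number, previous, current) for every entry whose
    timestamp is earlier than the timestamp of the entry before it.

    `year` is an assumption: the syslog prefix does not contain one.
    """
    jumps = []
    previous = None
    for number, line in enumerate(lines, start=1):
        match = STAMP_RE.match(line)
        if match is None:
            continue  # skip lines without a timestamp prefix
        month, day, clock = match.groups()
        current = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
        if previous is not None and current < previous:
            jumps.append((number, previous, current))
        previous = current
    return jumps
```

Passing an open log file, for example `find_backward_jumps(open("/var/log/syslog", errors="replace"))`, yields one tuple per backward jump; subtracting the two datetimes gives the size of the jump, which can then be compared with the timestamp burned into the last video frame.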
jtop/htop keep updating up to the last timestamp in the video feed. After this point I can no longer connect to the machine through SSH, and I have to power cycle it to get it back.
I’d appreciate any help in diagnosing the issue and what I can do to fix it!
Alternatively, is there a way to automatically reboot the machine when this happens?
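One generic Linux approach to the automatic reboot is the hardware watchdog: a process opens the watchdog device and writes to it periodically, and if the writes stop (for example because the system hangs) the hardware resets the board. The sketch below assumes a watchdog is exposed at `/dev/watchdog`, that no other process (such as systemd) already holds it, that the script runs as root, and that `interval_s` is shorter than the hardware timeout. None of this has been verified on the Orin NX. `feed_watchdog` and `is_healthy` are hypothetical names for illustration.

```python
import time


def feed_watchdog(is_healthy, device="/dev/watchdog", interval_s=5.0, max_feeds=None):
    """Feed the watchdog while is_healthy() returns True.

    Returns the number of feeds written. The device path and the
    interval are assumptions; check them against the actual board.
    """
    feeds = 0
    # Opening the device arms the timer; unbuffered so each write lands now.
    with open(device, "wb", buffering=0) as wdt:
        while max_feeds is None or feeds < max_feeds:
            if not is_healthy():
                # Leave without the magic close: the timer keeps running
                # and the hardware reboots the machine when it expires.
                return feeds
            wdt.write(b"\0")  # any write resets the countdown
            feeds += 1
            time.sleep(interval_s)
        # Clean stop: 'V' asks the driver to disarm on close
        # (only honoured if the driver supports magic close).
        wdt.write(b"V")
    return feeds
```

Here `is_healthy` could check, for instance, that the detection loop produced a frame recently. systemd can also feed the watchdog itself through the `RuntimeWatchdogSec=` setting in `/etc/systemd/system.conf`, which covers full system hangs without any custom script.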