I’ve built a machine that runs jetson-inference on an Xavier NX (2 cameras, previously USB webcams, now IMX219 picam2’s on CSI connectors).
This device operates on agricultural machinery in bumpy and uneven terrain. We’ve been having trouble with these bumps and vibrations causing timeouts in the gstreamer pipeline for both cameras; the timeouts occur anywhere from 5 to 60 minutes apart. (The G limits for the module and carrier board are never exceeded.)
If anyone has dealt with this issue before or has any ideas/suggestions/advice on how to eliminate these timeouts, I’d greatly appreciate it.
(Posting for future devs who have the misfortune of encountering this issue)
Hi JerryChang,
After some extensive testing, I’ve discovered the root cause of the issue.
I checked the kernel logs and found:
[ 900.712480] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=924327378240 sof_ts=924327443424 gerror_code=2 gerror_data=a2 notify_bits=0"
which relates to a mismatch between the frame queue and general error queue in the video interface driver. Poking around even further I found approx. 98 open file descriptors for /dev/video0 and /dev/video1 when there should only be one for each device.
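For anyone wanting to check for the same descriptor pile-up, here is a minimal Python sketch of one way to count open descriptors per process by walking /proc. The function name count_open_fds is my own (hypothetical), and this is not necessarily how the count above was obtained. It needs to run as root to see descriptors held by other users' processes such as nvargus-daemon.

```python
import os
from collections import Counter


def count_open_fds(target, proc_root="/proc"):
    """Return {pid: number of open descriptors that point at `target`}.

    Hypothetical helper: walks /proc/<pid>/fd and compares each symlink
    against the resolved target path.
    """
    target = os.path.realpath(target)
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if link == target:
                counts[int(entry)] += 1
    return dict(counts)
```

Calling count_open_fds("/dev/video0") and count_open_fds("/dev/video1") should show one descriptor per device on a healthy system; a much larger number suggests the leak described above.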
I looked through the nvargus-daemon logs and forum posts and found that it crashes if too many open file descriptors pile up. After several consecutive camera timeouts and program reboots (automated by a bash script), nvargus-daemon crashes, but it can be restarted with sudo systemctl restart nvargus-daemon.
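As a stopgap, the restart can be automated. Below is a rough sketch in Python rather than the bash script mentioned above; the function names and the injectable run parameter are my own assumptions for illustration. It relies only on systemctl is-active and systemctl restart, and it must run as root for the restart to succeed.

```python
import subprocess

SERVICE = "nvargus-daemon"


def is_active(service=SERVICE, run=subprocess.run):
    """True if systemd reports the unit as active (exit code 0)."""
    result = run(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0


def ensure_running(service=SERVICE, run=subprocess.run):
    """Restart the unit if it is not active. Returns True if a restart was issued."""
    if is_active(service, run):
        return False
    # check=True raises CalledProcessError if the restart itself fails
    run(["systemctl", "restart", service], check=True)
    return True
```

Calling ensure_running() before relaunching the camera program after a timeout means the pipeline is not reopened against a dead daemon. It does not fix the underlying cause; it only shortens the downtime.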
This device uses mechanical relays to switch 12v solenoids, which are located reasonably close to the jetson, and a 12v stator a bit further away. My leading hypothesis is that electromagnetic interference causes the camera to drop a frame, and nvargus crashes as a result and cannot recover the pipeline (something I have also seen reported online; this desperately needs addressing). The camera then cannot close properly, which leaves the previous file descriptor open. The interference point could be the camera device, the CSI cable, the HDMI adapter or the Xavier itself.
We are continuing to test with MOSFETs to switch the 12v solenoids to minimise EM interference, and may add Faraday shielding to the affected components. If anyone reading this is facing the same problems, please reach out and I’ll try my best to help.
We’ve had an RCE firmware update recently to address this failure.
May I also confirm which JetPack release version you’re working with?
You can check the release tag for confirmation, i.e. $ cat /etc/nv_tegra_release