Continual problems with Nvidia Jetson Orin Nanos after sudden reboots

Hi there,

We have eight Nvidia Jetson Orin Nanos deployed at a remote location, where they are used as cameras. We recently tried to upgrade them from just streaming and recording to also performing AI object detection with a small custom CNN. However, when running the new scripts, the Jetsons repeatedly reboot without warning. The cause of the reboots is unclear (here is one post about it: X264 and TensorRT sudden reboot (MJPG encoder not affected, but not fast enough) on Jetson Orin Nano - Jetson & Embedded Systems / Jetson Orin Nano - NVIDIA Developer Forums), but that is not what this post is about.

After rebooting, the Jetsons sometimes show errors or states that were not present before the script ran and the device rebooted. Interestingly, these errors differ between cameras and only occur some of the time. Why does this happen, and how can we prevent it?

Here is an example of one such error.

I ran a new script that runs a TensorRT engine on a live video feed from a CSI camera. I ran it on multiple devices, and three of them rebooted after running for a while. Two of those were fine afterwards, but the third can no longer run the same script. Here is the error it shows:

$ python script.py

[05/22/2024-14:10:44] [TRT] [W] CUDA initialization failure with error: 100
Traceback (most recent call last):
  File "record_images_and_detect.py", line 307, in <module>
    main()
  File "record_images_and_detect.py", line 301, in main
    tensorrt_model = TensorRTInference(ENGINE_PATH)
  File "/home/fov/Desktop/FOVCamerasWebApp/jetson/tensorrt_inference.py", line 33, in __init__
    self.engine = self.load_engine(engine_path)
  File "/home/fov/Desktop/FOVCamerasWebApp/jetson/tensorrt_inference.py", line 40, in load_engine
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
TypeError: pybind11::init(): factory function returned nullptr

This exact same script ran fine before the reboot, and now fails with the above error after the reboot.
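As far as I can tell, the 100 in the TensorRT warning matches the CUDA code for "no CUDA-capable device is detected" (CUDA_ERROR_NO_DEVICE), which would mean the driver cannot see the GPU at all, rather than anything being wrong with the engine file. To check this without TensorRT in the loop, here is a minimal sketch that calls cuInit from the CUDA driver library directly through ctypes. The function names probe_cuda and describe are my own, and the library name libcuda.so.1 is an assumption about how the driver library is exposed on the device.

```python
import ctypes

# Result codes from the CUDA driver API (CUresult).
CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100


def probe_cuda(lib_name="libcuda.so.1"):
    """Call cuInit(0) and return its result code.

    Returns None if the driver library itself cannot be loaded.
    The default library name is an assumption; adjust it if needed.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


def describe(code):
    """Turn a probe_cuda() result into a short human-readable message."""
    if code is None:
        return "CUDA driver library could not be loaded"
    if code == CUDA_SUCCESS:
        return "CUDA driver initialized"
    if code == CUDA_ERROR_NO_DEVICE:
        return "no CUDA-capable device detected (error 100)"
    return "cuInit failed with error %d" % code
```

Running print(describe(probe_cuda())) on a healthy device and on the failing one should show whether the problem sits below TensorRT, in the driver or device nodes.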

Another example: a Jetson works fine before rebooting, but after a sudden reboot we get this error when running sudo apt upgrade:

debconf: Delaying package configuration, since apt-utils is not installed.
dpkg: unrecoverable fatal error, aborting:
 loading files list file for package 'wireless-regdb': cannot open /var/lib/dpkg/info/wireless-regdb.list (Structure needs cleaning)
E: Sub-process /usr/bin/dpkg returned an error code (2)
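"Structure needs cleaning" is the standard Linux message for the EUCLEAN errno, which filesystems such as ext4 typically return when on-disk metadata is damaged, so this looks like filesystem corruption from the unclean shutdown rather than a packaging problem. To see how many files are affected, here is a minimal sketch that tries to stat and read every file in a directory and records the ones that fail. The function name find_unreadable_files is my own, and the sketch only detects problems; it does not repair anything.

```python
import errno
import os
import stat


def find_unreadable_files(directory):
    """Return a list of (path, errno_name) for files that cannot be read.

    Subdirectories are skipped. A file hit by filesystem corruption would
    be expected to show up here with the name 'EUCLEAN'.
    """
    problems = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            if stat.S_ISDIR(os.stat(path).st_mode):
                continue
            with open(path, "rb") as f:
                f.read(1)  # force an actual read, not just an open
        except OSError as exc:
            code = errno.errorcode.get(exc.errno, str(exc.errno))
            problems.append((path, code))
    return problems
```

For the case above, the call would be find_unreadable_files("/var/lib/dpkg/info"), run as a user with read access to that directory.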

For more context, the Python script in question looks like the one in this post: X264 and TensorRT sudden reboot (MJPG encoder not affected, but not fast enough) on Jetson Orin Nano - Jetson & Embedded Systems / Jetson Orin Nano - NVIDIA Developer Forums

Here is device information:

Why do these errors occur only after a Python script has run and the Jetson has suddenly rebooted, and not before? Why do they occur on some devices but not all? How can we prevent them?

Thanks in advance and please let me know if there’s any more information I can provide.

This is a duplicate of
X264 and TensorRT sudden reboot (MJPG encoder not affected, but not fast enough) on Jetson Orin Nano

Let’s continue in the topic thread.