TensorRT engine creation triggers a reboot on Orin AGX (JP5 EA)

Hi

We received our Jetson Orin and were able to test our SDK (ZED SDK). We managed to update our SDK to run on JP5 EA and Orin, and it runs without issues on Xavier AGX (JP5 EA).

However, we're encountering a fairly major issue with Orin: when using our object detection module, the Orin systematically freezes and reboots during TensorRT engine creation.

We tried powering it with the default power supply as well as the Xavier AGX one (90 W barrel jack), and optimizing the model in MaxN, 50 W, 30 W, and 15 W modes; the results are always the same. Some models work, such as skeleton (fast and accurate), but others, such as the Neural model and the object detection ones, always trigger this reboot when used through the ZED SDK.
tegrastats doesn't report any alarming temperature, power consumption, or RAM usage, even just before the shutdown.
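For anyone trying to reproduce this, one way to capture the board's state right up to the reboot is to log tegrastats to a file and then read the previous boot's kernel log after the board comes back up (a debugging sketch; the log path and interval are arbitrary choices, not something from our setup):

```shell
# Log tegrastats once per second to a file so the last readings
# survive the reboot (path and 1000 ms interval are arbitrary).
sudo tegrastats --interval 1000 --logfile /var/log/tegrastats.log &

# ... reproduce the engine-creation crash here ...

# After the board reboots, inspect the last readings and the kernel
# messages from the previous boot for a panic or watchdog event.
tail -n 20 /var/log/tegrastats.log
journalctl -b -1 -k | tail -n 50
```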

To try to work around this, we used trtexec to create the inference engine for our neural model (in ONNX) and, to our surprise, it worked. We can use the resulting engine to run inference in the SDK. There shouldn't be any major difference between the two implementations or their parameter sets.
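The trtexec workaround was along these lines (a sketch only: the model filename, FP16 flag, and workspace size here are illustrative assumptions, not our exact command):

```shell
# Build a TensorRT engine from the ONNX neural model with trtexec.
# Model path, --fp16, and the 4096 MiB workspace are illustrative.
/usr/src/tensorrt/bin/trtexec \
    --onnx=neural_model.onnx \
    --saveEngine=neural_model.engine \
    --fp16 \
    --workspace=4096
```

The resulting .engine file can then be loaded for inference instead of building the engine inside the SDK.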

Here's an early build of the ZED SDK installer to reproduce the issue: https://stereolabs.sfo2.digitaloceanspaces.com/zedsdk/beta/ZED_SDK_Tegra_L4T34.0_v3.7.1.run. You can trigger a model optimization even without a ZED camera by running /usr/local/zed/tools/ZED_Diagnostic -nrlo, for instance. It will download and optimize the Neural disparity model.

Let us know how we can troubleshoot this. Thanks.


Hi,

It looks like this issue is model specific.
Would you mind sharing the object detection ONNX model with us?

Since our JetPack 5.0 DP will be released soon, would you mind checking whether the same issue occurs on the new release later this week?

Thanks.

Hi,

I'm the engineer who worked on this; we'll test the JP5 DP for sure when it's available.

The object detection models are not ONNX; they are built directly with the API. They're derived from YOLOv5 and therefore pretty close to tensorrtx/yolov5 at master · wang-xinyu/tensorrtx · GitHub.
However, I tested that specific implementation standalone and it works as expected.

So the issue occurs only when using our SDK (again, with either Neural depth or object detection, especially the bigger models); we must have a specific set of parameters or conditions that triggers it. One thing we've noted is that these models tend to have a much longer engine creation time than the working ones; not sure if that's a sign of anything.
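To narrow down which builder parameters differ between the two paths, one option is to capture a verbose build log from trtexec and compare it against what the SDK configures (a debugging suggestion; the model path is an illustrative placeholder and --verbose is a standard trtexec option):

```shell
# Capture a verbose engine-build log, including layer tactics and
# timing, for comparison with the SDK's builder configuration.
# The model path is an illustrative placeholder.
/usr/src/tensorrt/bin/trtexec \
    --onnx=neural_model.onnx \
    --fp16 \
    --verbose 2>&1 | tee trtexec_build.log
```

If the long engine creation time is relevant, the verbose log should also show which layers the builder spends the most time on.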

I encourage you to test the SDK to see if you can spot anything out of the ordinary. We may have done something wrong, but I would expect a crash or a segfault from using a library, not a hardware reboot (although I understand this is early access software).

Thanks


Hi,

Thanks for your feedback.

We want to reproduce this issue internally to get more information about the reboot.
We will share an update later.

Thanks.

Hi,

We tried to reproduce this issue with JetPack 5.0 DP but hit the following error:

...
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (1.17.4)
-> Downloading to '/tmp/selfgz43848'
Unsupported l4t/jetpack version
...

It looks like the library doesn't support r34.1 yet.
Would you mind adding support so we can run it on JetPack 5.0 DP?

Please let us know if anything is missed.
Thanks.

Hi

r34.1 seems to have solved our issues: engine creation no longer fails in our preliminary testing, and the hardware decoder issue we had (not mentioned in this thread) is also gone.

Here's a compatible r34.1 installer (early build) if needed.

Thanks!

Thanks for the feedback.
Good to know it works on r34.1 now.
