Jetson Orin Nano - Unknown embedded device detected

Hi,
we did some evaluations over the last few weeks using the Orin Devkit and the different emulations of Orin NX and Orin Nano. Our workflow is to build a TensorRT engine from an ONNX model and then benchmark the engine.

This worked fine for:

  • Devkit (AGX 64GB)
  • NX 16GB
  • Nano 8GB

On the Nano 4GB, however, we encountered the following warning when building with trtexec:

[11/03/2022-12:01:57] [W] [TRT] Unknown embedded device detected. Using 2779MiB as the allocation cap for memory on embedded devices.

It seems that the build freezes after some time.

Could you try to reproduce the error, and let me know whether this might be related to the known issues listed in the Orin Nano Emulation Overlay?

Thanks

Hi,

Support for the Orin Nano 4GB variant is added in TensorRT 8.5 (not yet available).
So you will get this warning when running TensorRT 8.4 (JetPack 5.0.2) on the Orin Nano 4GB board.

However, we provide a fallback flow for non-listed devices, and TensorRT is expected to work on these boards.
Could you check the device status with tegrastats to see if the freeze is caused by a memory shortage?

$ sudo tegrastats
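
If the session becomes unresponsive during the build, you can also write the tegrastats output to a file so the memory trace survives. A small sketch (the interval and log path are only examples):

# Sample every 1000 ms and log the readings in the background
$ sudo tegrastats --interval 1000 --logfile /tmp/tegrastats_build.log --start
# Stop the background session once the build finishes
$ sudo tegrastats --stop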

We are also trying to reproduce this issue internally and will share more information with you later.

Thanks.

Hi,
thanks for the quick response.
Regarding the fallback flow you mentioned: can I expect roughly the same performance with TensorRT 8.4.1 plus the fallback flow as with the upcoming 8.5? Or do you expect 8.5 to deliver significantly more performance (>10%)?

Regarding the memory shortage, you are correct. While monitoring tegrastats, I noticed that the swap was running full while building the engine. I was able to build the engine either by increasing the swap to 4GB or by setting the memPoolSize to 2GB (the exact steps are sketched after this paragraph).
This brings me to another question: let’s say I set the memPoolSize to 2GB for the tactics. How much RAM will the engine build need in total? Does that depend entirely on the model I want to build?
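
For reference, this is roughly what I did (a sketch; the swapfile path and the model file name are examples, and the swapfile is added on top of JetPack's default zram swap):

# Grow swap: add a 4GB swapfile
$ sudo fallocate -l 4G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile

# Alternatively, cap the tactic workspace at 2GB (value in MiB)
$ /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --memPoolSize=workspace:2048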

Thanks

Hi,

Yes, the performance should be similar.
And the required memPoolSize depends on the model you use.

The build process is expected to either succeed or raise an OOM error rather than freeze the system.
Could you share which model you want to benchmark?
We want to reproduce this issue in our environment as well.

Thanks.

Hi,

a follow-up question on the memPoolSize: does the

Using 2779MiB as the allocation cap for memory

mean that TensorRT cannot use more memory, even if I try to override the memPoolSize?
I ask because I tried another segmentation model where the memPoolSize wasn’t enough to implement a node, and I couldn’t get TensorRT to use more than ~2800MiB (with 400-500MB of RAM still free).
If that is the case: is there a way to override this memory cap, or do I have to wait until 8.5 becomes available?

Regarding the model which caused the freeze:
Unfortunately, I cannot share the exact model I used, but I exported the same model with the default COCO pre-trained weights: YOLO4 COCO weights ONNX. It is a YOLOv4 with the default architecture.

I tried to build with

/usr/src/tensorrt/bin/trtexec --onnx=yolov4_1_3_704_704_static.onnx

The swap was at the default 2GB.

Thanks

Hi,

Thanks for sharing.
We are going to check the memPoolSize issue and will share more information with you later.

In general, TensorRT tries to allocate all the available GPU memory so it can deploy a faster algorithm (a faster implementation usually requires more memory).
This can also be controlled via the --workspace configuration.
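
For example (a sketch; the model path is a placeholder, and --workspace takes a size in MiB and is the older form of the same limit, superseded by --memPoolSize in recent trtexec versions):

# Legacy flag: limit the builder workspace to 2048 MiB
$ /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --workspace=2048
# TensorRT 8.4+ equivalent
$ /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --memPoolSize=workspace:2048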

Thanks.

Hi,

Did you get the model to work by setting --memPoolSize=workspace:2048?

We tried the model you shared and limited the memPoolSize to 2GB.
TensorRT was killed due to insufficient memory.

Thanks.

Hi,

I repeated the problematic engine builds yesterday to confirm this (some of them with my original model and some with the model I shared). To my surprise, all the engine builds succeeded.

I tried the following swap/workspace/precision combinations (a representative invocation is sketched after the list):

  • Swap: 4GB / FP32 --memPoolSize=workspace:2048
  • Swap: 4GB / FP32
  • Swap: 4GB / FP16 --memPoolSize=workspace:2048
  • Swap: 4GB / FP16
  • Swap: 2GB / FP16 --memPoolSize=workspace:2048
  • Swap: 2GB / FP16
  • Swap: 2GB / FP32 --memPoolSize=workspace:2048
  • Swap: 2GB / FP32
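
As an illustration, the FP16 run with the capped workspace looked roughly like this (--fp16 enables half-precision tactics; the ONNX file is the one shared above):

$ /usr/src/tensorrt/bin/trtexec --onnx=yolov4_1_3_704_704_static.onnx --fp16 --memPoolSize=workspace:2048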

Nothing has changed in the setup since the problem last occurred.
I work over SSH, with a background RAM usage of ~500MB at the start of the engine build.
Last time, the freeze looked like this: after starting the engine build, the terminal became really laggy at some point and I saw the swap running full. After that, both terminals (one running the engine build and one running tegrastats) froze completely. After 2 hours with no response, I reset the Orin.

Is it possible that last time there was just enough RAM to avoid going OOM, but not enough for the SSH service to remain responsive?

Hi,

When we test this via SSH, the system is quite slow due to the heavy workload.
However, it does respond; it just might take minutes.

We have also confirmed that the model works on TensorRT 8.5 without the “Unknown embedded device detected” warning.
Thanks.
