Jetson Orin AGX DLA doesn't work normally: inference is slower than without DLA

Hi there,

I am trying to use DLA to speed up inference. Following the guide, I used trtexec to convert my ONNX model to a TensorRT engine, once with the option --useDLACore=0 and once without it. The engine built without DLA is much faster than the one built with DLA. Could you help explain why? The ONNX model is attached.
example.zip (23.4 MB)
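
For readers who want to reproduce the comparison, here is a minimal sketch of the two trtexec invocations, written as a small Python helper. The file names are placeholders, not the ones from the attachment. The addition of --fp16 and --allowGPUFallback is an assumption about a typical DLA build (DLA runs FP16 or INT8, and unsupported layers need a GPU fallback), not something confirmed in the post above.

```python
import shlex


def build_trtexec_cmd(onnx_path, engine_path, dla_core=None):
    """Return a trtexec argument list; dla_core=None means a GPU-only build."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",  # assumption: use FP16 for both builds so the comparison is fair
    ]
    if dla_core is not None:
        # Target the given DLA core and let unsupported layers fall back to the GPU.
        cmd += [f"--useDLACore={dla_core}", "--allowGPUFallback"]
    return cmd


# Placeholder file names, for illustration only.
gpu_cmd = build_trtexec_cmd("model.onnx", "model_gpu.engine")
dla_cmd = build_trtexec_cmd("model.onnx", "model_dla0.engine", dla_core=0)
print(shlex.join(gpu_cmd))
print(shlex.join(dla_cmd))
```

Running both printed commands on the device and comparing the latency and throughput that trtexec reports gives the with/without DLA comparison described above.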




Hi,

DLA is meant to offload work from the GPU rather than to deliver higher performance.
Please see the FAQ entry below for more information:

Q: Why does my network run slower when using DLA than without DLA?

A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Your chosen implementation depends on your latency or throughput requirements and power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations to increase the throughput of your network further.

Thanks.
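
To illustrate the last point of the FAQ answer, here is a rough sketch of why running GPU and DLA engines side by side can raise throughput even though the DLA engine alone is slower. The latency numbers are hypothetical, and the model assumes the engines run fully independently with no memory bandwidth contention, which is optimistic on a real device.

```python
def throughput_ips(latency_ms):
    """Inferences per second for one engine processing requests back to back."""
    return 1000.0 / latency_ms


def combined_throughput_ips(latencies_ms):
    """Idealized total throughput when every engine runs concurrently."""
    return sum(throughput_ips(lat) for lat in latencies_ms)


# Hypothetical per-inference latencies, not measured values.
gpu_ms = 5.0
dla_ms = 20.0

print("GPU only :", throughput_ips(gpu_ms), "inferences/s")
print("DLA only :", throughput_ips(dla_ms), "inferences/s")
print("GPU + DLA:", combined_throughput_ips([gpu_ms, dla_ms]), "inferences/s")
# The best-case latency of a single request is still that of the fastest engine.
print("Best single-request latency:", min(gpu_ms, dla_ms), "ms")
```

In this idealized model, adding the DLA improves total throughput but does not lower the latency of any single inference, so it helps throughput-bound pipelines more than latency-bound ones.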

Hi,
I am concerned about both the GPU and the DLA. What I want is low latency when running inference with my model.
I tried both Orin NX and Orin Nano and found that the Orin Nano is even faster than the Orin NX, which surprised me. Comparing the datasheets of the two devices, I found that the GPU frequency is 918 MHz on the Orin Nano (25 W) but 408 MHz on the Orin NX, so the Orin NX is much slower than the Orin Nano. Since the Orin NX has DLA, I tried to use it to speed up inference, which is why I asked this question. However, the Orin NX with DLA is even slower than the Orin NX without DLA.
So my question is: how can I get better throughput (lower latency) on the Orin NX, so that it is at least better than the Orin Nano?



Hi,

Please note that Orin Nano doesn’t have DLA,
so all inference runs on the GPU.

Have you tried MaxN mode on the Orin NX? Its GPU clock rate is 918 MHz, much higher than in the 25W mode.

Thanks.
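
As a rough sketch of how the power mode is usually switched on Jetson, the helper below builds the nvpmodel and jetson_clocks commands. The mode ID is an assumption: the ID that corresponds to MaxN differs between modules and releases, so check /etc/nvpmodel.conf on the device first. The helper defaults to a dry run that only prints the commands.

```python
import shutil
import subprocess


def power_mode_cmds(mode_id):
    """Commands to query the current mode, switch mode, and lock clocks to max."""
    return [
        ["sudo", "nvpmodel", "-q"],                # show the current power mode
        ["sudo", "nvpmodel", "-m", str(mode_id)],  # switch to the requested mode
        ["sudo", "jetson_clocks"],                 # pin clocks to the mode's maximum
    ]


def apply_power_mode(mode_id, dry_run=True):
    """Print the commands, or run them when dry_run is False and nvpmodel exists."""
    for cmd in power_mode_cmds(mode_id):
        if dry_run or shutil.which("nvpmodel") is None:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# mode_id=0 is an assumption for MaxN; verify it on your own board.
apply_power_mode(0)
```

Some mode changes ask for a reboot, so it is safer to run the printed commands by hand the first time.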

Thanks for your reply. @AastaLLL
Yes, I know that the Orin Nano has no DLA, so it runs fully on the GPU.
I tried MaxN mode on the Orin NX, and its speed is similar to the Orin Nano.
That is why I am confused. I chose the Orin NX instead of the Orin Nano to get more compute power, but I found that it is almost the same even in MaxN mode. So I doubt whether I need the Orin NX at all, because it doesn’t provide more throughput for my model.
Do you have any advice on how to choose between the Orin NX and the Orin Nano? In which scenarios should I use the Orin NX, and in which scenarios is the Orin Nano enough?

Thanks so much

Hi,

Sorry for the late update.

Based on our document for r36.4.3:

Orin NX super:
8x CPU @1984MHz, GPU @1173MHz, EMC @3200MHz

Orin Nano super:
6x CPU @1728MHz, GPU @1020MHz, EMC @2133MHz

The clocks on the Orin NX are higher than those on the Orin Nano.
So better performance/throughput is expected on the Orin NX.

Thanks.
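
To put the quoted clocks in proportion, here is a small sketch that computes the NX-to-Nano ratio for each clock, reading the values as MHz. A clock ratio is only a rough upper bound on the expected speedup, not a performance prediction: the actual gain depends on the model, on the number of GPU cores on each module, and on whether inference is compute-bound or memory-bound.

```python
# Clock values quoted above for r36.4.3, read as MHz.
ORIN_NX_SUPER = {"cpu_cores": 8, "cpu_mhz": 1984, "gpu_mhz": 1173, "emc_mhz": 3200}
ORIN_NANO_SUPER = {"cpu_cores": 6, "cpu_mhz": 1728, "gpu_mhz": 1020, "emc_mhz": 2133}


def clock_ratios(a, b):
    """Ratio of each clock in a to the same clock in b, rounded to 2 decimals."""
    return {key: round(a[key] / b[key], 2) for key in ("cpu_mhz", "gpu_mhz", "emc_mhz")}


for name, ratio in clock_ratios(ORIN_NX_SUPER, ORIN_NANO_SUPER).items():
    print(f"{name}: Orin NX super / Orin Nano super = {ratio}")
```

By these numbers the GPU and CPU clocks are about 15% higher on the Orin NX super, while the memory (EMC) clock is about 50% higher, so memory-bound models may see the larger share of any gain.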