DLA-v2 is slower than DLA-v1

Hi,

I’m testing the NVDLA on AGX Orin. Neural network inference on it is significantly slower than on Xavier AGX and Xavier NX. I’m building the engines by following the instructions in the dusty-nv/jetson-inference GitHub repository (the Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson). I can provide more details if necessary.

System details:
Jetson AGX Orin Developer Kit
Power mode: MAXN
nv_tegra_release output: # R34 (release), REVISION: 1.0
TensorRT 8.4.0

Results comparison (Orin vs Xavier AGX):
AlexNet: 78 ms vs 30 ms
GoogLeNet: 8.02 ms vs 5.5 ms
VGG-19: 49.8 ms vs 22 ms
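
To put the numbers above in perspective, here is a minimal sketch that computes the per-network slowdown of Orin relative to Xavier AGX. The latencies are copied from the comparison above; the dictionary layout and the helper name `slowdown` are only illustrative.

```python
# Latencies in milliseconds, copied from the comparison above:
# (Orin, Xavier AGX) for each network.
latencies_ms = {
    "AlexNet": (78.0, 30.0),
    "GoogLeNet": (8.02, 5.5),
    "VGG-19": (49.8, 22.0),
}


def slowdown(orin_ms, xavier_ms):
    """Return how many times slower Orin is than Xavier AGX."""
    return orin_ms / xavier_ms


ratios = {name: slowdown(orin, xavier) for name, (orin, xavier) in latencies_ms.items()}

for name, ratio in ratios.items():
    print(f"{name}: Orin is {ratio:.2f}x slower than Xavier AGX")
```

With these figures the slowdown ranges from roughly 1.5x (GoogLeNet) to 2.6x (AlexNet).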

Hi,

We can confirm that we have reproduced the performance difference internally.

We are checking this issue with our internal team.
Will share more information with you later.

Thank you for confirming! Also, I’m using my own TensorRT scripts instead of the jetson-inference repo. I’m attaching the script here for reference.

building_engine_gpu_or_dla.py (1.7 KB)

The command that I use for tensorrt execution after building the engine: trtexec --iterations=100 --warmUp=2000 --batch=1 --useDLACore=0 --dumpProfile --loadEngine=alexnet_batch1_dla.engine

Another question: I assume we can’t install any JetPack version other than 5.0/5.0.1 on Orin, so no TensorRT version other than 8.4.0 is available. May I also ask whether this issue is reproducible with older TensorRT versions? I’m asking because a newer JetPack release may take some time, which is quite understandable.
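
For anyone who wants to repeat the measurement across several engines, the trtexec command quoted above can be assembled from Python. This is only a sketch: the helper name `trtexec_profile_cmd` is made up for illustration, the flags are the ones from the command in this post, and it assumes trtexec is on the PATH (on Jetson it is often found under /usr/src/tensorrt/bin instead).

```python
import shlex


def trtexec_profile_cmd(engine_path, dla_core=0, iterations=100, warmup_ms=2000, batch=1):
    """Build the trtexec argument list for profiling a prebuilt engine on a DLA core.

    Hypothetical helper: it only mirrors the flags used in the command above.
    """
    return [
        "trtexec",
        f"--iterations={iterations}",
        f"--warmUp={warmup_ms}",
        f"--batch={batch}",
        f"--useDLACore={dla_core}",
        "--dumpProfile",
        f"--loadEngine={engine_path}",
    ]


cmd = trtexec_profile_cmd("alexnet_batch1_dla.engine")

# Print a copy-pasteable shell command; pass `cmd` to subprocess.run() to execute it.
print(shlex.join(cmd))
```

Changing `dla_core` to 1 targets the second DLA core with the same settings, assuming the engine was built for that core.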

Hi Splendor027,

If you haven’t already, I would recommend testing the models with INT8 precision enabled.

On a related note, we now have the following GitHub project to help you get started with the DLA. It covers defining and profiling a model, tweaking the model for better DLA compatibility, and performing INT8 calibration. It may help introduce some of the concepts involved in working with the DLA.

Please let me know if you have any questions, or feel free to open an issue with feedback on the tutorial if you do take a look.

Best,
John

Hello @jaybdub,

Your work seems pretty awesome. I had been planning to create such a repo for a while since there is no detailed one. Thanks for sharing this publicly!

However, I could not observe any significant improvement on AGX Orin using your repo either. Could you please post some neural network results measured on AGX Orin, if you have any?

Hi,

We have checked this issue with our internal team and this is expected.
Due to the difference in hardware specification, a relative increase in latency is expected when running FP16 conv operations on Orin DLA as compared to running on Xavier DLA.

Thanks.

This is totally understandable. I appreciate your help on this, @AastaLLL. One more note on my initial post: the power results on AGX Orin look as expected compared to Xavier AGX.

May I ask for further details on the Orin DLA? Doesn’t being 5-10x slower than the GPU on Orin make the DLA unusable for most practical purposes?

Additionally, the Orin DLA was introduced to us as offering a 9x speed-up. May I ask whether there is any application domain in which the DLA reaches that performance? Sorry for the frequent questions, but the promise of better DLA performance motivated me a lot for my research!

Hi,

We got further details from our internal team.

Orin’s DLA has more INT8 dense TOPS but fewer FP16 TOPS than Xavier’s DLA.
So if you run the model in INT8 mode, it is expected to perform better than on Xavier’s DLA.

Thanks.
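
To compare the two precisions discussed above, one option is to build an FP16 and an INT8 engine for the DLA with trtexec and profile both. The sketch below only assembles the build commands. It assumes the model is available as an ONNX file; the file names `alexnet.onnx`, `alexnet_dla_int8.engine`, and `alexnet_dla_fp16.engine` and the helper `trtexec_build_cmd` are hypothetical. Without a calibration cache, an INT8 build like this is suitable for timing only, not for checking accuracy.

```python
import shlex


def trtexec_build_cmd(onnx_path, engine_path, precision="int8", dla_core=0):
    """Build a trtexec argument list that compiles an ONNX model for a DLA core.

    Hypothetical helper; `precision` selects the --int8 or --fp16 trtexec flag.
    """
    if precision not in ("int8", "fp16"):
        raise ValueError("precision must be 'int8' or 'fp16'")
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--{precision}",
        f"--useDLACore={dla_core}",
        "--allowGPUFallback",  # layers the DLA cannot run fall back to the GPU
        f"--saveEngine={engine_path}",
    ]


# Placeholder file names; replace them with your own model and output paths.
int8_cmd = trtexec_build_cmd("alexnet.onnx", "alexnet_dla_int8.engine", "int8")
fp16_cmd = trtexec_build_cmd("alexnet.onnx", "alexnet_dla_fp16.engine", "fp16")

print(shlex.join(int8_cmd))
print(shlex.join(fp16_cmd))
```

Profiling both resulting engines with the same iteration and warm-up settings should show whether INT8 recovers the expected advantage on Orin.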