Getting less throughput while enabling DLAs on Jetson AGX Orin

shravanthi · January 19, 2023, 7:30am

Hi,

We downloaded pretrained ResNet50 weights from Keras and converted the model to ONNX with multiple batches. After the conversion, we built a TensorRT engine with the command below:

/usr/src/tensorrt/bin/trtexec --onnx=onnx_model.onnx --saveEngine=resnet50.trt --explicitBatch --inputIOFormats=int8:chw --outputIOFormats=int8:chw --int8 --useDLACore=0 --allowGPUFallback=True --sparsity=disable --verbose=True

We prepared two models: one built for the GPU and one built for the DLA.
After running inference, we collected the results below.

From the above table, the GPU results are quite acceptable, but the DLA results are very low. We have seen the same pattern with other TensorFlow models (MobileNet, SSD-MobileNet, VGG, etc.) and would like to know why the throughput is so much lower.

Can you please suggest why we are observing lower throughput with the DLA?

spolisetty · January 19, 2023, 4:55pm

Hi,

We are moving this post to the Jetson AGX Orin forum to get better help.

Thank you.

jrb2 · January 19, 2023, 6:49pm

I have the same issue with a different model. It is the exact same model in both runs; the only difference is whether or not I added --useDLACore=0 --allowGPUFallback=True.

In FP16, it is ~10X slower to use the DLA in my case.
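The A/B comparison described in the posts above (same model, with and without the DLA flags) can be sketched as a small Python helper that builds both trtexec command lines from one place, so the DLA flags are the only difference apart from the output file. This is only an illustrative sketch: the helper name `build_trtexec_cmd` and the engine file names are hypothetical, and `--allowGPUFallback` is written here as a plain switch without a value, which is an assumption about how the installed trtexec version expects it.

```python
import shlex

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"  # path taken from the original post


def build_trtexec_cmd(onnx_path, engine_path, precision_flag="--int8", dla_core=None):
    """Return a trtexec argument list (hypothetical helper).

    dla_core=None builds a GPU-only engine; an integer adds the DLA flags.
    """
    cmd = [
        TRTEXEC,
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        precision_flag,
    ]
    if dla_core is not None:
        # Assumption: --allowGPUFallback is passed as a switch with no value.
        cmd += [f"--useDLACore={dla_core}", "--allowGPUFallback"]
    return cmd


# Engine file names below are illustrative, not from the thread.
gpu_cmd = build_trtexec_cmd("onnx_model.onnx", "resnet50_gpu.trt")
dla_cmd = build_trtexec_cmd("onnx_model.onnx", "resnet50_dla0.trt", dla_core=0)

# Flags present only in the DLA build, ignoring the output file name.
dla_only_flags = [
    arg for arg in dla_cmd
    if arg not in gpu_cmd and not arg.startswith("--saveEngine=")
]

print(shlex.join(gpu_cmd))
print(shlex.join(dla_cmd))
print("DLA-only flags:", dla_only_flags)
```

Running the two printed commands on the device and comparing the reported throughput keeps everything else (model, precision) identical between the GPU and DLA runs.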

AastaLLL · January 23, 2023, 2:02pm

Hi,

Do you want to compare the performance between GPU and DLA?

Please find information in our document below:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#troubleshooting

Q: Why does my network run slower when using DLA compared to without DLA?

A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Which implementation to use depends on your latency or throughput requirements and your power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations at the same time to further increase the throughput of your network.
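The last sentence of that answer can be illustrated with a little arithmetic. The sketch below assumes the GPU and the two DLA cores on Jetson AGX Orin run separate copies of the network concurrently and that their throughputs simply add. The function name and all numbers are hypothetical placeholders, not measurements from this thread, and the result is an idealized upper bound because the engines still share memory bandwidth.

```python
def combined_throughput(gpu_qps, dla_qps_per_core, num_dla_cores=2):
    """Idealized aggregate throughput when GPU and DLA engines run concurrently.

    Assumes perfect independence (no memory-bandwidth contention), so treat
    the result as an upper bound rather than a prediction.
    """
    return gpu_qps + num_dla_cores * dla_qps_per_core


# Hypothetical example values in queries per second (not measured data).
gpu_only = 1000.0
per_dla_core = 300.0

total = combined_throughput(gpu_only, per_dla_core)
gain = (total - gpu_only) / gpu_only

print(f"GPU only:       {gpu_only:.0f} qps")
print(f"GPU + 2 x DLA:  {total:.0f} qps (ideal upper bound)")
print(f"Relative gain:  {gain:.0%}")
```

Under these assumptions, a DLA core that is several times slower than the GPU on its own can still add to total throughput when it runs alongside the GPU.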

Thanks.

ramc · February 9, 2023, 4:16pm

Also check out the DLA GitHub page for samples and resources, or to report issues: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

We have a FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ

