Keys to optimizing a network on AGX Orin DLA for latency

gbart583 · October 5, 2023, 10:04am

Hi all,

I’m working on optimizing a yolox_x model to run on the AGX Orin, with a focus on reducing latency at little cost to accuracy. Initially, I assumed that the DLA would help “accelerate” our performance, including latency. However, for every model I have tried, running on the DLA has increased latency compared to running purely on the GPU. For yolox_x in particular, our latency is 4.8 ms on the GPU alone and 23.9 ms with the DLA (see the command below). I have read that the DLA is mostly focused on power efficiency, but I wanted to confirm that this is the case:

For the Orin, what is the TOPs metric on the DLA as compared to the GPU?
In what situations, or for which ML networks, does running on the DLA actually produce lower latency overall?
The command I am running looks like:

trtexec --onnx=model.onnx --saveEngine=model.engine --exportProfile=model.json --int8 --fp16 --useDLACore=0 --allowGPUFallback --useSpinWait --separateProfileRun > model.log

Are there any other options or techniques I can use to lower the latency of this model when using the DLA?

Thank you

AastaLLL · October 6, 2023, 3:31am

Hi,

1.
You can check this technical report for the details:

Jetson AGX Orin 64GB has 2048 CUDA cores and 64 Tensor cores with up to 170 Sparse TOPs of INT8 Tensor compute …

This enables up to 105 INT8 Sparse TOPs total on Jetson AGX Orin DLAs …

2.
DLA is meant to offload tasks from the GPU.
In general, it improves throughput rather than reducing latency.
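
To make the throughput-versus-latency point concrete, here is a rough back-of-the-envelope sketch using the two latencies from the original post. It assumes batch size 1, that each engine processes requests back to back, that the GPU and both DLA cores can run concurrently without contending for memory bandwidth, and that the DLA engine does not fall back to the GPU. None of these assumptions is confirmed in this thread, so treat the result as an upper bound at best.

```python
# Latencies reported in the original post (batch size assumed to be 1).
GPU_LATENCY_MS = 4.8
DLA_LATENCY_MS = 23.9
NUM_DLA_CORES = 2  # AGX Orin has two DLA cores


def throughput_per_s(latency_ms: float) -> float:
    """Inferences per second for one engine running requests back to back."""
    return 1000.0 / latency_ms


gpu_only = throughput_per_s(GPU_LATENCY_MS)
dla_total = NUM_DLA_CORES * throughput_per_s(DLA_LATENCY_MS)

# ASSUMPTION: idealized sum, ignoring memory-bandwidth contention and
# any GPU time consumed by DLA fallback layers.
combined = gpu_only + dla_total

print(f"GPU only:       {gpu_only:6.1f} inferences/s")
print(f"Two DLA cores:  {dla_total:6.1f} inferences/s")
print(f"GPU + two DLAs: {combined:6.1f} inferences/s (idealized)")
```

Under these assumptions the per-inference latency on the DLA is still about five times worse than on the GPU, but the aggregate throughput of the device goes up because all three engines work in parallel.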

3.
Please check whether any layers in the network fall back to the GPU.
Each transition between the DLA and the GPU adds data-transfer overhead, which can lower performance.
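
One possible way to look for fallback layers is to scan the build log (for example the model.log written by the command in the original post) for fallback warnings. The sketch below is only a starting point: the two phrases it searches for are assumptions about how TensorRT words these warnings, the wording differs between TensorRT versions, and the function name `find_fallback_lines` is made up for this example. Check the patterns against your own log before relying on the count.

```python
import re
import sys
from typing import List

# ASSUMPTION: phrases that TensorRT builds tend to print when a layer is
# moved from the DLA to the GPU. The exact wording is version dependent.
FALLBACK_PATTERNS = [
    re.compile(r"falling back to GPU", re.IGNORECASE),
    re.compile(r"device type to GPU", re.IGNORECASE),
]


def find_fallback_lines(log_text: str) -> List[str]:
    """Return the log lines that match any of the fallback patterns."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(p.search(line) for p in FALLBACK_PATTERNS)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_fallback.py model.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        hits = find_fallback_lines(f.read())
    print(f"{len(hits)} possible fallback message(s)")
    for hit in hits:
        print(hit)
```

If many layers in the middle of the network show up, the engine is probably switching between the DLA and the GPU several times per inference, which matches the overhead described above.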

For reference, you can find more details on DLA and its usage in the GitHub repository below:

Thanks.

system · November 6, 2023, 8:53am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
