Keys to optimizing a network on AGX Orin DLA for latency

Hi all,

I’m working on optimizing a Yolox_x model to run on the AGX Orin, with a focus on reducing latency at little cost to accuracy. Initially, I assumed that the DLA would help “accelerate” performance, including latency. However, for every model I have tried, latency has been higher with the DLA than when running purely on the GPU. For yolox_x in particular, latency is 4.8 ms on the GPU alone and 23.9 ms with the DLA (see the command below). While I have read that the DLA is mostly focused on power efficiency, I wanted to confirm that this is the case:

  • For the Orin, what is the TOPs metric on the DLA as compared to the GPU?
  • In what situations or ML networks does running with the DLA actually produce a lower latency overall?
  • The command I am running looks like:
trtexec --onnx=model.onnx --saveEngine=model.engine --exportProfile=model.json --int8 --fp16 --useDLACore=0 --allowGPUFallback --useSpinWait --separateProfileRun > model.log
  • Are there any other options or techniques I can use to lower the latency of this model when using the DLA?

Thank you

Hi,

1.
You can check this technical report for the details:

Jetson AGX Orin 64GB has 2048 CUDA cores and 64 Tensor cores with up to 170 Sparse TOPs of INT8 Tensor compute …

This enables up to 105 INT8 Sparse TOPs total on Jetson AGX Orin DLAs …

2.
DLA is used to offload tasks from the GPU.
In general, it improves throughput rather than reducing latency.
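
To illustrate the throughput-versus-latency point, here is a rough back-of-the-envelope sketch in Python using the two latencies from the question. It assumes batch size 1, two DLA cores, and ideal concurrency with no contention between the GPU and the DLAs. That is optimistic, since the 23.9 ms figure was measured with GPU fallback enabled, so treat the result as an upper bound rather than a measurement.

```python
def throughput_per_second(latency_ms: float) -> float:
    """Inferences per second for one engine running back-to-back at batch size 1."""
    return 1000.0 / latency_ms


gpu_latency_ms = 4.8    # from the question: yolox_x on the GPU only
dla_latency_ms = 23.9   # from the question: yolox_x on one DLA core
num_dla_cores = 2       # assumption: both DLA cores run their own engine

gpu_only = throughput_per_second(gpu_latency_ms)
dla_only = num_dla_cores * throughput_per_second(dla_latency_ms)

# Idealized: GPU and DLA engines run concurrently with no contention.
combined = gpu_only + dla_only

print(f"GPU only:        {gpu_only:.1f} inferences/s")
print(f"DLA cores only:  {dla_only:.1f} inferences/s")
print(f"GPU + DLA:       {combined:.1f} inferences/s (upper bound)")
```

Each individual inference on the DLA still takes 23.9 ms under these assumptions; only the aggregate rate goes up when the GPU and DLAs work in parallel.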

3.
Please check whether any layers fall back to the GPU in between the DLA layers.
Each fallback adds data transfers between the GPU and DLA, which can lower performance.
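
One way to check for fallback layers is to scan the build log for layer placement. The Python sketch below assumes that the log tags each layer with a marker such as `[DlaLayer]` or `[GpuLayer]`. The exact markers depend on the TensorRT version and verbosity, so they are parameters here, and the sample lines are hypothetical rather than real trtexec output.

```python
from collections import Counter


def count_layer_placement(log_lines, dla_tag="[DlaLayer]", gpu_tag="[GpuLayer]"):
    """Count layers per device and collect the names of GPU (fallback) layers.

    The tag strings are assumptions about the log format; adjust them to
    match what your TensorRT build log actually prints.
    """
    counts = Counter()
    gpu_layers = []
    for line in log_lines:
        if dla_tag in line:
            counts["dla"] += 1
        elif gpu_tag in line:
            counts["gpu"] += 1
            # Keep whatever follows the tag as the layer name.
            gpu_layers.append(line.split(gpu_tag, 1)[1].strip())
    return counts, gpu_layers


# Hypothetical log lines, for illustration only.
sample_log = [
    "[I] [TRT] [DlaLayer] {ForeignNode[conv_a...relu_b]}",
    "[I] [TRT] [GpuLayer] resize_c",
    "[I] [TRT] [DlaLayer] {ForeignNode[conv_d...conv_e]}",
    "[I] unrelated line",
]

counts, gpu_layers = count_layer_placement(sample_log)
print("DLA layers:", counts["dla"])
print("GPU layers:", counts["gpu"], gpu_layers)
```

For a real run, pass `open("model.log")` instead of `sample_log`. Any layer reported on the GPU is a candidate for replacing with a DLA-supported operation.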

For reference, you can find more details on the DLA and its usage in the GitHub repository below:

Thanks.
