DLA sensitivity to CPU/DLA frequency

lab2022 · July 6, 2023, 6:16am

Hello,

I am running some tests on the Jetson AGX Orin DLA, based on the models provided in the NVIDIA/Deep-Learning-Accelerator-SW GitHub repository (NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications).

I have made 2 observations when running ResNet50 on the DLA only with TensorRT:

1. The DLA latency seems to be very sensitive to the CPU frequency, while this is not the case for the GPU.
2. For some frequencies, increasing the DLA frequency (e.g. from its 30W frequency to its MAXN frequency) does not reduce the inference latency.

I could not clearly explain these behaviors. Do they mean:

For observation 1, that the CPU performs some pre/post-processing operations before and after each DLA inference? If so, are these operations only data transfers (DRAM)?
For observation 2, that the DLA is compute-bound? Or that the NoC limits its accessible bandwidth?

Thank you in advance for your help.

AastaLLL · July 6, 2023, 7:18am

Hi,

1. Could you share the details of profiling data with us first?

$ /usr/src/tensorrt/bin/trtexec ... --profilingVerbosity=detailed --dumpLayerInfo --dumpProfile --separateProfileRun ...

2. We don’t provide an API to adjust the DLA frequency.
The DLA clock should stay the same across different power modes.

But if the DLA requires some data transfer or GPU fallback, its latency might be affected by the clock setting.

Thanks.

lab2022 · July 6, 2023, 8:47am

Thank you for your answer.

Please find the log folder attached: frequency_tests.zip (47.8 KB).
Here are the summarized results:
- Changes in CPU frequency only:
  - GPU latency: 0.796 ms (1113 MHz) / 0.794 ms (1728 MHz) / 0.792 ms (2200 MHz)
  - DLA latency: 2.20 ms (1113 MHz) / 2.05 ms (1728 MHz) / 1.99 ms (2200 MHz)
- Changes in DLA frequency only:
  - DLA latency: 2.6 ms (614 MHz) / 2.02 ms (1370 MHz) / 1.98 ms (1600 MHz)
I was able to modify the DLA maximum frequency with the nvpmodel tool. I selected DLA frequencies from those used by the different power modes, as listed in the tables of the "Jetson Orin NX Series and Jetson AGX Orin Series" page of the Jetson Linux Developer Guide 34.1 documentation (nvidia.com).
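
To put numbers on how weakly the latency follows each clock, here is a minimal Python sketch that uses only the figures listed above. It computes a scaling exponent k (latency roughly proportional to clock^-k between the lowest and highest setting) and a rough least-squares fit of latency = fixed + work / clock. The function names and the two-term model are assumptions made for illustration. They are not taken from trtexec, TensorRT, or any NVIDIA tool, and three points per sweep are too few for a firm conclusion.

```python
import math

# (clock, latency in ms) pairs copied from the results above.
# Only clock ratios and 1/clock are used, so the clock unit cancels
# out of the exponent and of the "fixed" term of the fit.
GPU_VS_CPU_CLOCK = [(1113, 0.796), (1728, 0.794), (2200, 0.792)]
DLA_VS_CPU_CLOCK = [(1113, 2.20), (1728, 2.05), (2200, 1.99)]
DLA_VS_DLA_CLOCK = [(614, 2.6), (1370, 2.02), (1600, 1.98)]


def scaling_exponent(points):
    """Return k such that latency ~ clock**(-k) between first and last point.

    k = 1 means latency is inversely proportional to the clock;
    k = 0 means latency does not depend on the clock at all.
    """
    (f_lo, t_lo), (f_hi, t_hi) = points[0], points[-1]
    return math.log(t_lo / t_hi) / math.log(f_hi / f_lo)


def fit_fixed_plus_scaled(points):
    """Least-squares fit of the assumed model latency = fixed + work / clock.

    Returns (fixed, work): `fixed` in ms, `work` in ms * clock unit.
    """
    xs = [1.0 / f for f, _ in points]
    ts = [t for _, t in points]
    n = len(points)
    x_mean = sum(xs) / n
    t_mean = sum(ts) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (t - t_mean) for x, t in zip(xs, ts))
    work = sxy / sxx
    fixed = t_mean - work * x_mean
    return fixed, work


if __name__ == "__main__":
    sweeps = [
        ("GPU latency vs CPU clock", GPU_VS_CPU_CLOCK),
        ("DLA latency vs CPU clock", DLA_VS_CPU_CLOCK),
        ("DLA latency vs DLA clock", DLA_VS_DLA_CLOCK),
    ]
    for name, pts in sweeps:
        print(f"{name}: k = {scaling_exponent(pts):.3f}")

    fixed, work = fit_fixed_plus_scaled(DLA_VS_DLA_CLOCK)
    print(f"DLA clock sweep fit: fixed = {fixed:.2f} ms, work = {work:.0f}")
```

With these figures the exponent stays well below 1 for the DLA in both sweeps, and the fit assigns most of the DLA latency to the term that does not depend on the DLA clock. This only describes the measurements. It does not tell whether that term comes from memory traffic, CPU-side work, or something else.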

AastaLLL · July 10, 2023, 7:45am

Hi,

Sorry, my previous reply was not correct.

The DLA clock does look configurable.
We are going to reproduce this in our environment and check with the internal team first.
Will share more info with you later.

Thanks.
