Hello,
I am running some tests on the Jetson AGX Orin DLA, based on the models provided in the NVIDIA/Deep-Learning-Accelerator-SW GitHub repository (NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications).
I have made 2 observations when running ResNet50 on the DLA only with TensorRT:
- The DLA latency seems to be very sensitive to the CPU frequency, while this is not the case for the GPU latency.
- For some frequencies, increasing the DLA frequency (e.g. from its 30W frequency to its MAXN frequency) does not reduce the inference latency.
I could not clearly explain these behaviors:
- For observation 1, does it mean that the CPU performs some pre/post-processing operations before and after each DLA inference? If so, are these operations only data transfers (DRAM)?
- For observation 2, does it mean that the DLA is compute-bound? Or that the NoC limits its accessible bandwidth?
Thank you in advance for your help.
Hi,
1. Could you share the detailed profiling data with us first?
$ /usr/src/tensorrt/bin/trtexec ... --profilingVerbosity=detailed --dumpLayerInfo --dumpProfile --separateProfileRun ...
2. We don’t provide an API to adjust DLA frequency.
The DLA clock should stay the same across different power modes.
But if DLA requires some data transfer or GPU fallback, it might be affected by the clock setting.
Thanks.
Thank you for your answer.
Please find the log folder attached: frequency_tests.zip (47.8 KB).
Here are the summarized results:
- Changes in CPU frequency only:
  - GPU latency: 0.796 ms (1113 MHz) / 0.794 ms (1728 MHz) / 0.792 ms (2200 MHz)
  - DLA latency: 2.20 ms (1113 MHz) / 2.05 ms (1728 MHz) / 1.99 ms (2200 MHz)
- Changes in DLA frequency only:
  - DLA latency: 2.6 ms (614 MHz) / 2.02 ms (1370 MHz) / 1.98 ms (1600 MHz)
I was able to modify the DLA maximum frequency with the nvpmodel tool. I selected the DLA frequencies from those used by the different power modes, as listed in the tables of the "Jetson Orin NX Series and Jetson AGX Orin Series" page of the Jetson Linux Developer Guide 34.1 documentation (nvidia.com).
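
To make the two observations easier to compare, here is a minimal sketch that sets the measured speedup against the clock ratio for each sweep, relative to the lowest clock. The latencies are copied from the summary above; the clock values are assumed to be in MHz, and the names (`scaling`, `sweeps`) are only illustrative.

```python
# Latencies (ms) per clock setting, copied from the summarized results.
# Assumption: the clock values are in MHz.
sweeps = {
    "GPU latency vs CPU clock": [(1113, 0.796), (1728, 0.794), (2200, 0.792)],
    "DLA latency vs CPU clock": [(1113, 2.20), (1728, 2.05), (2200, 1.99)],
    "DLA latency vs DLA clock": [(614, 2.6), (1370, 2.02), (1600, 1.98)],
}


def scaling(points):
    """Return (clock, clock_ratio, speedup) relative to the first (lowest) clock."""
    base_clock, base_latency = points[0]
    return [
        (clock, clock / base_clock, base_latency / latency)
        for clock, latency in points[1:]
    ]


for name, points in sweeps.items():
    for clock, clock_ratio, speedup in scaling(points):
        print(f"{name} @ {clock} MHz: clock x{clock_ratio:.2f}, speedup x{speedup:.2f}")
```

If the latency were limited only by the clock being changed, the speedup would be expected to track the clock ratio. In these numbers the DLA speedup stays well below the DLA clock ratio, and going from 1370 to 1600 MHz changes the latency by only 0.04 ms, which is the behavior described in observation 2.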
Hi,
Sorry, my previous reply was not correct.
The DLA clock does look configurable.
We are going to reproduce this in our environment and check with the internal team first.
Will share more info with you later.
Thanks.