how to use DLA


I want to know how to run inference using both the DLA0 and DLA1 cores.

Also, is there any difference between DLA0 and DLA1, such as clock speed?
I ask because DLA0 is always faster than DLA1 when I measure deep-learning inference speed.



You will need to create a separate TensorRT application for each DLA core.
We don't split a single network across two DLA processes because of the data-transfer overhead.
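As a rough sketch of "one application per DLA core", you can launch two independent trtexec processes, each pinned to a different core (the VGG16 file paths and the `prob` output name here are placeholders; `--useDLACore` and `--allowGPUFallback` are standard trtexec options, and DLA requires a reduced-precision mode such as `--fp16`):

```shell
# Launch two independent TensorRT processes, one per DLA core.
trtexec --deploy=vgg16/deploy.prototxt --model=vgg16/vgg16.caffemodel \
        --output=prob --fp16 --useDLACore=0 --allowGPUFallback &
trtexec --deploy=vgg16/deploy.prototxt --model=vgg16/vgg16.caffemodel \
        --output=prob --fp16 --useDLACore=1 --allowGPUFallback &
wait   # block until both inference processes finish
```

Each process builds and runs its own engine, so the two cores operate fully in parallel with no cross-core data transfer.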

We don't see any obvious difference between the DLAs. Have you maximized the Xavier's clocks first?

sudo jetson_clocks

If yes, could you share your model with us, along with the profiling difference?
We will try to reproduce this issue and report it to our internal team.


Hi, thank you for your reply.

Yes, I set MAXN mode.
The test conditions are:
- Clock mode: MAXN
- Model: VGG16 with TensorRT
I used jetson-inference's imagenet-console.

Here are the results (unit: ms):

Run | DLA0  | DLA1
  1 | 21.40 | 24.83
  2 | 20.94 | 26.99
  3 | 21.80 | 26.77
  4 | 21.16 | 27.06
  5 | 21.64 | 25.73
  6 | 21.85 | 25.30
  7 | 21.01 | 26.14
  8 | 21.82 | 26.23
  9 | 21.87 | 26.28
 10 | 21.84 | 26.51
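A quick sketch to summarize the ten runs above (averaging each column of the numbers exactly as posted):

```shell
# Average the DLA0 and DLA1 columns of the measurements above (unit: ms).
awk -F'/' 'NR>1 {s0+=$2; s1+=$3; n++}
           END {printf "DLA0 avg: %.2f ms\nDLA1 avg: %.2f ms\n", s0/n, s1/n}' <<'EOF'
#/ DLA0/ DLA1
1/ 21.40/ 24.83
2/ 20.94/ 26.99
3/ 21.80/ 26.77
4/ 21.16/ 27.06
5/ 21.64/ 25.73
6/ 21.85/ 25.30
7/ 21.01/ 26.14
8/ 21.82/ 26.23
9/ 21.87/ 26.28
10/ 21.84/ 26.51
EOF
```

This gives about 21.53 ms on DLA0 versus 26.18 ms on DLA1, i.e. DLA1 is consistently around 20% slower in these runs.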

DLA0 is always faster than DLA1, which is why I am asking.
If you need more information, please let me know.



Would you mind running this experiment with trtexec directly rather than jetson-inference?
There are many components within the jetson-inference framework, e.g. camera capture, format conversion, inference, display, ...

Since only the inference part uses the DLA, it's recommended to profile that part in isolation first.
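For example, a minimal per-core measurement with trtexec might look like the following (the prototxt/caffemodel paths and the `prob` output name are placeholders for your VGG16 files; the iteration-count flags are assumptions about your trtexec version, so adjust to what `trtexec --help` reports):

```shell
# Profile only the inference stage on DLA core 0, averaging over repeated runs.
trtexec --deploy=deploy.prototxt --model=vgg16.caffemodel --output=prob \
        --fp16 --useDLACore=0 --allowGPUFallback --iterations=10 --avgRuns=10

# Repeat on the other core and compare the reported average latencies.
trtexec --deploy=deploy.prototxt --model=vgg16.caffemodel --output=prob \
        --fp16 --useDLACore=1 --allowGPUFallback --iterations=10 --avgRuns=10
```

This removes the camera, conversion, and display stages from the measurement, so any remaining gap between the two cores reflects the DLA inference itself.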