DLA purpose

tavorbental · January 20, 2019, 3:51pm

Hi,

I am using the Jetson AGX Xavier with the latest JetPack 4.1.1 (TensorRT 5.0).

Why did Nvidia add 2 DLAs to the Xavier instead of just increasing the number of CUDA cores and Tensor Cores?

When I ran trtexec with ResNet50 in MAXN mode, I found that the GPU is faster than the DLA.

The output of running on 1 DLA:

avgRuns: 1000
deploy: /home/nvidia/Networks/ResNet-50/deploy.prototxt
fp16
batch: 1
iterations: 5
output: prob
useSpinWait
useDLACore: 0
allowGPUFallback
Input "data": 3x224x224
Output "prob": 1000x1x1

Default DLA is enabled but layer prob is not running on DLA, falling back to GPU.
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 1000 runs is 7.63907 ms (host walltime is 7.72017 ms, 99% percentile time is 7.86941).

The output of running on the GPU:

avgRuns: 1000
deploy: /home/nvidia/Networks/ResNet-50/deploy.prototxt
fp16
batch: 1
iterations: 5
output: prob
useSpinWait
Input "data": 3x224x224
Output "prob": 1000x1x1

name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 1000 runs is 3.49843 ms (host walltime is 3.54138 ms, 99% percentile time is 5.46234).

So I do not really understand what the advantage of using the DLA over the GPU is.

Thanks,
Bental
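
To put the two trtexec summaries above side by side, here is a minimal Python sketch that parses the "Average over ... runs" lines and compares them. The regular expression and the helper name `parse_summary` are hypothetical and written only against the log format pasted in this post; other trtexec versions may print a different summary line.

```python
import re

# Summary lines copied verbatim from the two trtexec runs in this post.
DLA_LINE = ("Average over 1000 runs is 7.63907 ms (host walltime is 7.72017 ms, "
            "99% percentile time is 7.86941).")
GPU_LINE = ("Average over 1000 runs is 3.49843 ms (host walltime is 3.54138 ms, "
            "99% percentile time is 5.46234).")

# Hypothetical pattern: matches only the summary format shown in this thread.
PATTERN = re.compile(
    r"Average over (\d+) runs is ([\d.]+) ms "
    r"\(host walltime is ([\d.]+) ms, 99% percentile time is ([\d.]+)\)"
)


def parse_summary(line):
    """Extract run count, average, host walltime and 99th percentile (ms)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a trtexec summary: " + line)
    return {
        "runs": int(match.group(1)),
        "avg_ms": float(match.group(2)),
        "walltime_ms": float(match.group(3)),
        "p99_ms": float(match.group(4)),
    }


dla = parse_summary(DLA_LINE)
gpu = parse_summary(GPU_LINE)

# How many times slower one DLA core is than the GPU at batch size 1.
latency_ratio = dla["avg_ms"] / gpu["avg_ms"]

# Spread between the 99th percentile and the average, per engine.
dla_tail = dla["p99_ms"] / dla["avg_ms"]
gpu_tail = gpu["p99_ms"] / gpu["avg_ms"]

print(f"DLA/GPU average latency ratio: {latency_ratio:.2f}")
print(f"p99/avg on DLA: {dla_tail:.2f}, on GPU: {gpu_tail:.2f}")
```

On these numbers a single DLA core is a little over twice as slow as the GPU on average, while its 99th percentile sits much closer to its average than the GPU's does.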

AastaLLL · January 21, 2019, 3:04am

Hi,

You can find an introduction to Xavier here:
https://devblogs.nvidia.com/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/

The Nvidia DLA is designed specifically for deep learning and is used to offload inference work from the GPU.
The DLA engines improve energy efficiency and free up the GPU to run more complex networks and dynamic tasks implemented by the user.

Thanks.
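
As a rough illustration of the offloading argument, the sketch below turns the two average latencies reported earlier in the thread into throughput and adds them up. It is a back-of-the-envelope estimate, not a measurement: it assumes both DLA cores perform like the one that was benchmarked and that the GPU and DLAs can run concurrently without contending for memory bandwidth or for the GPU itself. The second assumption is optimistic, since the log shows the `prob` layer falling back to the GPU in the DLA run.

```python
# Average batch-1 latencies reported by trtexec earlier in this thread (ms).
DLA_AVG_MS = 7.63907
GPU_AVG_MS = 3.49843

# The original question mentions two DLA cores on the Xavier.
# ASSUMPTION: the second core performs like the one that was measured.
NUM_DLA_CORES = 2


def images_per_second(latency_ms, batch=1):
    """Convert a per-batch latency in milliseconds into images per second."""
    return batch * 1000.0 / latency_ms


gpu_only = images_per_second(GPU_AVG_MS)
dla_total = NUM_DLA_CORES * images_per_second(DLA_AVG_MS)

# ASSUMPTION: perfect overlap and no contention, so this is an upper bound.
combined = gpu_only + dla_total

print(f"GPU alone:           {gpu_only:.1f} images/s")
print(f"Two DLA cores:       {dla_total:.1f} images/s")
print(f"Idealized GPU + DLA: {combined:.1f} images/s")
```

Read this way, the DLAs are not meant to beat the GPU on single-stream latency; under the stated assumptions, two DLA cores together approach the GPU's ResNet-50 throughput while leaving the GPU available for other work.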

