How does TRT inference run on both the DLAs and the GPU?

Hi, please refer to the MLCommons benchmark link below. In the AGX Orin row, it is mentioned that both the GPU and the two DLAs are used for TRT inference.

So I wanted to know how the inference is run in parallel on both the GPU and the DLA.
Also, how can both DLAs be utilized at the same time during inference?

MLCommons link : v3.0 Results | MLCommons

Also, in the corresponding GitHub repo I couldn't find any lines of code that indicate the above behaviour of using both DLAs concurrently or running inference across multiple devices in parallel.

ResNet50 TRT inference code for reference: https://github.com/mlcommons/inference_results_v3.0/blob/main/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py

Hi,

The two DLAs and the GPU can run concurrently.
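This is not the MLPerf harness implementation, but a minimal sketch of the general pattern using the TensorRT 8.x-style Python API (as shipped on JetPack for Orin): build one engine per device (DLA core 0, DLA core 1, and the GPU) and give each its own execution context and CUDA stream, so the three enqueue calls overlap. The ONNX file name, static input shapes, and the pycuda buffer handling are assumptions for illustration only.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, device_type, dla_core=-1):
    """Build a serialized TensorRT engine targeted at the GPU or at one DLA core."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)              # DLA requires FP16 or INT8
    if device_type == trt.DeviceType.DLA:
        config.default_device_type = trt.DeviceType.DLA
        config.DLA_core = dla_core                      # 0 or 1 on AGX Orin
        config.set_flag(trt.BuilderFlag.GPU_FALLBACK)   # unsupported layers fall back to GPU
    return builder.build_serialized_network(network, config)

# One engine per device: DLA core 0, DLA core 1, and the GPU ("resnet50.onnx" is a placeholder).
plans = [
    build_engine("resnet50.onnx", trt.DeviceType.DLA, dla_core=0),
    build_engine("resnet50.onnx", trt.DeviceType.DLA, dla_core=1),
    build_engine("resnet50.onnx", trt.DeviceType.GPU),
]

runtime = trt.Runtime(TRT_LOGGER)
workers = []
for plan in plans:
    engine = runtime.deserialize_cuda_engine(plan)
    context = engine.create_execution_context()
    stream = cuda.Stream()
    # One device buffer per binding (input + output); assumes static shapes.
    buffers = [
        cuda.mem_alloc(
            trt.volume(engine.get_binding_shape(i))
            * np.dtype(trt.nptype(engine.get_binding_dtype(i))).itemsize)
        for i in range(engine.num_bindings)
    ]
    workers.append((context, stream, buffers))

# Enqueue all three back to back; each runs on its own stream,
# so the two DLA cores and the GPU execute in parallel.
for context, stream, buffers in workers:
    context.execute_async_v2([int(b) for b in buffers], stream.handle)
for _, stream, _ in workers:
    stream.synchronize()
```

Building a separate plan per target is the simplest way to pin each execution context to its device; input/output copies and batching logic are omitted here for brevity.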

However, since the DLA is a hardware accelerator with limited functionality, some models cannot run entirely on the DLA.
If a model requires GPU fallback frequently, the data transfers between the GPU and the DLA can decrease performance.
In such a case, the throughput of DLAs+GPU might not be higher than GPU-only mode.
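To estimate how much of a given network actually lands on the DLA (and therefore how much fallback traffic to expect), the builder config can be queried per layer. A small sketch, assuming `network` and `config` objects created with the DLA device type set, as in the `build_engine` sketch above:

```python
# Count layers the DLA can and cannot accept; frequent fallback means
# extra GPU<->DLA transfers and usually lower DLAs+GPU throughput.
dla_ok = sum(config.can_run_on_DLA(network.get_layer(i))
             for i in range(network.num_layers))
print(f"{dla_ok}/{network.num_layers} layers can run on the DLA")
```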

We have another benchmark repo that can give you some idea about this:
devices=3 indicates DLAs+GPU, and devices=1 means GPU-only.

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.