Throughput is too slow on NVIDIA Jetson AGX Orin DLA

FengtuWang · January 30, 2024, 2:37am

I got the ONNX model from here: cuDLA-samples/data/model/yolov5_trimmed_qat.onnx at main · NVIDIA-AI-IOT/cuDLA-samples · GitHub
I exported an engine from it; according to the log, the model's ops are deployed on the DLA, not the GPU. Throughput: 60.1681 qps
With the model deployed on the GPU, throughput is 219.161 qps.
Is the engine running on only one DLA core?

Detailed logs

dla.log (320.5 KB)
gpu.log (3.5 MB)

env

(base) orin@orin-root:~/workspace/DeepStream-Yolo$ jetson_release
Software part of jetson-stats 4.2.4 - (c) 2024, Raffaello Bonghi
Model: Jetson AGX Orin Developer Kit - Jetpack 5.1.2 [L4T 35.4.1]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - P-Number: p3701-0005
 - Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform:
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.120-tegra
jtop:
 - Version: 4.2.4
 - Service: Active
Libraries:
 - CUDA: 11.4.315
 - cuDNN: 8.6.0.166
 - TensorRT: 5.1.2
 - VPI: 2.3.9
 - Vulkan: 1.3.204
 - OpenCV: 4.6.0 - with CUDA: YES

AastaLLL · January 30, 2024, 6:04am

Hi,

Based on your log, the inference only runs on DLA0.

$ trtexec ... --useDLACore=0 ...

Please open another console and run trtexec with --useDLACore=1 so that inference runs on the other DLA core concurrently.

Thanks.
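A minimal sketch of the two-console suggestion above, driven from one Python script instead. It assumes an engine has already been serialized and can be loaded with trtexec's `--loadEngine`; the file name `yolov5_dla.engine` is hypothetical, and the other options of the original command are elided in the thread, so they are not reproduced here.

```python
import shutil
import subprocess


def dla_commands(engine_path, cores=(0, 1)):
    """Build one trtexec command line per DLA core."""
    return [
        ["trtexec", f"--loadEngine={engine_path}", f"--useDLACore={core}"]
        for core in cores
    ]


def run_concurrently(commands):
    """Start every command at once, then wait for all of them to finish."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical engine file name; replace it with your own engine.
    cmds = dla_commands("yolov5_dla.engine")
    if shutil.which("trtexec"):
        run_concurrently(cmds)
    else:
        # trtexec is not on PATH: just show what would be run.
        for cmd in cmds:
            print(" ".join(cmd))
```

Each process reports its own throughput, so the combined DLA figure is the sum of the two "Throughput" lines.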

FengtuWang · January 30, 2024, 7:06am

By a simple calculation:
two DLA cores → 60 × 2 = 120 qps
GPU → about 219 qps
Is that reasonable?
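The estimate above can be written out with the exact figures from the first post. This is only a sketch: it assumes throughput scales perfectly linearly across the two DLA cores, which the thread does not confirm.

```python
dla_single_qps = 60.1681  # one DLA core, from the first post
gpu_qps = 219.161         # GPU-only run, from the first post


def scaled_dla_qps(per_core_qps, cores=2):
    """Upper-bound estimate assuming perfect linear scaling across DLA cores."""
    return per_core_qps * cores


dual_dla_qps = scaled_dla_qps(dla_single_qps)
dla_to_gpu = dual_dla_qps / gpu_qps

print(f"2x DLA estimate: {dual_dla_qps:.1f} qps")
print(f"DLA/GPU throughput ratio: {dla_to_gpu:.2f}")
```

Under that assumption, both DLA cores together reach a little over half of the GPU throughput.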

AastaLLL · January 31, 2024, 5:03am

Hi,

You can compare to our benchmark results below:

On Orin 64GB with MAXN mode:

GPU sparse INT8 peak DL performance: 171 TOPS
2x DLA sparse INT8 peak performance: 105 TOPS

Thanks.
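One way to read the benchmark figures above against the measured numbers is sketched below. It assumes that the ratio of peak TOPS is a rough proxy for the ratio of achievable throughput. That is a simplification, since peak TOPS is a hardware ceiling and not a prediction for a particular model.

```python
gpu_peak_tops = 171.0     # GPU sparse INT8 peak, from the reply above
dla_peak_tops = 105.0     # 2x DLA sparse INT8 peak, from the reply above
gpu_qps = 219.161         # measured GPU throughput, from the first post
dla_single_qps = 60.1681  # measured single-DLA throughput, from the first post

# Share of GPU peak that the two DLA cores offer together.
peak_ratio = dla_peak_tops / gpu_peak_tops

# DLA throughput if it tracked the GPU in proportion to peak TOPS (assumption).
expected_dual_dla_qps = gpu_qps * peak_ratio
expected_per_core_qps = expected_dual_dla_qps / 2

# How close the measured single-core figure comes to that proportional estimate.
achieved_fraction = dla_single_qps / expected_per_core_qps

print(f"Peak DLA/GPU ratio: {peak_ratio:.3f}")
print(f"Proportional estimate per DLA core: {expected_per_core_qps:.1f} qps")
print(f"Measured / estimate: {achieved_fraction:.2f}")
```

By this rough measure the single-core DLA result is within about 10% of what the peak figures would suggest, so a DLA total well below the GPU throughput is consistent with the benchmark.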

