Throughput is too low on the DLA of NVIDIA Jetson AGX Orin

Detailed logs:

dla.log (320.5 KB)
gpu.log (3.5 MB)

Environment:

(base) orin@orin-root:~/workspace/DeepStream-Yolo$ jetson_release
Software part of jetson-stats 4.2.4 - (c) 2024, Raffaello Bonghi
Model: Jetson AGX Orin Developer Kit - Jetpack 5.1.2 [L4T 35.4.1]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - P-Number: p3701-0005
 - Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform:
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.120-tegra
jtop:
 - Version: 4.2.4
 - Service: Active
Libraries:
 - CUDA: 11.4.315
 - cuDNN: 8.6.0.166
 - TensorRT: 5.1.2
 - VPI: 2.3.9
 - Vulkan: 1.3.204
 - OpenCV: 4.6.0 - with CUDA: YES

Hi,

Based on your log, the inference only runs on DLA0.

$ trtexec ... --useDLACore=0 ...

Please open a second console and run trtexec there with --useDLACore=1, while the first run is still active, so that inference is deployed on both DLA cores concurrently.
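
As a rough sketch of the same idea from a single script instead of two consoles: the snippet below builds one trtexec command per DLA core and can launch them side by side. The model argument (`--onnx=model.onnx`) and the `--int8` flag are placeholders, since the original command line is elided in the log excerpt above; substitute the arguments you actually used. The helper names `dla_commands` and `run_concurrently` are made up for this illustration.

```python
import shlex
import subprocess


def dla_commands(base_args, cores=(0, 1)):
    """Build one trtexec command line per DLA core.

    base_args holds the arguments you already pass to trtexec
    (everything except --useDLACore).
    """
    return [["trtexec", *base_args, f"--useDLACore={core}"] for core in cores]


def run_concurrently(commands):
    """Start every command at once and wait for all of them to finish."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Placeholder arguments (assumption): replace with your own model and flags.
    commands = dla_commands(["--onnx=model.onnx", "--int8"])
    for cmd in commands:
        print(shlex.join(cmd))
    # On the Jetson itself, uncomment to launch both runs in parallel:
    # run_concurrently(commands)
```

Each process prints its own throughput figure, so the combined DLA throughput is the sum of the two reported values.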

Thanks.

A simple calculation from my results gives:
two DLA cores → 60 × 2 = 120 qps
GPU → 210 qps
Is that reasonable?

Hi,

You can compare to our benchmark results below:

On Orin 64GB with MAXN mode:

GPU sparse INT8 peak DL performance: 171 TOPS
2x DLA sparse INT8 peak performance: 105 TOPS
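
One way to read these numbers against the measurements quoted earlier in the thread is to compare the DLA-to-GPU ratios. This is only a sanity check, not a prediction: peak TOPS are theoretical sparse INT8 figures, while qps depends on the model, and the 120 qps value assumes the second DLA core matches the 60 qps measured on DLA0.

```python
# Measured throughput quoted in this thread (queries per second).
measured_dla_qps = 60 * 2   # assumes both DLA cores reach 60 qps each
measured_gpu_qps = 210

# Peak sparse INT8 figures from the benchmark above (TOPS).
peak_dla_tops = 105         # both DLA cores combined
peak_gpu_tops = 171

measured_ratio = measured_dla_qps / measured_gpu_qps
peak_ratio = peak_dla_tops / peak_gpu_tops

print(f"measured DLA/GPU ratio: {measured_ratio:.2f}")  # 0.57
print(f"peak DLA/GPU ratio:     {peak_ratio:.2f}")      # 0.61
```

The two ratios are close, so a combined DLA throughput somewhat above half of the GPU throughput is in line with the peak figures.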

Thanks.
