Setup info:
System: Linux
Module: Jetson AGX Orin
CUDA: 11.4.315
cuDNN: 8.6.0.166
TensorRT: 8.5.2.2
Jetpack: 5.1
Issue: I tried to use cuDLA to run inference with the INT8-quantized YOLOv8s model, following the code in NVIDIA-AI-IOT/cuDLA-samples (GitHub - NVIDIA-AI-IOT/cuDLA-samples: YOLOv5 on Orin DLA). When converting the TensorRT engine I used "--buildDLAStandalone" in trtexec, so the entire model should run on the DLA through cuDLA. However, the inference time is longer than with the GPU engine: it increased from 3 ms (GPU) to 12 ms (cuDLA). The results reported in the GitHub repo show cuDLA inference being faster, but I got the opposite result and I don't know what went wrong.
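For reference, the build command I ran looks roughly like the sketch below. File names and the calibration cache are placeholders from my setup, and the flags reflect my understanding of the repo's build_dla_standalone_loadable_v2.sh rather than an exact copy of it:

# Build a DLA standalone loadable for the INT8 model (paths are placeholders).
# --buildDLAStandalone keeps every layer on the DLA, so nothing silently
# falls back to the GPU; the build fails if a layer is unsupported.
trtexec --onnx=yolov8s.onnx \
        --int8 --fp16 \
        --useDLACore=0 \
        --buildDLAStandalone \
        --saveEngine=yolov8s.int8.dla.bin \
        --inputIOFormats=int8:dla_hwc4 \
        --outputIOFormats=fp16:chw16 \
        --calib=calib.cache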
@AastaLLL I tried the GitHub code and used the build_dla_standalone_loadable_v2.sh script from the repo to convert the engine, but the inference time for the INT8 model was still 14 ms, and I made sure to use it before inference.
Q: Why does my network run slower when using DLA than without DLA?
A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Your chosen implementation depends on your latency or throughput requirements and power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations to increase the throughput of your network further.
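As an illustrative sketch (not taken from the sample repo, and with a placeholder ONNX path), you can compare the two paths directly by building one engine per device with trtexec and benchmarking each:

# GPU-only engine.
trtexec --onnx=yolov8s.onnx --int8 --fp16 --saveEngine=yolov8s.gpu.engine

# DLA core 0 engine; --allowGPUFallback lets unsupported layers run on the GPU,
# so the reported time mixes DLA and GPU work.
trtexec --onnx=yolov8s.onnx --int8 --fp16 --useDLACore=0 --allowGPUFallback \
        --saveEngine=yolov8s.dla0.engine

# The two DLA cores are independent of each other and of the GPU, so a second
# engine built with --useDLACore=1 can run concurrently with the GPU engine
# to raise total throughput.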
The GitHub sample you used is tested on DRIVE OS with YOLOv5.
You might get different results for YOLOv8 on Jetson.