Inference time of cuDLA on Jetson AGX Orin

Set up info:
System: Linux
Module: Jetson AGX Orin
CUDA: 11.4.315
cuDNN: 8.6.0.166
TensorRT: 8.5.2.2
Jetpack: 5.1

Issue: I tried to use cuDLA to run inference on an int8-quantized YOLOv8s model. I referred to the code in NVIDIA-AI-IOT: GitHub - NVIDIA-AI-IOT/cuDLA-samples: YOLOv5 on Orin DLA. When converting the TensorRT engine, I used `--buildDLAStandalone` in trtexec, so the entire model should be running on cuDLA, but the inference time is longer than with the GPU model: it increased from 3 ms (GPU) to 12 ms (cuDLA). Reports in the GitHub code show that the cuDLA model infers faster, but I got the opposite result, and I don't know what went wrong.

Thanks.

Hi,

Do you use exactly the same model as the one on GitHub?
And which precision do you use?
Thanks.

I didn't use the same model; I used YOLOv8s and modified the code on GitHub to be able to run YOLOv8 inference.

I used int8 precision and passed the following flags when converting the ONNX model to an engine:

--inputIOFormats=int8:dla_hwc4 --outputIOFormats=fp16:chw16 --int8 --fp16
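For context, a full trtexec invocation along these lines might look as follows. This is a sketch, not the exact command from the thread: the ONNX and output file names are placeholders, and the flags are the trtexec options documented for TensorRT 8.5.

```shell
# Build a DLA standalone loadable from an ONNX model (file names are placeholders).
# --buildDLAStandalone produces a cuDLA loadable instead of a regular TRT engine;
# the IO formats match the ones quoted above.
trtexec --onnx=yolov8s.onnx \
        --useDLACore=0 \
        --buildDLAStandalone \
        --saveEngine=yolov8s.int8.dla.loadable \
        --inputIOFormats=int8:dla_hwc4 \
        --outputIOFormats=fp16:chw16 \
        --int8 --fp16
```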

@AastaLLL I tried to use the GitHub code and the build_dla_standalone_loadable_v2.sh script in the repository to convert the engine, but the inference time for the int8 model was still 14 ms, and I made sure to run the following before inference:

sudo jetson_clocks


Hi,

DLA is designed to offload work from the GPU, but it does not guarantee better performance.

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#faq

Q: Why does my network run slower when using DLA than without DLA?

A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Your chosen implementation depends on your latency or throughput requirements and power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations to increase the throughput of your network further.
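Comparing latency numbers fairly also requires warm-up runs and averaging over many iterations. A minimal, framework-agnostic timing sketch (the `infer` callable is a hypothetical stand-in for one synchronous inference call, e.g. a GPU engine execution or a cuDLA task submission followed by a sync):

```python
import time

def measure_latency_ms(infer, warmup=20, iters=200):
    """Average wall-clock latency of infer() in milliseconds.

    Warm-up runs are excluded so that one-time setup costs
    (allocation, clock ramp-up, lazy initialization) do not
    skew the result.
    """
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    return (time.perf_counter() - start) / iters * 1000.0

# Example with a dummy workload in place of real inference:
dummy = lambda: sum(range(10_000))
print(f"avg latency: {measure_latency_ms(dummy):.3f} ms")
```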

The GitHub repository you used was tested on DRIVE OS with YOLOv5.
You might get different results with YOLOv8 on Jetson.

Thanks.