Setup info:
System: Linux
Module: Jetson AGX Orin
CUDA: 11.4.315
cuDNN: 8.6.0.166
TensorRT: 8.5.2.2
Jetpack: 5.1
Issue: I tried to use cuDLA to run inference with the INT8-quantized YOLOv8s model, following the code in NVIDIA-AI-IOT/cuDLA-samples (GitHub - NVIDIA-AI-IOT/cuDLA-samples: YOLOv5 on Orin DLA). When converting the TensorRT engine I used "--buildDLAStandalone" in trtexec, so the entire model should run on the DLA through cuDLA. However, the inference time is longer than with the GPU engine: it increased from 3 ms (GPU) to 12 ms (cuDLA). The results reported in the GitHub repo show cuDLA inference being faster, but I got the opposite result and I don't know what went wrong.
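For reference, the build command I ran looks roughly like the sketch below. File names and the calibration cache are placeholders from my setup, and the flags reflect my understanding of the repo's build_dla_standalone_loadable_v2.sh rather than an exact copy of it:

# Build a DLA standalone loadable for the INT8 model (paths are placeholders).
# --buildDLAStandalone keeps every layer on the DLA, so nothing silently
# falls back to the GPU; the build fails if a layer is unsupported.
trtexec --onnx=yolov8s.onnx \
        --int8 --fp16 \
        --useDLACore=0 \
        --buildDLAStandalone \
        --saveEngine=yolov8s.int8.dla.bin \
        --inputIOFormats=int8:dla_hwc4 \
        --outputIOFormats=fp16:chw16 \
        --calib=calib.cache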
@AastaLLL I tried the GitHub code and used the build_dla_standalone_loadable_v2.sh script from the repo to convert the engine, but the inference time for the INT8 model was still 14 ms, and I made sure to use it before inference.
Q: Why does my network run slower when using DLA than without DLA?
A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Your chosen implementation depends on your latency or throughput requirements and power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations to increase the throughput of your network further.
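As an illustrative sketch (not taken from the sample repo, and with a placeholder ONNX path), you can compare the two paths directly by building one engine per device with trtexec and benchmarking each:

# GPU-only engine.
trtexec --onnx=yolov8s.onnx --int8 --fp16 --saveEngine=yolov8s.gpu.engine

# DLA core 0 engine; --allowGPUFallback lets unsupported layers run on the GPU,
# so the reported time mixes DLA and GPU work.
trtexec --onnx=yolov8s.onnx --int8 --fp16 --useDLACore=0 --allowGPUFallback \
        --saveEngine=yolov8s.dla0.engine

# The two DLA cores are independent of each other and of the GPU, so a second
# engine built with --useDLACore=1 can run concurrently with the GPU engine
# to raise total throughput.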
The GitHub sample you used is tested on DRIVE OS with YOLOv5.
You might get different results for YOLOv8 on Jetson.