Why is YOLOX inference time with DLA longer than without DLA (81 ms vs 8 ms)?

JetPack version is 5.0.2.

The DLA build config is below:

config.set_flag(trt.BuilderFlag.GPU_FALLBACK)    # let unsupported layers run on the GPU
config.set_flag(trt.BuilderFlag.FP16)            # DLA requires FP16 or INT8 precision
config.default_device_type = trt.DeviceType.DLA  # place layers on DLA by default
config.DLA_core = 0                              # use DLA core 0

The GPU-only TensorRT config is below:

config.set_flag(trt.BuilderFlag.FP16)            # GPU-only build with FP16 enabled

Hi,

Have you maximized the device performance first?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Please check the TensorRT build log.
If the model cannot run solely on DLA, the data transfer between the GPU and DLA adds extra overhead and reduces performance.
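
One way to surface that log (a minimal sketch, not from the original post: the ONNX path, output file name, and variable names are placeholders, and the API shown is the TensorRT 8.x Python API that ships with JetPack 5.0.2) is to build the engine with a VERBOSE logger, which prints the DLA/GPU placement of each layer during the build:

import tensorrt as trt

# A VERBOSE logger makes TensorRT print, while building, which layers
# are placed on DLA and which fall back to the GPU.
logger = trt.Logger(trt.Logger.VERBOSE)

builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# "yolox.onnx" is a placeholder for the exported YOLOX model.
with open("yolox.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0

# Layer placement is reported in the verbose output emitted here.
serialized_engine = builder.build_serialized_network(network, config)
if serialized_engine is None:
    raise SystemExit("engine build failed")
with open("yolox_dla.engine", "wb") as f:
    f.write(serialized_engine)

Counting how often the placement switches between DLA and GPU in that log gives a rough idea of how much transfer overhead is included in the 81 ms.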

Thanks.

Thank you very much. Is there any other way to improve DLA performance, or guidance on how to use DLA correctly?

Hi,

When you convert the model with TensorRT, the build log shows the layer placement between DLA and GPU.
The best case is to deploy a network that can run solely on the DLA, so no extra data-transfer overhead is required.
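
As a rough pre-build check (a sketch under the assumption that network and config are the already-parsed network and builder config from the snippet above; report_dla_placement is just an illustrative helper name), you can ask TensorRT which layers it considers DLA-capable under the current config:

import tensorrt as trt

def report_dla_placement(network: trt.INetworkDefinition,
                         config: trt.IBuilderConfig) -> None:
    # can_run_on_DLA() answers with respect to the current config,
    # so call this after setting FP16 / default_device_type / DLA_core.
    gpu_layers = [network.get_layer(i).name
                  for i in range(network.num_layers)
                  if not config.can_run_on_DLA(network.get_layer(i))]

    dla_count = network.num_layers - len(gpu_layers)
    print(f"{dla_count}/{network.num_layers} layers can run on DLA")
    # Every contiguous group of GPU-only layers implies a DLA<->GPU
    # transition, i.e. extra data transfer at inference time.
    for name in gpu_layers:
        print("GPU fallback:", name)

If a large fraction of the YOLOX layers shows up as GPU fallbacks, most of the 81 ms is likely spent on those transitions, and keeping the model on the GPU (or restructuring it to be DLA-friendly) may be faster than forcing DLA.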

Thanks.

Check out the DLA GitHub page for samples and resources: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

We have a FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ
