How to accelerate yolov3 on NVDLA?


I want to accelerate YOLOv3 on a Jetson Xavier NX. I have installed TensorRT and can accelerate YOLOv3 on the GPU, but the inference latency is still too long for my use case. I realized that the NVDLA could be used to accelerate YOLOv3 further. I searched the web and only found the documentation here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

That documentation gives no example code for calling the NVDLA from Python. Could someone kindly share some experience with calling the NVDLA from Python?


You can serialize a TensorRT engine for NVDLA with trtexec,
and then deserialize it directly from the Python sample.

For example:

$ /usr/src/tensorrt/bin/trtexec --useDLACore=0 --allowGPUFallback --saveEngine=engine.dla ...
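Once the engine is saved, loading it from Python is the same as for a GPU engine, except that the DLA core should be selected on the runtime before deserialization. Below is a minimal, hedged sketch assuming the TensorRT Python bindings (`tensorrt`) are installed, as on a standard JetPack image; the file name `engine.dla` and core index 0 simply mirror the trtexec command above.

```python
# Hedged sketch: deserialize a TensorRT engine that trtexec built for DLA.
# Assumes the "tensorrt" Python bindings are available (standard on JetPack);
# "engine.dla" matches the --saveEngine argument used in the command above.

def load_dla_engine(path="engine.dla", dla_core=0):
    """Deserialize a TensorRT engine that was built for a DLA core."""
    import tensorrt as trt  # imported lazily so the helper stays reusable

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    runtime.DLA_core = dla_core  # select the DLA core before deserializing
    with open(path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())


if __name__ == "__main__":
    engine = load_dla_engine()
    context = engine.create_execution_context()
    # From here on, allocate input/output buffers and run inference exactly
    # as in the GPU-only TensorRT Python samples; the DLA placement was
    # fixed at build time by trtexec, so the inference loop is unchanged.
```

Note that nothing inside the inference loop itself needs to change compared to a GPU engine: which layers run on DLA (and which fall back to the GPU) was decided when trtexec built the engine.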

Please note that DLA is designed for offloading GPU workloads.
It doesn't guarantee lower latency, but it can free up GPU resources for other tasks.

Could you share the performance you observed for YOLOv3?
It’s expected that Xavier NX can reach ~608 fps on YOLOv3 Tiny.


This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.