I am trying to accelerate YOLOv3 on a Jetson Xavier NX. I have installed TensorRT and can already accelerate YOLOv3 on the GPU, but the inference latency is still too high for my application. I read that the NVDLA could be used to accelerate YOLOv3 further. Searching the web, I only found some documentation here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
However, that documentation gives no example code for calling NVDLA from Python. Could someone kindly share their experience with using NVDLA from Python?
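A minimal sketch of what this could look like with the TensorRT Python API. It assumes TensorRT 8.x and a YOLOv3 model already exported to ONNX. Older TensorRT 7.x releases used `builder.build_engine(network, config)` instead, which returns an engine rather than serialized bytes. The helper names `build_dla_engine` / `load_dla_engine` and any file paths are hypothetical. The DLA-specific part is the builder config: `default_device_type`, `DLA_core`, and the `GPU_FALLBACK` flag.

```python
import sys


def build_dla_engine(onnx_path, engine_path, dla_core=0):
    """Hypothetical helper: build a TensorRT engine that runs on a DLA core and
    falls back to the GPU for unsupported layers. Assumes TensorRT 8.x."""
    import tensorrt as trt  # imported here so this file loads even without TensorRT

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # ONNX models require an explicit-batch network definition.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i), file=sys.stderr)
            raise RuntimeError(f"failed to parse {onnx_path}")

    config = builder.create_builder_config()
    # DLA executes only FP16 or INT8; FP16 needs no calibration data.
    config.set_flag(trt.BuilderFlag.FP16)
    # Place layers on the DLA by default, on the chosen core (Xavier NX has cores 0 and 1).
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = dla_core
    # Layers the DLA cannot run are placed on the GPU instead of failing the build.
    config.set_flag(trt.BuilderFlag.GPU_FALLBACK)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)


def load_dla_engine(engine_path, dla_core=0):
    """Hypothetical helper: deserialize an engine built for DLA on the same core."""
    import tensorrt as trt

    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    runtime.DLA_core = dla_core  # must be set before deserializing
    with open(engine_path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())
```

The returned engine is then executed through an execution context exactly like a GPU engine. For a quick check without writing code, `trtexec --onnx=yolov3.onnx --fp16 --useDLACore=0 --allowGPUFallback` builds and benchmarks a comparable DLA engine. Every layer that falls back to the GPU adds a DLA/GPU transition, so the result is worth measuring rather than assuming.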
Please note that DLA is designed to offload work from the GPU.
It doesn't guarantee lower latency, but it can free GPU resources for other tasks.
Could you share the performance you observed for YOLOv3?
Xavier NX is expected to reach about 608 fps on YOLOv3-Tiny.
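To make the observed numbers comparable, a minimal timing sketch could be used. It assumes `infer` is a hypothetical zero-argument callable that runs one inference and returns only after it has finished (for asynchronous execution, it should synchronize the CUDA stream before returning). Warm-up runs are excluded so one-time initialization does not skew the mean. For reference, 608 fps corresponds to an average of 1000 / 608 ≈ 1.64 ms per frame.

```python
import statistics
import time


def measure_throughput(infer, warmup=10, iterations=100):
    """Time a zero-argument inference callable.

    Returns (mean latency in milliseconds, frames per second) for sequential runs.
    """
    for _ in range(warmup):
        infer()  # let clocks, caches and lazy initialization settle
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        infer()
        samples.append(time.perf_counter() - start)
    mean_s = statistics.mean(samples)
    # For back-to-back single-frame runs, fps is the inverse of the mean latency.
    return mean_s * 1000.0, 1.0 / mean_s
```

For example, `measure_throughput(lambda: context.execute_v2(bindings))` would time a synchronous TensorRT execution, where `context` and `bindings` are assumed to be set up elsewhere.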