Speed up YOLOv3 inference on Nano (DeepStream 4.0.1) using Coral USB accelerator?

Hey guys,

I’ve got full YOLOv3 running on the Jetson Nano using DeepStream 4.0.1, but inference is very slow, at ~2 FPS. I am trying to speed this up, and I am wondering if we can do something using the Google Coral USB accelerator. (https://coral.withgoogle.com/docs/accelerator/get-started/)

It’s meant to work on Ubuntu 10.0+.

The Coral usually takes TensorFlow models and speeds them up, but how would this work in the case of the native YOLOv3 model on DeepStream 4.0.1?


DeepStream uses the TensorRT SDK for inference, which is not supported on the Coral. The engine files produced by TensorRT are already optimized. The default configs for the YOLO samples in the DeepStream SDK perform inference on every frame of the video. You can make use of the tracker, add an interval between successive inferences, and obtain higher throughput. See the related discussion here - https://devtalk.nvidia.com/default/topic/1058668

Since YOLOv3 is a compute-heavy model, you can also try the following -

  1. Switch to yolov3-tiny
  2. Use FP16 mode for inference

You can also see this config file for how various plugin properties are set specifically for Jetson Nano hardware - source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt
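For reference, the two suggestions above map to keys in the nvinfer config file. A minimal fragment (assuming the stock YOLOv3 sample config as a starting point; exact values are illustrative, not tuned):

```
[property]
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=2
# Skip 4 batches between inferences, i.e. run the detector on
# every 5th frame; the tracker fills in the frames in between.
interval=4
```

A larger `interval` raises throughput at the cost of detection latency on new objects, so the best value depends on how fast objects enter the scene.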

@NvCJR thanks for your reply.
But I realised that on the Nano the FPS drops drastically when multiple models are loaded, and even for a single model the load times are too long.
Nonetheless, the FPS offered by the Coral USB accelerator could be leveraged alongside the Jetson Nano’s GPU, especially since lower-cost PCIe accelerators can be added.

So I wanted to know the community’s opinion on the following method for integrating the Coral USB accelerator with DeepStream:

Step 1. Use AppSrc and AppSink, similar to https://github.com/google-coral/examples-camera/blob/master/gstreamer/gstreamer.py, to do inference on an image pipeline via the Coral USB accelerator. Questions for this step:

  • What do you think about this approach? Are there pitfalls, or is the idea not feasible at all?
  • How should situations with batch > 1 be handled?
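On the batch > 1 question: as far as I know, Edge TPU models are compiled for a batch size of 1, so a muxed batch would have to be split and run frame by frame, then the detections re-associated with their source streams. A minimal sketch of that idea (`infer_single` is a hypothetical stand-in for the real Coral inference call, and the detection tuple format is assumed):

```python
# Hypothetical sketch: split a DeepStream-style batch (one frame per
# source stream) into per-frame Edge TPU calls, keeping track of which
# stream each set of detections belongs to.

def infer_batch(frames, infer_single):
    """frames: list of per-stream frames in batch order.
    infer_single: callable running ONE frame through the accelerator,
    returning a list of (class_id, confidence, bbox) tuples.
    Returns a list of (stream_id, detections) pairs."""
    results = []
    for stream_id, frame in enumerate(frames):
        detections = infer_single(frame)
        # Tag detections with their stream so a downstream tracker
        # can keep per-stream state.
        results.append((stream_id, detections))
    return results

# Usage with a stub in place of the real Edge TPU call:
def fake_infer(frame):
    return [(0, 0.9, (10, 10, 50, 50))]

batch = ["frame0", "frame1", "frame2"]
out = infer_batch(batch, fake_infer)
print(len(out))  # 3 - one result list per frame in the batch
```

The serialization does mean a batch of N frames costs roughly N single-frame inferences on one USB accelerator, which is worth factoring into the expected FPS.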

Step 2. After Step 1 is complete and we have the bounding boxes, we feed the detected bounding boxes into nvtracker by injecting NvDsObjectMeta into NvDsFrameMeta. Questions for this step:

  • I’m not sure this step is feasible. I need feedback here, and some pointers/examples if possible.
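For what it’s worth, in the DeepStream Python bindings (pyds) the injection would look roughly like the sketch below. This is untested on 4.0.1 and the `coral_detections` format and probe wiring are assumptions; it only illustrates acquiring an object meta from the batch pool and attaching it to the frame meta upstream of nvtracker:

```
# Sketch (not verified on DeepStream 4.0.1): inside a pad probe placed
# upstream of nvtracker, attach externally produced detections.
import pyds

def attach_detections(batch_meta, frame_meta, coral_detections):
    # coral_detections: list of (class_id, confidence, left, top, w, h)
    # tuples produced by the Coral in Step 1 (hypothetical format).
    for class_id, confidence, left, top, width, height in coral_detections:
        obj_meta = pyds.nvds_acquire_obj_meta_from_pool(batch_meta)
        obj_meta.class_id = class_id
        obj_meta.confidence = confidence
        rect = obj_meta.rect_params
        rect.left, rect.top = left, top
        rect.width, rect.height = width, height
        # No parent object, so pass None for the parent meta.
        pyds.nvds_add_obj_meta_to_frame(frame_meta, obj_meta, None)
```

If this works, nvtracker should treat the injected objects like any detector output, but I’d appreciate confirmation from someone who has tried it.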

Any feedback will really help.

Hi, I started a new thread (https://devtalk.nvidia.com/default/topic/1067239/deepstream-sdk/using-coral-usb-accelerator-with-jetson-nano/) for the above-mentioned query, as I thought it’s better that way since the original poster had accepted the answer to the question.