Full Yolov3 on the nano using TensorRT or Deepstream 4.0.1

Hey everyone,

Has anyone tried running full yolov3 on the nano using either TensorRT or Deepstream 4.0.1? If so, what FPS did you get?

I'm more keen to hear about DeepStream, given it’s meant to be capable of multi-stream analysis.

I tested it a few days ago using DeepStream.

Yolov3: 2 FPS
Yolov3-tiny: 28-30 FPS

Anyway, there is an alternative version of YOLOv3-tiny with better inference, though it is not compatible with DeepStream (I tested it).
Try it with darknet: GitHub - WongKinYiu/PartialResidualNetworks: partial residual networks

Using it in Python, I’m able to reach 17-18 FPS analyzing an RTSP stream.

Thanks Simone.

It would have been great to have usable FPS with full YOLOv3 on the Nano, really making it the smartest IoT AI device out there. I do have some follow-up questions:

  1. Did you test it on live video or an MP4 file?
  2. Have you tested it using a CSI camera and gstreamer? We tested just reading frames with this method and saw a massive jump in FPS (not yolo, just reading frames)
  3. Did you test in Python or C++?
  4. And lastly, was this test powered with the USB input or DC barrel jack?

Sorry about the number of questions :)

There is no way to get full YOLOv3 running at more than 5-6 FPS… So don’t spend time on it.
You could reach this “goal” using the Darknet executable: GitHub - AlexeyAB/darknet: YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)

  1. Did you test it on live video or an MP4 file?
    An RTSP H.264 live stream; using a static video file doesn’t change the result.
  2. Have you tested it using a CSI camera and gstreamer? We tested just reading frames with this method and saw a massive jump in FPS (not yolo, just reading frames)
    You need to use a GStreamer pipeline (absolutely incomprehensible) to get access to hardware decoding and increase performance.
  3. Did you test in Python or C++?
    I used Python, but it seems that C++ has better performance.
  4. And lastly, was this test powered with the USB input or DC barrel jack?
    I use the micro-USB input, but with a 15W power supply.
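
Since the GStreamer pipeline came up in answer 2, here is a rough sketch of what one can look like for an H.264 RTSP stream read from Python. This is not the exact pipeline used in the test above. The function names, the default latency and resolution, and the choice of `nvv4l2decoder` as the hardware decoder element are my assumptions (older JetPack releases may need a different decoder element), and OpenCV has to be built with GStreamer support for `open_stream` to work.

```python
def rtsp_pipeline(uri, latency_ms=200, width=1280, height=720):
    """Build a GStreamer pipeline string for an H.264 RTSP stream.

    Assumption: nvv4l2decoder and nvvidconv are available (Jetson-only
    elements); the defaults here are illustrative, not tuned values.
    """
    return (
        f"rtspsrc location={uri} latency={latency_ms} ! "
        "rtph264depay ! h264parse ! "
        "nvv4l2decoder ! "  # hardware H.264 decode (assumed element)
        "nvvidconv ! "      # copy/convert out of NVMM memory
        f"video/x-raw, width=(int){width}, height=(int){height}, "
        "format=(string)BGRx ! "
        "videoconvert ! video/x-raw, format=(string)BGR ! "
        # keep only the newest frame so a slow detector does not lag behind
        "appsink drop=true max-buffers=1"
    )


def open_stream(uri):
    """Open the stream with OpenCV's GStreamer backend."""
    import cv2  # imported here so rtsp_pipeline() works without OpenCV
    return cv2.VideoCapture(rtsp_pipeline(uri), cv2.CAP_GSTREAMER)
```

`open_stream("rtsp://<camera-address>/<path>")` then behaves like any other `cv2.VideoCapture`; check `cap.isOpened()` before reading, because a wrong element name simply yields a capture that never opens.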

Thanks Simone, there are some tricks to speed it up even with the full yolov3.

If you’re interested, here’s how you can run the CSI camera; it includes the GStreamer pipelines required: GitHub - JetsonHacksNano/CSI-Camera: Simple example of using a CSI-Camera (like the Raspberry Pi Version 2 camera) with the NVIDIA Jetson Developer Kit
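
For anyone who wants the gist without opening the repo, the CSI pipeline is roughly of the following shape. Treat it as a sketch from memory and not a copy of the repo's code: the function names, the default resolution and the frame rate are assumptions, and `measure_read_fps` is a hypothetical helper for the "just reading frames" test mentioned earlier.

```python
import time


def csi_pipeline(capture_width=1280, capture_height=720,
                 display_width=1280, display_height=720,
                 framerate=30, flip_method=0):
    """Build a GStreamer pipeline string for a CSI camera on a Jetson.

    Assumption: the nvarguscamerasrc and nvvidconv elements are present.
    """
    return (
        "nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM), width=(int){capture_width}, "
        f"height=(int){capture_height}, "
        f"framerate=(fraction){framerate}/1 ! "
        f"nvvidconv flip-method={flip_method} ! "
        f"video/x-raw, width=(int){display_width}, "
        f"height=(int){display_height}, format=(string)BGRx ! "
        "videoconvert ! video/x-raw, format=(string)BGR ! appsink"
    )


def measure_read_fps(cap, n_frames=120):
    """Read up to n_frames from a VideoCapture-like object and return
    the achieved read rate in frames per second (no inference)."""
    start = time.perf_counter()
    frames_read = 0
    while frames_read < n_frames:
        ok, _frame = cap.read()
        if not ok:
            break
        frames_read += 1
    elapsed = time.perf_counter() - start
    if frames_read == 0:
        return 0.0
    return frames_read / max(elapsed, 1e-9)
```

With OpenCV built with GStreamer support, `cv2.VideoCapture(csi_pipeline(), cv2.CAP_GSTREAMER)` gives the capture object to pass to `measure_read_fps`.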

I am wondering if we can use the Coral USB accelerator with the Nano whilst using DeepStream’s YOLOv3 implementation. If we can, this might help speed up inference, unless the comms over the USB interface are too slow… Might have to play around a bit.

I don’t know what your target is, but you cannot run full YOLOv3 on the Jetson Nano at 10 FPS or more without compromising object recognition.
So if you would like to run full YOLO, consider switching to a PC with a GTX 1080.

I shared my Python implementation of yolov3 and yolov3-tiny, which was modified from the original TensorRT ‘yolov3_onnx’ sample. (The FPS numbers include image acquisition and preprocessing/postprocessing.)

https://jkjung-avt.github.io/tensorrt-yolov3/
https://github.com/jkjung-avt/tensorrt_demos#yolov3

The numbers don’t seem as good as those of the DeepStream implementations. I think the difference could come from: (1) my Python preprocessing/postprocessing code being slower than the C++ implementation in DeepStream; (2) OpenCV imread.
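
One way to check which of those two guesses matters more is to time each stage separately. Below is a minimal sketch of such a timer; `StageTimer` and the stage names are hypothetical and not part of tensorrt_demos, and the per-stage work is left to whatever the real loop does.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)  # seconds spent per stage
        self.counts = defaultdict(int)    # number of timed runs per stage

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self):
        """Mean time per run of each stage, in milliseconds."""
        return {name: 1000.0 * self.totals[name] / self.counts[name]
                for name in self.totals}

    def fps(self):
        """End-to-end FPS over the timed stages (untimed work is ignored)."""
        frames = max(self.counts.values(), default=0)
        total = sum(self.totals.values())
        return frames / total if total > 0 else 0.0
```

In the frame loop, wrap each step, for example `with timer.stage("acquire"): ...`, `with timer.stage("preprocess"): ...`, `with timer.stage("infer"): ...` and `with timer.stage("postprocess"): ...`, then print `timer.mean_ms()` after a few hundred frames. If "infer" dominates, moving the pre/postprocessing to C++ will not change much.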