Yolov3 is very slow

Hi, both

You should get a better result by running inference with TensorRT.
Would you mind giving it a try?


Again, the problem is that it is C++. To integrate it I need it to be Python… It seems that all the faster implementations use plugins that are specific to the model. Let’s say I understand that YOLO is a unique layer, fine. But it would really make sense to implement a plugin that replaces all the common missing layers, so we can get the benchmark performance of, say, ssd_mobilenet (~30 fps) in Python…
Currently the UFF SSD example is carefully tailored to ssd_inception, and for the life of me I can’t figure out how to modify it to work with ssd_mobilenet.

After playing with the Jetson Nano for two weeks, like most others, I am a bit disappointed with the test results. Most of us were attracted by the impressive benchmark figures, but in the end we get only 10-15 fps. All those figures come from fine-tuned setups with pure TensorRT, C++, etc. Most of us work with OpenCV and Python, and there the results are quite disappointing. TF-TRT seems like a middle strategy, but the loading is too slow. For those of us using OpenCV and Python, the Jetson is not very user-friendly. A lot of tuning is required to make our original programs work (just in TF-TRT, not to mention pure TensorRT). I think we just expected too much of it. The truth is, it is faster than a Raspberry Pi :)

I agree that expectations were too high. I think TensorRT was not ready yet for the flood of Python developers that descended on the Nano…
Having said that, I think that if NVIDIA just releases one or two good samples of using TensorRT in Python (for example ssd_mobilenet and yolov3(-tiny)), the learning curve will be much less steep and the Nano will get really cool apps. Without these, we can’t provide real-time inference…

Yes, TensorRT examples in Python are really important. NVIDIA released sampleuffssd-rect.uff in C++ for benchmarking. Yes, I could reproduce the benchmark figures, but that is not a useful use case. I tried to use that UFF model in the jetson-inference Python sample, and it said the model is not supported. This indicates that these UFF models and programs are tightly coupled and really not generic. I stopped trying other models after wasting lots of time, and will wait until someone makes a breakthrough.


Have you checked this one:

An SSD sample with Python.

This is not Python, as you admitted in another thread… You compile a specific (very specific!) C++ plugin. It can’t be reused, it can’t be generalized, and it has lines in the code that say “only with this particular network”. It is a very sad example of “in Python”. It’s almost like saying TensorFlow is in Python…

try https://devtalk.nvidia.com/default/topic/1052315/jetson-nano/python-wrapper-for-tensorrt-implementation-of-yolo-currently-v2-/

I have wrapped the DeepStream library so it can be used in Python. yolov2-tiny runs at about 20 fps.

100% agree. I have a custom-trained object detection model (based on Keras-RetinaNet) that works quite well on a laptop but it brings my Jetson Nano to its knees due to out-of-memory errors. I can’t find a good path forward to convert the model into something that can be run on TensorRT. As a result I’ve been pretty much underwhelmed by the Nano. But what could I expect for $99? Another confirmation of the adage “don’t believe the hype”. (Sorry for my bad attitude, this has been very frustrating after such high expectations.)

Ahhh, I am actually familiar with keras-retinanet, as I used it in a Kaggle competition. It is a large model, but you should be fine running inference with it on a 4 GB card as long as your image is not insanely huge.
The advice about the GPU memory fraction should work, although the fraction should probably be larger for RetinaNet than for the facial recognition. You might have to dig into the source to find where to put it, though. It’s not always possible to get the session handle from outside. If I have time, I’ll have a look.

Thanks, Moshe.

I haven’t yet worked out how to insert the suggested configuration for GPU memory fraction into the TensorFlow session used by the Keras-RetinaNet implementation.
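
For the GPU memory fraction question above, here is a minimal sketch. It assumes TensorFlow 1.x and standalone Keras with the TensorFlow backend; the thread does not say which versions are installed. The function name `limit_gpu_memory` and the 0.5 default are made up for illustration. The idea is to install a session with the memory cap before the model is loaded, so that Keras uses it.

```python
def limit_gpu_memory(fraction=0.5):
    """Make Keras use a TensorFlow session capped at `fraction` of GPU memory."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")

    # Imported lazily so the argument check works without TensorFlow installed.
    # Assumption: TensorFlow 1.x and standalone Keras (not confirmed in the thread).
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    session = tf.Session(config=config)
    K.set_session(session)  # Keras now builds its graph in this session
    return session
```

Calling `limit_gpu_memory(0.5)` once at startup, before the RetinaNet model is loaded, should be enough if the library does not create its own session later; if it does, that call site in the source is where the config has to go.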

I am feeding image frames from a video stream (H.264/5) into the model, with a maximum width of 400 pixels.

BTW thanks for your project that provides a wrapper for the YOLO TRT app. How difficult would it be to train that YOLO detector model (or a YOLOv3 model) using a custom dataset, and then use your wrapper to run the YOLO detection model on my Nano?

400 pixels should be fine.
I was going to say that RetinaNet might be a tad too slow for you on the Nano. It uses a lot of CPU. Your laptop is probably an i7, and it is much faster than the Nano.
Training YOLOv3 is not hard, but you have to transform your dataset to YOLO’s coordinate system (if memory serves, it uses the center point and (width, height) instead of the usual (x1, y1), (x2, y2)).
Not too difficult if you can script well.
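
The coordinate transform mentioned above can be sketched as follows. This assumes the darknet label convention of one line per object, `<class> <x_center> <y_center> <width> <height>`, with all four values normalized by the image size. The function names are made up for illustration.

```python
def corners_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Convert pixel corners (x1, y1), (x2, y2) to normalized center/size."""
    x_center = (x1 + x2) / 2.0 / img_w
    y_center = (y1 + y2) / 2.0 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box, img_w, img_h):
    """Format one annotation as a line for a darknet-style .txt label file."""
    xc, yc, w, h = corners_to_yolo(*box, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A 200x200 px box in a 400x400 image, class 0.
print(yolo_label_line(0, (100, 50, 300, 250), 400, 400))
```

Each image then gets a `.txt` file with one such line per object.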

Follow the instructions at https://github.com/AlexeyAB/darknet carefully and you shouldn’t have a problem. Don’t try to train on the Nano.


In general we have seen the same behavior as you: YOLO requires too much memory and CPU for the Jetson Nano, and even for the TX1 and TX2. In our case we decided to use TinyYolo for our development. I’m not sure if TinyYolo suits your needs, but we have documented the process, code and benchmarks:



Hello, I’m having a hard time trying to run Darknet tiny-yolo on the Jetson Nano. If I set the flags GPU=1 and OPENCV=1 and run the webcam detection sample, the system just crashes/freezes.
With GPU=0, I can run detection on pictures without a problem, and the detector demo with the webcam seems to open but takes about 10 minutes to show a single frame from the webcam.

My guess is that something is wrong with the CUDA settings in the Makefile.

I’ve set NVCC=/usr/local/cuda/bin/nvcc, as posted in another thread.

I’d appreciate any advice or a Makefile example showing how to make this run.

Thank you


It’s recommended to use TensorRT rather than darknet on the Jetson Nano.
Here is a sample for your reference:


My first post, my first impression…

Just got the Jetson Nano, after long using RPis, Odroids, normal laptops, etc., like the rest of you, I assume.

I’m using Python for my video analysis, and I have no plans to change that. I managed to install OpenCV 4.1.1 with CUDA support; it took a long while, but it works now.

Running my script using YOLOv3 on an Odroid N2 takes around 10 seconds to analyze a frame. On the Jetson Nano, where I was hoping for a positive surprise, it takes approx. 4 seconds. On an old retired laptop running Debian, the same frame takes just under a second (on an RPi, just forget it).

I’m also wondering if I’m doing something wrong. I’m not interested in running the YOLO tiny variants; the detection is much better with the heavy version.

I’m listening; the recommendation is TensorRT. But is the detection as good as in YOLOv3?

Otherwise, the Jetson Nano will get a very early retirement.

IMHO you need to give up on using YOLOv3 on the Jetson Nano; it is impossible to use.
You need to choose yolov3-tiny, which with darknet can reach 17-18 fps at 416x416.

YOLOv3 is wonderful but requires too many resources, and in my opinion it needs a good server with enough GPU (local or cloud).
Considering the Jetson Nano’s power consumption, it does a good job anyway.


Please check this comment:

We can reach 20fps for YOLOv3 on Jetson Nano.


Have you tried testing with a video file? I find that a USB webcam runs very slowly with darknet.
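
One way to tell whether the camera or the detector is the bottleneck is to time capture and inference separately. The sketch below uses only the standard library. `read_frame` and `infer` are placeholder callables standing in for whatever capture and detection code is in use, and `profile_pipeline` is a made-up name.

```python
import time


def profile_pipeline(read_frame, infer, num_frames=50):
    """Time frame capture and inference separately over up to num_frames frames."""
    capture_s = 0.0
    infer_s = 0.0
    frames = 0
    for _ in range(num_frames):
        t0 = time.perf_counter()
        frame = read_frame()
        t1 = time.perf_counter()
        if frame is None:  # source exhausted or capture failed
            break
        infer(frame)
        t2 = time.perf_counter()
        capture_s += t1 - t0
        infer_s += t2 - t1
        frames += 1
    total_s = capture_s + infer_s
    return {
        "frames": frames,
        "capture_ms": 1000.0 * capture_s / frames if frames else 0.0,
        "infer_ms": 1000.0 * infer_s / frames if frames else 0.0,
        "fps": frames / total_s if total_s > 0 else 0.0,
    }


# Stand-in source and detector, only to show the call shape.
fake_frames = iter(["frame0", "frame1", "frame2"])
stats = profile_pipeline(lambda: next(fake_frames, None), lambda f: time.sleep(0.01))
print(stats)
```

With OpenCV, `read_frame` could wrap `cap.read()` and return the frame (or `None` on failure). Running the same loop once against a video file and once against the USB webcam would show whether `capture_ms` is what differs.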