YOLOv3 is very slow

Hello,
I have run YOLOv3 on a Jetson Nano, but it is way too slow: about 0.8 fps.
Tiny YOLO runs at about 10 fps.

Any idea how to improve YOLOv3 performance?

Your help is appreciated!
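For anyone comparing fps numbers like these, it helps to measure them the same way; a minimal timing-loop sketch (the lambda below is a stand-in workload, not an actual detector call):

```python
import time

def measure_fps(process_frame, frames, warmup=1):
    """Average frames-per-second of process_frame over an iterable of frames."""
    it = iter(frames)
    for _ in range(warmup):
        process_frame(next(it))      # warm-up: the first frames are often slower
    start = time.perf_counter()
    count = 0
    for frame in it:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed

# Stand-in workload; in practice process_frame would be the detector call.
fps = measure_fps(lambda f: sum(f), [list(range(1000))] * 50)
```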

How did you run it? Using the darknet repository https://github.com/AlexeyAB/darknet I get 2 fps with YOLOv3.
Use the following flags in the Makefile:
GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1
AVX=0
OPENMP=1
LIBSO=1
ZED_CAMERA=0
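If you rebuild often, the flag edits above can be scripted instead of done by hand; a small stdlib sketch that rewrites KEY=value lines in a darknet-style Makefile:

```python
import re

def set_make_flags(makefile_text, flags):
    """Return makefile_text with each KEY=value line replaced per `flags`."""
    for key, value in flags.items():
        # Only touch lines that start with exactly this flag name, e.g. "GPU=0".
        makefile_text = re.sub(rf"(?m)^{key}=.*$", f"{key}={value}", makefile_text)
    return makefile_text

flags = {"GPU": 1, "CUDNN": 1, "CUDNN_HALF": 1, "OPENCV": 1,
         "AVX": 0, "OPENMP": 1, "LIBSO": 1, "ZED_CAMERA": 0}

# In practice you would read/write the real Makefile; a literal string is used here.
updated = set_make_flags("GPU=0\nCUDNN=0\nOPENCV=0\n", flags)
```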

Tiny is still 10 fps. I'm not sure whether my Nano is in “turbo mode”, as I am powering it through USB.

Hi both,

You should be able to get a better result by running inference with TensorRT.
Would you mind giving it a try?
https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/master/yolo#trt-yolo-app

Thanks.

Again, the problem is that it is C++. To integrate it, I need Python… It seems that all the faster implementations use plugins that are specific to the model. Let's say I accept that YOLO has a unique layer, fine. But it would really make sense to implement plugins for all the common missing layers, so we could get benchmark-level performance of, say, ssd_mobilenet (~30 fps) in Python…
Currently the UFF SSD example is carefully tailored to ssd_inception, and for the life of me I can't figure out how to modify it to work with ssd_mobilenet.

After playing with the Jetson Nano for two weeks, like most others I am a bit disappointed with the test results. Most of us were attracted by the impressive benchmark figures, but in the end we get only 10-15 fps; all those figures are finely tuned with pure TensorRT, C++, etc. Most of us work with OpenCV and Python, and there the results are quite disappointing. TF-TRT seems like a middle-ground strategy, but its loading time is too long. For those of us using OpenCV and Python, the Jetson is not very user-friendly: a lot of tuning is required to make our original programs work (and that is just in TF-TRT, not to mention pure TensorRT). I think we simply had too high expectations. The truth is, it is faster than a Raspberry Pi :)

I agree that expectations were too high. I think TensorRT was not yet ready for the flood of Python developers that descended on the Nano…
Having said that, I think that if NVIDIA released just one or two good samples of using TensorRT from Python (for example ssd_mobilenet and YOLOv3(-tiny)), the learning curve would be much less steep and the Nano would get really cool apps. Without these, we can't do real-time inference…

Yes, TensorRT examples in Python are really important. NVIDIA released the sampleuffssd-rect.uff with a C++ sample for our benchmarking; yes, I could reproduce those benchmark figures, but that is not a useful use case. I tried to use that UFF model in the jetson-inference Python sample, and it said the model is not supported, indicating that these UFF files and programs are tightly coupled and really not generic. I stopped trying other models; I wasted a lot of time and will wait until someone makes a breakthrough.

Hi,

Have you checked this one:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#uff_ssd

An SSD sample in Python.
Thanks.

This is not Python, as you admitted in another thread… You compile a specific (very specific!) C++ plugin. It can't be reused, it can't be generalized, and there are lines in the code that say “only with this particular network”. It is a very sad example “in Python”. It's almost like saying TensorFlow is in Python…

Try https://devtalk.nvidia.com/default/topic/1052315/jetson-nano/python-wrapper-for-tensorrt-implementation-of-yolo-currently-v2-/

I have wrapped the DeepStream library so it can be used from Python. yolov2-tiny runs at about 20 fps.
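For anyone curious what “wrapping a C/C++ library for Python” typically looks like, the usual route is ctypes; a toy sketch below loads the C math library rather than the DeepStream one (the actual DeepStream function names and signatures would of course differ), just to show the mechanism:

```python
import ctypes
import ctypes.util

# Load a shared library by name. For a wrapped DeepStream build you would
# point this at the compiled .so instead; libm is used here so the example
# actually runs anywhere.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C signature before calling, or ctypes will mangle doubles.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

value = libm.sqrt(16.0)  # calls the C function directly from Python
```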

100% agree. I have a custom-trained object detection model (based on Keras-RetinaNet) that works quite well on a laptop but it brings my Jetson Nano to its knees due to out-of-memory errors. I can’t find a good path forward to convert the model into something that can be run on TensorRT. As a result I’ve been pretty much underwhelmed by the Nano. But what could I expect for $99? Another confirmation of the adage “don’t believe the hype”. (Sorry for my bad attitude, this has been very frustrating after such high expectations.)

Ahh, I am actually familiar with keras-retinanet, as I used it in a Kaggle competition. It is a large model, but you should be fine running inference with it on a 4 GB card as long as your images are not insanely large.
The advice about the GPU memory fraction should work, although the fraction should probably be larger for RetinaNet than for the facial-recognition case. You might have to dig into the source to find where to put it, though; it's not always possible to get the session handle from outside. If I have time, I'll take a look.
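For reference, the usual TF1-era way to cap the GPU memory fraction is a session config passed through the Keras backend; a sketch, assuming the TF1/Keras stack that keras-retinanet uses (the 0.5 value is a guess to tune, and finding the right spot in the keras-retinanet source to place this is the hard part):

```python
import tensorflow as tf
import keras.backend as K

# Limit TensorFlow to a fraction of GPU memory instead of grabbing it all.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # tune for RetinaNet
config.gpu_options.allow_growth = True                    # allocate lazily
K.set_session(tf.Session(config=config))

# ...then load the keras-retinanet model as usual in the same process.
```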

Thanks, Moshe.

I haven’t yet worked out how to insert the suggested configuration for GPU memory fraction into the TensorFlow session used by the Keras-RetinaNet implementation.

I am feeding image frames from a video stream (H.264/5) into the model, with a maximum width of 400 pixels.
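Capping the width at 400 px is just proportional scaling; a tiny sketch of the resize math:

```python
def scaled_size(width, height, max_width=400):
    """Return (w, h) scaled so w <= max_width, preserving aspect ratio."""
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)

size = scaled_size(1920, 1080)  # a 1080p frame capped at 400 px wide
```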

BTW, thanks for your project that provides a wrapper for the YOLO TRT app. How difficult would it be to train that YOLO detector (or a YOLOv3 model) on a custom dataset, and then use your wrapper to run the resulting model on my Nano?

400 pixels should be fine.
I was going to say that RetinaNet might be a tad too slow for you on the Nano. It uses a lot of CPU, and your laptop is probably an i7, which is much faster than the Nano.
Training YOLOv3 is not hard, but you have to transform your dataset into YOLO's coordinate system (if memory serves, it uses a center point and (width, height) instead of the usual (x1, y1), (x2, y2) corners).
Not too difficult if you can script well.
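The coordinate transform mentioned above is a few lines of arithmetic; a sketch, assuming pixel-corner boxes as input and darknet's normalized center format as output:

```python
def corners_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """(x1, y1, x2, y2) pixel corners -> YOLO (cx, cy, w, h), normalized to 0..1."""
    cx = (x1 + x2) / 2 / img_w   # box center, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h   # box center, as a fraction of image height
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return cx, cy, w, h

# A 200x200 box at (100, 50) in a 640x480 image:
box = corners_to_yolo(100, 50, 300, 250, 640, 480)
```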

Follow the instructions at https://github.com/AlexeyAB/darknet carefully and you shouldn't have a problem. Don't try to train on the Nano.

Hi,

In general we have seen the same behavior as you: YOLO requires too much memory and CPU for the Jetson Nano, and even for the TX1 and TX2. In our case we decided to use TinyYOLO in our development. Not sure if TinyYOLO suits your needs, but we have documented the process, code, and benchmarks:

https://developer.ridgerun.com/wiki/index.php?title=GstInference/Supported_architectures/TinyYoloV2

Regards.

Hello, I'm having a hard time trying to run darknet tiny-yolo on the Jetson Nano. If I set the flags GPU=1 and OPENCV=1 and run the webcam detection sample, the system just crashes/freezes.
With GPU=0 I can run detection on images without a problem, and the webcam detector demo seems to open, but it takes about 10 minutes to show a single frame.

My guess is that something is wrong with the CUDA settings in the Makefile.

I've set NVCC=/usr/local/cuda/bin/nvcc as posted in another thread.

I'd appreciate any advice, or an example Makefile showing how to make this run.
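Before fiddling with the Makefile further, it can help to sanity-check that nvcc is actually reachable; a small stdlib sketch (the fallback path is the one mentioned in this thread):

```python
import os
import shutil

def find_nvcc(candidates=("/usr/local/cuda/bin/nvcc",)):
    """Return a usable nvcc path: PATH first, then common install locations."""
    found = shutil.which("nvcc")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None  # nvcc missing -> GPU=1 builds will fail

nvcc = find_nvcc()
```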

thank you

Hi,

It’s recommended to use TensorRT rather than darknet on the Jetson Nano.
Here is a sample for your reference:
https://devtalk.nvidia.com/default/topic/1052315/jetson-nano/python-wrapper-for-tensorrt-implementation-of-yolo-currently-v2-/

Thanks.

My first post, my first impression…

Just got the Jetson Nano; like the rest of you, I assume, I have long been using RPis, Odroids, normal laptops, etc.

I'm using Python for my video analysis, and I have no plans to change that. I managed to install OpenCV 4.1.1 with CUDA support; it took a long while, but it works now.

Running my script with YOLOv3: on an Odroid N2 it takes around 10 seconds to analyze a frame. On the Jetson Nano, where I was hoping for a positive surprise, it takes approx. 4 seconds. On an old retired laptop running Debian, the same frame takes just under a second. (On an RPi, just forget it.)

I'm also wondering if I'm doing something wrong. I'm not interested in running the YOLO tiny variants; detection is much better with the full model.

I'm listening; the recommendation is TensorRT. But is the detection as good as with YOLOv3?

Otherwise, the Jetson Nano will get a very early retirement.

IMHO you need to give up on using YOLOv3 on the Jetson Nano; it is impossible to use in practice.
You need to choose yolov3-tiny, which with darknet can reach 17-18 fps at 416x416.

YOLOv3 is wonderful but requires too many resources; in my opinion it needs a proper server with enough GPU (local or cloud).
Considering the Jetson Nano's power consumption, it does a good job anyway.

Hi,

Please check this comment:
https://devtalk.nvidia.com/default/topic/1064871/deepstream-sdk/deepstream-gst-nvstreammux-change-width-and-height-doesn-t-affect-fps/post/5392823/#5392823

We can reach 20 fps for YOLOv3 on the Jetson Nano.
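A large part of gains like this usually comes from shrinking the network input. As a rough back-of-envelope (my assumption, not a measured result: inference time scales roughly with input area), you can estimate the effect of changing the input resolution:

```python
def estimated_fps(base_fps, base_dim, new_dim):
    """Rough fps estimate when changing a square network input size.

    Assumes inference time scales linearly with input area -- a crude
    approximation for comparison only, not a measured benchmark.
    """
    return base_fps * (base_dim ** 2) / (new_dim ** 2)

# Example (hypothetical numbers): if 608x608 YOLOv3 runs at ~2 fps,
# estimate what a 416x416 input would give.
fps_416 = estimated_fps(2.0, 608, 416)
```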

Thanks.