Is YOLOv3 inference too heavy for Jetson Nano?

I followed the post below on a Jetson Nano.
Sure enough, the console reports 20 FPS when deepstream-app (yolov3) is running.

https://devtalk.nvidia.com/default/topic/1064871/deepstream-sdk/deepstream-gst-nvstreammux-change-width-and-height-doesn-t-affect-fps/post/5392823/#5392823

I have made the changes specified in the comment above in these config files, and we can achieve a throughput of 20 FPS.

However, I wondered why the video playback in this demo was not smooth.
So, following the osd_sink_pad_buffer_probe example, I enumerated the detected objects on every frame.

With interval = 5, I noticed that inference ran only once every 5 frames.
Without the interval, inference ran on every frame, but throughput was only around 3 FPS.

You wrote, “we can achieve a throughput of 20 FPS.”
That is true, but it does not mean that “20FPS yolov3 inference” is possible.

Is that right? Is my understanding correct?

The following comment in that thread says “Its trade-off that needs to be tuned for your use-case.”
Is that what you mean?

So, is YOLOv3 inference simply too heavy for Jetson Nano?
Is there any other way to speed up YOLOv3 inference on Jetson Nano?
(I am not looking to switch to the tiny model.)

Please make sure to change the height and width to 416 in yolov3.cfg before generating the engine file.
If the tracking results are bad for your test video, you can reduce the interval to improve the accuracy
further but the FPS will drop as well. Its trade-off that needs to be tuned for your use-case.

Hi,

YOLOv3 is a heavy model for a mobile system.
It needs around 140.69 billion FLOPs, while the tiny version needs just 5.56 billion.
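
To put those two figures side by side, here is a small Python sketch. It only uses the two FLOP counts quoted above; the constant names are illustrative, and the ratio is a rough guide to relative compute cost, not a measured speedup on the Nano.

```python
# FLOP counts as quoted in this reply, in billions of operations.
YOLOV3_BFLOPS = 140.69
YOLOV3_TINY_BFLOPS = 5.56

# How many times more compute the full model needs than the tiny one.
ratio = YOLOV3_BFLOPS / YOLOV3_TINY_BFLOPS

print(f"YOLOv3 needs about {ratio:.1f}x the compute of the tiny model")
```

So the full model is roughly 25 times more expensive than the tiny one, which is why it struggles on an embedded GPU.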

The interval parameter determines how often the detector is applied to the stream.
As you already know, setting interval=5 means inference runs once every 5 frames.
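
To make the arithmetic concrete, here is a minimal Python sketch. It assumes the reading used in this thread, that interval=5 means one detector run in every 5 frames. The function name `inference_rate` is only illustrative and is not part of DeepStream. The 20 FPS and 3 FPS figures are the ones reported in the question.

```python
def inference_rate(pipeline_fps: float, frames_per_inference: int) -> float:
    """Detector runs per second when only one frame out of every
    `frames_per_inference` frames is sent to the detector."""
    if frames_per_inference < 1:
        raise ValueError("frames_per_inference must be at least 1")
    return pipeline_fps / frames_per_inference


# Displayed 20 FPS with one inference per 5 frames (assumed reading of interval=5).
print(inference_rate(20.0, 5))  # 4.0

# Inference on every frame, as measured in the question.
print(inference_rate(3.0, 1))   # 3.0
```

Under that reading, a pipeline that displays 20 FPS runs the detector only about 4 times per second, which is close to the roughly 3 FPS measured with inference on every frame.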

For the frames in between, where the detector does not run, the bounding boxes are assigned by our feature tracker.
We provide several tracker algorithms, such as IOU, KLT, and DCF, which make the bbox assignment more accurate.
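
As a rough illustration of the idea behind an IOU-style tracker, the sketch below computes the intersection-over-union of two boxes, the overlap score typically used to match a box in one frame to a box in the next. This is a generic Python example, not the DeepStream tracker implementation; the `Box` layout (x1, y1, x2, y2) and the function name `iou` are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes, in [0, 1]."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Width and height clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# A box shifted by half its width overlaps its original position by one third.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A fast-moving object shifts further between detector runs, so its overlap score drops and the match becomes less reliable, which is one reason the usable interval depends on the scene.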

This is a kind of trade-off.
We know that some complicated models cannot reach real-time performance on an embedded system.
That’s why we developed several tracking algorithms, so that these models can still be used in practice.

How often the detector should be applied depends on how fast the objects in your scenario move.
For example, you can set a larger interval value for a video conference use case.

Our suggestion is to try the different tracking algorithms to see whether one of them meets your requirements.
Another possible improvement is to set the inference mode to FP16.

Thanks.

Thank you, AastaLLL. I understand the issue now.