Options for optimising custom TFRT tiny YOLOv4 implementation to improve live inference speed on Nano

paul55 · January 14, 2021, 11:06am

I’ve created a custom tiny YOLOv4 Tensorflow RT model, which I’m running on a 4GB Nano development board using this Python repository, for live inference.

My input size is 416x416 (needed because I’m trying to detect relatively small objects within the frame), and I’m using 8-bit integers. The rest of my setup details can be seen in this jtop output:

I’m only managing to achieve a maximum throughput of 2-2.3fps, and need to improve this to at least 15fps.
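To put those figures in per-frame terms, here is a minimal arithmetic sketch. The function names are illustrative only and come from no library; the inputs are the 2-2.3fps and 15fps figures quoted above.

```python
# Frame-time arithmetic for the throughput figures quoted in the post.
# frame_time_ms and required_speedup are hypothetical helper names.

def frame_time_ms(fps: float) -> float:
    """Milliseconds spent on (or available for) each frame at a given rate."""
    return 1000.0 / fps


def required_speedup(current_fps: float, target_fps: float) -> float:
    """Factor by which throughput must grow to reach the target rate."""
    return target_fps / current_fps


if __name__ == "__main__":
    target = 15.0
    for current in (2.0, 2.3):
        print(
            f"{current} fps = {frame_time_ms(current):.0f} ms/frame, "
            f"target budget = {frame_time_ms(target):.1f} ms/frame, "
            f"speedup needed = {required_speedup(current, target):.1f}x"
        )
```

In other words, the whole pipeline (capture, pre-processing, inference, post-processing) has to fit in roughly 67 ms per frame, which is about a 6.5x to 7.5x improvement over the current rate.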

I was wondering if there are any further steps I can take (beyond using TFRT, tiny YOLO and 8-bit integers) to improve the fps on the Nano, and what speed improvements I might expect by moving up to one of the more powerful Jetson boards?

AastaLLL · January 18, 2021, 2:32am

Hi,

Please check the following sample for optimizing YOLOv4 with TensorRT (TensorRT+Deepstream).

We can get around 57.75 fps with a video input/output pipeline on Xavier.
It’s recommended to give the sample a try on Nano first.

Thanks.

paul55 · January 29, 2021, 3:08pm

Thanks - I successfully completed up to step 3.1 on the suggested repository.

But, when I try to run step 3.2 (run make to compile nvdsparsebbox_Yolo.cpp in directory nvdsinfer_custom_impl_Yolo), I get the following error:

g++ -c -o nvdsinfer_yolo_engine.o -Wall -std=c++11 -shared -fPIC -Wno-error=deprecated-declarations -I../../includes -I/usr/local/cuda-10.2.89/include nvdsinfer_yolo_engine.cpp
nvdsinfer_yolo_engine.cpp:23:10: fatal error: nvdsinfer_custom_impl.h: No such file or directory
 #include "nvdsinfer_custom_impl.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:49: recipe for target 'nvdsinfer_yolo_engine.o' failed
make: *** [nvdsinfer_yolo_engine.o] Error 1

I found another post suggesting there might be a problem with my DeepStream installation.

I followed the steps listed here to install Deepstream, in case that helps.

Also, I found a copy of nvdsinfer_custom_impl.h in /opt/nvidia/deepstream/deepstream-5.0/sources/includes/.

AastaLLL · February 5, 2021, 4:01am

Hi,

Could you check the Makefile in the folder first?
Since it references the DeepStream headers through a relative path, you need to update it if you copy it elsewhere.

/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/Makefile

diff --git a/Makefile b/Makefile
index 8b85b86..8eafdbd 100644
--- a/Makefile
+++ b/Makefile
@@ -28,7 +28,7 @@ CC:= g++
 NVCC:=/usr/local/cuda-$(CUDA_VER)/bin/nvcc
 
 CFLAGS:= -Wall -std=c++11 -shared -fPIC -Wno-error=deprecated-declarations
-CFLAGS+= -I../../includes -I/usr/local/cuda-$(CUDA_VER)/include
+CFLAGS+= -I/opt/nvidia/deepstream/deepstream-5.0/sources/includes -I/usr/local/cuda-$(CUDA_VER)/include
 
 LIBS:= -lnvinfer_plugin -lnvinfer -lnvparsers -L/usr/local/cuda-$(CUDA_VER)/lib64 -lcudart -lcublas -lstdc++fs
 LFLAGS:= -shared -Wl,--start-group $(LIBS) -Wl,--end-group
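To illustrate why the relative `-I../../includes` flag stops working once the folder is copied, here is a rough Python sketch that resolves each `-I` directory against the build directory and reports where a header would be found. The helper names `include_dirs` and `find_header` are made up for this example, and `make` variables such as `$(CUDA_VER)` are not expanded, so treat it as a diagnostic aid rather than a model of what `g++` does in full.

```python
import os
import shlex


def include_dirs(cflags: str) -> list:
    """Extract directories passed as -Idir or -I dir from a flag string."""
    dirs = []
    tokens = shlex.split(cflags)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])
        i += 1
    return dirs


def find_header(header: str, cflags: str, build_dir: str):
    """Return the first path where the header exists, or None.

    Relative -I directories are resolved against build_dir, which is
    where make runs the compiler; absolute ones are used unchanged.
    """
    for d in include_dirs(cflags):
        candidate = os.path.normpath(os.path.join(build_dir, d, header))
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    header = "nvdsinfer_custom_impl.h"
    old_flags = "-I../../includes"
    new_flags = "-I/opt/nvidia/deepstream/deepstream-5.0/sources/includes"
    # Run from the copied nvdsinfer_custom_impl_Yolo directory.
    print("relative flag:", find_header(header, old_flags, os.getcwd()))
    print("absolute flag:", find_header(header, new_flags, os.getcwd()))
```

Run from the copied `nvdsinfer_custom_impl_Yolo` directory, the relative flag should only find the header if an `includes` folder happens to sit two levels up, whereas the absolute flag does not depend on where the folder lives.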

Thanks.

paul55 · February 10, 2021, 12:31pm

Fantastic - thanks very much! I now have a working model, running at 35fps, which is great.

However, I get "Num classes mismatch. Configured:2, detected by network: 80" warnings while it’s running.

My original darknet cfg file definitely had only 2 classes, as does labels.txt. And Deepstream fails to run the model if I set num-detected-classes=80 in config_infer_primary_yoloV4.txt.

Can you suggest anywhere else where the mismatch might have occurred?

AastaLLL · March 2, 2021, 6:59am

Hi,

There are two places that need to be updated for a custom number of classes.

1. Config file, e.g. config_infer_primary_yoloV3_tiny.txt

[property]
...
num-detected-classes=80

2. Parser: nvdsparsebbox_Yolo.cpp

..
#include "trt_utils.h"

static const int NUM_CLASSES_YOLO = 80;
...

Thanks.
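Since the class count lives in more than one file, a small script can confirm they agree before rebuilding. The following is a sketch only: the function names are hypothetical, it assumes labels.txt holds one label per line, and it assumes the parser source declares the constant exactly as `NUM_CLASSES_YOLO = <number>;` as in the snippet above.

```python
import configparser
import re


def configured_classes(config_text: str) -> int:
    """Read num-detected-classes from the [property] group of the config."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    return parser.getint("property", "num-detected-classes")


def parser_classes(cpp_text: str) -> int:
    """Read the NUM_CLASSES_YOLO constant from the bbox parser source."""
    match = re.search(r"NUM_CLASSES_YOLO\s*=\s*(\d+)\s*;", cpp_text)
    if match is None:
        raise ValueError("NUM_CLASSES_YOLO not found in parser source")
    return int(match.group(1))


def label_count(labels_text: str) -> int:
    """Count non-empty lines, assuming one label per line."""
    return sum(1 for line in labels_text.splitlines() if line.strip())


def class_counts(config_text: str, cpp_text: str, labels_text: str) -> dict:
    """Collect the class count seen by each of the three files."""
    return {
        "config": configured_classes(config_text),
        "parser": parser_classes(cpp_text),
        "labels": label_count(labels_text),
    }


if __name__ == "__main__":
    # File names are the ones mentioned in this thread; adjust paths as needed.
    with open("config_infer_primary_yoloV4.txt") as f:
        cfg = f.read()
    with open("nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp") as f:
        cpp = f.read()
    with open("labels.txt") as f:
        labels = f.read()
    counts = class_counts(cfg, cpp, labels)
    print(counts, "OK" if len(set(counts.values())) == 1 else "MISMATCH")
```

A result such as config 2, parser 80, labels 2 would be consistent with the warning reported earlier in the thread. After changing the constant, the custom parser library has to be rebuilt with `make` for the new value to take effect.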
