Options for optimising custom TFRT tiny YOLOv4 implementation to improve live inference speed on Nano

I’ve created a custom tiny YOLOv4 TensorFlow RT model, which I’m running on a 4GB Nano development board for live inference, using this Python repository.

My input size is 416x416 (needed because I’m trying to detect relatively small objects within the frame), and I’m using 8-bit integers. The rest of my setup details can be seen in this jtop output:

I’m only managing to achieve a maximum throughput of 2-2.3fps, and need to improve this to at least 15fps.

I was wondering if there are any further steps I can take (beyond using TFRT, tiny YOLO and 8-bit integers) to improve the fps on the Nano, and what speed improvements I might expect by moving up to one of the more powerful Jetson boards?


Please check the following sample for optimizing YOLOv4 with TensorRT (TensorRT+Deepstream).

We can get around 57.75 fps with a video input/output pipeline on Xavier.
It’s recommended to give the sample a try on Nano first.


Thanks - I successfully completed up to step 3.1 on the suggested repository.

But, when I try to run step 3.2 (run make to compile nvdsparsebbox_Yolo.cpp in directory nvdsinfer_custom_impl_Yolo), I get the following error:

g++ -c -o nvdsinfer_yolo_engine.o -Wall -std=c++11 -shared -fPIC -Wno-error=deprecated-declarations -I../../includes -I/usr/local/cuda-10.2.89/include nvdsinfer_yolo_engine.cpp
nvdsinfer_yolo_engine.cpp:23:10: fatal error: nvdsinfer_custom_impl.h: No such file or directory
 #include "nvdsinfer_custom_impl.h"
compilation terminated.
Makefile:49: recipe for target 'nvdsinfer_yolo_engine.o' failed
make: *** [nvdsinfer_yolo_engine.o] Error 1

I found another post suggesting there might be a problem with my DeepStream installation.

I followed the steps listed here to install Deepstream, in case that helps.

Also, I found a copy of nvdsinfer_custom_impl.h in /opt/nvidia/deepstream/deepstream-5.0/sources/includes/.


Could you check the Makefile in the folder first?
It references the DeepStream headers with a relative path (-I../../includes), so you need to update that path if you copy the folder elsewhere.


diff --git a/Makefile b/Makefile
index 8b85b86..8eafdbd 100644
--- a/Makefile
+++ b/Makefile
@@ -28,7 +28,7 @@ CC:= g++
 CFLAGS:= -Wall -std=c++11 -shared -fPIC -Wno-error=deprecated-declarations
-CFLAGS+= -I../../includes -I/usr/local/cuda-$(CUDA_VER)/include
+CFLAGS+= -I/opt/nvidia/deepstream/deepstream-5.0/sources/includes -I/usr/local/cuda-$(CUDA_VER)/include
 LIBS:= -lnvinfer_plugin -lnvinfer -lnvparsers -L/usr/local/cuda-$(CUDA_VER)/lib64 -lcudart -lcublas -lstdc++fs
 LFLAGS:= -shared -Wl,--start-group $(LIBS) -Wl,--end-group


Fantastic - thanks very much! I now have a working model, running at 35fps, which is great.

But I get "Num classes mismatch. Configured:2, detected by network: 80" warnings while it’s running.

My original darknet cfg file definitely had only 2 classes, as does labels.txt. And Deepstream fails to run the model if I set num-detected-classes=80 in config_infer_primary_yoloV4.txt.

Can you suggest anywhere else where the mismatch might have occurred?


There are two places that need to be updated for a custom number of classes.

1. Config file, e.g. config_infer_primary_yoloV3_tiny.txt (the num-detected-classes setting)


2. Parser: nvdsparsebbox_Yolo.cpp

#include "trt_utils.h"

static const int NUM_CLASSES_YOLO = 80;  // default; set this to your own class count (2 here), then re-run make
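
To illustrate why both places have to agree, here is a minimal, self-contained sketch of the kind of comparison that produces the warning. It is not the actual DeepStream source: DetectionParams and checkNumClasses are hypothetical stand-ins, and only NUM_CLASSES_YOLO and the warning text are taken from this thread. The assumption is that the parser compares its compiled-in constant against the value read from num-detected-classes.

```cpp
#include <sstream>
#include <string>

// Class count compiled into the custom parser library.
// The sample default is 80; a 2-class model would use 2 here.
static const int NUM_CLASSES_YOLO = 2;

// Hypothetical stand-in for the parameters passed to the parser.
// Only the field needed for this sketch is modelled.
struct DetectionParams {
    unsigned int numClassesConfigured;  // value of num-detected-classes in the config file
};

// Returns an empty string when the two values agree; otherwise returns a
// warning worded like the one reported above.
std::string checkNumClasses(const DetectionParams& params) {
    if (static_cast<unsigned int>(NUM_CLASSES_YOLO) == params.numClassesConfigured) {
        return "";
    }
    std::ostringstream msg;
    msg << "Num classes mismatch. Configured:" << params.numClassesConfigured
        << ", detected by network: " << NUM_CLASSES_YOLO;
    return msg.str();
}
```

With the constant set to 2 and num-detected-classes=2, checkNumClasses returns an empty string. Leaving the constant at 80 while the config says 2 would give the message seen earlier. Because the constant is fixed at compile time, the library has to be rebuilt with make after changing it.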