YoloV4 with OpenCV

Hello experts,

Need your opinion. I am testing YOLOv4 with OpenCV 4.4 compiled with CUDA and cuDNN on JetPack 4.4. With tiny YOLO I am getting close to 2 fps when inferring every frame on the Nano. It's pretty straightforward to implement/integrate in C++ if you want to use YOLO with OpenCV. The other option is to use TensorRT, as NVIDIA recommends; however, the TensorRT implementation of YOLO is not as straightforward as OpenCV's. So, my question is: what benefits can I expect if I choose the TensorRT path instead of OpenCV with Darknet?

Hi,

We do have an example for YOLOv4 that runs inference with TensorRT.
You can check it directly and decide whether you want to use TensorRT or not.

By the way, you can also use TensorRT to replace the Darknet inference while keeping OpenCV for camera capture.
An example can be found here:

Thanks.

Have a look at this blog:

We are getting about 12 fps on the Nano with a custom Tiny-YOLOv4 model after following the tutorials in that post.

I am trying the yolov4_deepstream that you mentioned in your post and am currently using darknet2onnx to convert YOLOv4 to ONNX. However, I am not able to install onnxruntime with `pip install onnxruntime`. It says:

$ pip install onnxruntime
Collecting onnxruntime
Could not find a version that satisfies the requirement onnxruntime (from versions: )
No matching distribution found for onnxruntime

I am using JP4.4. Also, what I expect this step to produce is a yolov4.onnx file from the Darknet YOLOv4 weights, so the .py that I need to run is the one in the tool directory (darknet2onnx.py). Please correct me if I am wrong.

It seems that if you want to use onnxruntime on the Jetson, you will need to get a Docker image or install it from NVIDIA's servers, not the default PyPI package index:

It might also be worth just doing the conversion on a different machine.

Thank you for your reply.

I am having difficulty following the steps described in the link below and would appreciate it if somebody could clarify.

My target is to run YoloV4 using TensorRT on Nano using C++ (not Python). I have compiled TensorRT OSS Plugin (libnvinfer_plugin.so.7.1.3) on Nano and replaced the original one in “/usr/lib/aarch64-linux-gnu/”.

My next step is to generate yolov4.onnx. Since I could not install onnxruntime, I downloaded a readymade yolov4.onnx file from https://github.com/onnx/models/tree/master/vision/object_detection_segmentation/yolov4.

As the downloaded yolov4.onnx does not include the BatchedNMSPlugin (I assume), my next step, as I understand it, is Step 2 of section 2.3 described in https://github.com/NVIDIA-AI-IOT/yolov4_deepstream/blob/master/tensorrt_yolov4/README.md. However, I probably need to do this on a different PC, as I can't install onnxruntime on the Nano. Alternatively, if anybody could send me a link from which I can download a yolov4.onnx with the BatchedNMSPlugin, that would also work for me.

Now, section 3 of https://github.com/NVIDIA-AI-IOT/yolov4_deepstream/blob/master/tensorrt_yolov4/README.md confuses me totally.
To compile and build, it says to go to
cd <dir_on_your_machine>/yolov4_sample/yolo_cpp_standalone/source_gpu_nms

Where is this directory?

What I will ultimately need for my purpose, as I see it, is the SampleYolo class, plus my own wrapper to integrate it into my program so it runs in real time with a camera. For that, as I understand it, I need libnvinfer_plugin.so.7.1.3 (for YOLOv4), which I have already built, and yolov4.onnx, which I downloaded (without the BatchedNMSPlugin; I think I need to include that, and need help on how to do it), and that's probably it.

Please let me know if anything is missing in my understanding. Thanks.

UPDATE:
The downloaded yolov4.onnx does not seem to work; it is saying:

[11/05/2020-13:27:41] [W] [TRT] “onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.”
[11/05/2020-13:27:41] [I] Building TensorRT engine…/data/yolov4.engine
[11/05/2020-13:27:46] [E] [TRT] Network has dynamic or shape inputs, but no optimization profile has been defined.
[11/05/2020-13:27:46] [E] [TRT] Network validation failed.

UPDATE:

I tried to build and run standalone tensorrt_yolov4 using the following link:

Everything went smoothly. I did the following steps:

  1. Download and install TensorRT 7.1.3.4
  2. Download TensorRT OSS, compiled and replaced libnvinfer_plugin.so.7.1.3
  3. Generate yolov4 ONNX using https://github.com/Tianxiaomo/pytorch-YOLOv4 (Step 1)
  4. Added BatchedNMSPlugin into yolov4 ONNX model using https://github.com/Tianxiaomo/pytorch-YOLOv4 (Step 2)

Next, I built the YOLOv4 standalone program as described in https://github.com/NVIDIA-AI-IOT/yolov4_deepstream/blob/master/tensorrt_yolov4/README.md (sections 3.1 to 3.4).

Now I am trying to execute yolov4 (section 3.5) and am receiving the following error:

&&&& RUNNING TensorRT.sample_yolo # ./yolov4 --fp16
There are 0 coco images to process
[11/09/2020-14:02:36] [I] Building and running a GPU inference engine for Yolo
[11/09/2020-14:02:37] [I] Parsing ONNX file: …/data/yolov4_1_3_416_416.onnx.nms.onnx
[11/09/2020-14:02:37] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/09/2020-14:02:37] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/09/2020-14:02:37] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/09/2020-14:02:37] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/09/2020-14:02:37] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/09/2020-14:02:37] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/09/2020-14:02:37] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/09/2020-14:02:37] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[11/09/2020-14:02:37] [I] [TRT] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace:
[11/09/2020-14:02:37] [I] [TRT] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMS_TRT
[11/09/2020-14:02:37] [W] [TRT] Output type must be INT32 for shape outputs
[11/09/2020-14:02:37] [W] [TRT] Output type must be INT32 for shape outputs
[11/09/2020-14:02:37] [W] [TRT] Output type must be INT32 for shape outputs
[11/09/2020-14:02:37] [W] [TRT] Output type must be INT32 for shape outputs
[11/09/2020-14:02:37] [I] Building TensorRT engine…/data/yolov4_1_3_416_416.engine
[11/09/2020-14:02:37] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[11/09/2020-14:02:38] [E] [TRT] …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[11/09/2020-14:02:38] [E] [TRT] …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
&&&& FAILED TensorRT.sample_yolo # ./yolov4 --fp16

Any ideas?
BTW, I am using GTX 1050.

Solved the issue by reducing the TensorRT workspace size from 4 GB (the default) to 1 GB for the Nano in SampleYolo.cpp, as follows:

config->setMaxWorkspaceSize(4096_MiB);

I had to change the above line to

config->setMaxWorkspaceSize(1024_MiB);

Thanks to everyone for your feedback.