IPlugin TensorRT engine error for DS 5.0

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Tesla T4
• DeepStream Version DS5.0
• JetPack Version (valid for Jetson only)
• TensorRT Version TensorRT 7.0
• NVIDIA GPU Driver Version (valid for GPU only) 440.64

I created a YOLOv4 engine file using the TensorRT API and IPluginV2Ext, including some custom layers such as yolo and mish. Then I used the engine in a DeepStream application with the command “deepstream-app -c deepstream_app_config_yoloV4.txt”. The key elements of config_infer_primary_yoloV4.txt are as follows:

model-engine-file=yolov4.engine
labelfile-path=labels.txt

# 0=FP32, 1=INT8, 2=FP16 mode

network-mode=1
num-detected-classes=5
gie-unique-id=1
network-type=0
is-classifier=0

# 0=Group Rectangles, 1=DBSCAN, 2=NMS, 3=DBSCAN+NMS Hybrid, 4=None (no clustering)

cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV4
custom-lib-path=nvdsinfer_custom_impl_YoloV4/nvdsparsebbox_YoloV4.so

nvdsparsebbox_YoloV4.so contains only the implementation of the bbox parser. Then I got this error:
ERROR: …/nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_ARGUMENT: getPluginCreator could not find plugin mish version 1
ERROR: …/nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: safeDeserializationUtils.cpp (293) - Serialization Error in load: 0 (Cannot deserialize plugin since cfound in Plugin Registry)
ERROR: …/nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_STATE: std::exception
ERROR: …/nvdsinfer/nvdsinfer_func_utils.cpp:31 [TRT]: INVALID_CONFIG: Deserialize the cuda engine failed.
ERROR: …/nvdsinfer/nvdsinfer_model_builder.cpp:1452 Deserialize engine failed from file: /opt/nvidia/deepstream/deepstream-5.0/sources/YoloV4/yolov4.engine
0:00:01.533605569 9733 0x55e6de44db30 WARN nvinfer gstnvinfer.cpp:599:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning fromzeEngineAndBackend() <nvdsinfer_context_impl.cpp:1566> [UID = 1]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-5.0/sources/YoloV4/yolov4.e
0:00:01.533656600 9733 0x55e6de44db30 WARN nvinfer gstnvinfer.cpp:599:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning fromackendContext() <nvdsinfer_context_impl.cpp:1673> [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-5.0/sourcesy rebuild
0:00:01.533667068 9733 0x55e6de44db30 INFO nvinfer gstnvinfer.cpp:602:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from Nv <nvdsinfer_context_impl.cpp:1591> [UID = 1]: Trying to create engine from model files
ERROR: …/nvdsinfer/nvdsinfer_model_builder.cpp:934 failed to build network since there is no model file matched.
ERROR: …/nvdsinfer/nvdsinfer_model_builder.cpp:872 failed to build network.
0:00:01.533940019 9733 0x55e6de44db30 ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvD<nvdsinfer_context_impl.cpp:1611> [UID = 1]: build engine file failed
0:00:01.533954442 9733 0x55e6de44db30 ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDndContext() <nvdsinfer_context_impl.cpp:1697> [UID = 1]: build backend context failed
0:00:01.533981272 9733 0x55e6de44db30 ERROR nvinfer gstnvinfer.cpp:596:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvD<nvdsinfer_context_impl.cpp:1024> [UID = 1]: generate backend failed, check config file settings
0:00:01.534127793 9733 0x55e6de44db30 WARN nvinfer gstnvinfer.cpp:781:gst_nvinfer_start:<primary_gie> error: Failed to create NvDsInferConte
0:00:01.534136619 9733 0x55e6de44db30 WARN nvinfer gstnvinfer.cpp:781:gst_nvinfer_start:<primary_gie> error: Config file path: /opt/nvidia/d/YoloV4/config_infer_primary_yoloV4.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
** ERROR: main:651: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to create NvDsInferContext instance
Debug info: gstnvinfer.cpp(781): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie:
Config file path: /opt/nvidia/deepstream/deepstream-5.0/sources/YoloV4/config_infer_primary_yoloV4.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
App run failed
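
The first [TRT] error in the log names the plugin that the runtime could not find. Below is a minimal sketch for pulling such names out of a saved log, assuming the message format shown above. The function name missing_plugins is hypothetical and not part of DeepStream or TensorRT.

```python
import re

# Matches the TensorRT message seen in the log above, for example:
# "getPluginCreator could not find plugin mish version 1"
_MISSING_PLUGIN = re.compile(
    r"getPluginCreator could not find plugin (\S+) version (\S+)"
)


def missing_plugins(log_text):
    """Return sorted, de-duplicated (plugin_name, version) pairs reported as missing."""
    return sorted(set(_MISSING_PLUGIN.findall(log_text)))


sample = (
    "ERROR: [TRT]: INVALID_ARGUMENT: getPluginCreator could not find "
    "plugin mish version 1\n"
)
print(missing_plugins(sample))
```

Every plugin reported this way has to be registered by a library loaded into the process before the engine is deserialized.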

Hi @sherlocking
Did you provide “custom-lib-path=” for the TRT plugin lib?

you can refer to
sample:
/opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/config_infer_primary_yoloV3.txt
doc: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream%20Plugins%20Development%20Guide/deepstream_plugin_details.3.01.html

BTW, we tried YOLOv4, and it seems that it does not need these TRT plugins.

If you are interested, we can share what we did with you.

Here is a way to generate a TensorRT YOLOv4 engine without needing TRT plugins.

Step1: Download pretrained YOLOv4 model

Model definition can be downloaded from here
https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg

Pretrained weights can be downloaded from here
https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights

  • Specify input image size

Open yolov4.cfg and set the values of height and width in the header section of the cfg file.
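
If you prefer to script this edit, the sketch below rewrites the first height= and width= entries of the cfg text. It assumes those first entries belong to the [net] header, and that the input must be a multiple of 32 (an assumption based on the largest downsampling factor in the size table of this post). The function name set_input_size is hypothetical.

```python
import re


def set_input_size(cfg_text, height, width):
    """Return cfg_text with the first height= and width= entries replaced.

    Assumption: the first occurrences are the ones in the [net] header.
    """
    # Assumption: the network downsamples by up to 32, so both sides
    # should be multiples of 32.
    if height % 32 or width % 32:
        raise ValueError("height and width should be multiples of 32")
    cfg_text = re.sub(r"(?m)^height\s*=.*$", "height={}".format(height),
                      cfg_text, count=1)
    cfg_text = re.sub(r"(?m)^width\s*=.*$", "width={}".format(width),
                      cfg_text, count=1)
    return cfg_text


header = "[net]\nbatch=64\nwidth=608\nheight=608\nchannels=3\n"
print(set_input_size(header, 416, 416))
```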

  • Input size options of YoloV4 for inference

                  Input size   Output 1    Output 2    Output 3
  Size option 1   3x608x608    255x76x76   255x38x38   255x19x19
  Size option 2   3x512x512    255x64x64   255x32x32   255x16x16
  Size option 3   3x416x416    255x52x52   255x26x26   255x13x13
  Size option 4   3x320x320    255x40x40   255x20x20   255x10x10

Hint: the height and width of the input can differ, for example 608x416 or 320x608.
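
The output sizes in the table follow from the input size. The three outputs are the input downsampled by 8, 16 and 32, and 255 channels corresponds to 3 anchors x (80 classes + 5). A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def yolo_output_shapes(height, width, num_classes=80, anchors_per_scale=3):
    """Return the (channels, H, W) shape of the three raw YOLO outputs."""
    # 4 box values + 1 objectness score + class scores, per anchor.
    channels = anchors_per_scale * (num_classes + 5)
    return [(channels, height // stride, width // stride)
            for stride in (8, 16, 32)]


print(yolo_output_shapes(608, 608))
print(yolo_output_shapes(608, 416))  # non-square input
```

With a custom dataset, the channel count changes with num_classes, for example 3 x (5 + 5) = 30 for five classes.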

Step2: Download a GitHub repository that can help you convert YOLOv4 from darknet to pytorch

git clone https://github.com/Tianxiaomo/pytorch-YOLOv4.git

Step3: Generate onnx model

Create a Python file darknet2onnx.py with the code below, copy it into pytorch-YOLOv4, and execute it like this:

python darknet2onnx.py <cfgFile> <weightFile> <batchSize>

import sys
import torch
from tool.darknet2pytorch import Darknet


def transform_to_onnx(cfgfile, weightfile, batch_size):
    # Build the network from the darknet cfg and load the pretrained weights.
    model = Darknet(cfgfile)

    model.print_network()
    model.load_weights(weightfile)
    print('Loading weights from %s... Done!' % (weightfile))

    # model.cuda()

    # Dummy input used for tracing; its shape fixes the input size of the exported model.
    x = torch.randn((batch_size, 3, model.height, model.width), requires_grad=True) #.cuda()

    # Export the model
    print('Export the onnx model ...')
    torch.onnx.export(model,
                    x,
                    "yolov4_{}_3_{}_{}.onnx".format(batch_size, model.height, model.width),
                    export_params=True,
                    opset_version=11,
                    do_constant_folding=True,
                    input_names=['input'], output_names=['output_1', 'output_2', 'output_3'],
                    dynamic_axes=None)

    print('Onnx model exporting done')


if __name__ == '__main__':
    if len(sys.argv) == 4:
        cfgfile = sys.argv[1]
        weightfile = sys.argv[2]
        batch_size = int(sys.argv[3])
        transform_to_onnx(cfgfile, weightfile, batch_size)
    else:
        print('Please execute this script this way:\n')
        print('python darknet2onnx.py <cfgFile> <weightFile> <batchSize>')

Step4: Transform onnx model into TensorRT model

  • Generate TensorRT engine in fp16 mode:
./trtexec --onnx=<onnx_file> --workspace=4096 --saveEngine=<engine_file> --fp16 --explicitBatch
  • Generate TensorRT engine in int8 mode:
./trtexec --onnx=<onnx_file> --workspace=4096 --saveEngine=<engine_file> --int8 --explicitBatch
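
If you build engines for several input sizes, it can help to generate these command lines from a script. The sketch below only assembles the flags shown above. The function name trtexec_command is hypothetical, and the path ./trtexec is assumed to be valid in your working directory.

```python
import shlex


def trtexec_command(onnx_file, engine_file, precision="fp16", workspace_mb=4096):
    """Build the trtexec argument list for the two modes shown in this post."""
    if precision not in ("fp16", "int8"):
        raise ValueError("precision must be 'fp16' or 'int8'")
    return [
        "./trtexec",
        "--onnx={}".format(onnx_file),
        "--workspace={}".format(workspace_mb),
        "--saveEngine={}".format(engine_file),
        "--{}".format(precision),
        "--explicitBatch",
    ]


cmd = trtexec_command("yolov4_1_3_608_608.onnx", "yolov4_fp16.engine")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch the build.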

Step5: How to train YOLOv4 with your own dataset

The public pretrained YOLOv4 model may not be useful in specific industrial scenarios.
If you want to re-train this model via PyTorch with your own dataset, just comment out or remove the
line model.load_weights(...) from the source, and write your own training code.

@ersheng If the model is generated this way, I assume we still need to provide the NvDsInferParseCustomYoloV3 custom lib via

parse-bbox-func-name=NvDsInferParseCustomYoloV3
custom-lib-path=libnvdsinfer_custom_impl_Yolo.so

if it is to be used in DeepStream?

@thesyght According to the paper and the darknet definitions, YOLOv4 seems to use almost the same yolo layer as YOLOv3, with some extra arguments.
You can try nvdsinfer_custom_impl_Yolo for YOLOv4, but I cannot guarantee compatibility.
Let us know if any errors are reported.

@ersheng

I tried to follow these steps to generate a YOLOv4 engine on a Jetson NX. I can generate yolov4.onnx successfully,
but I get the error below when I run
./trtexec --onnx=yolov4_1_3_608_608.onnx --workspace=4096 --saveEngine=yolov4_fp16.engine --fp16 --explicitBatch


Input filename: yolov4_1_3_608_608.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: pytorch
Producer version: 1.5
Domain:
Model version: 0
Doc string:

[05/26/2020-23:43:50] [W] [TRT] onnx2trt_utils.cpp:217: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/26/2020-23:43:50] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
[05/26/2020-23:43:50] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
[05/26/2020-23:43:50] [E] [TRT] Layer: (Unnamed Layer* 426)[Select]'s output can not be used as shape tensor.
[05/26/2020-23:43:50] [E] [TRT] Network validation failed.
[05/26/2020-23:43:50] [E] Engine creation failed
[05/26/2020-23:43:50] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # ./trtexec --onnx=yolov4_1_3_608_608.onnx --workspace=4096 --saveEngine=yolov4_fp16.engine --fp16 --explicitBatch

I have no idea what causes this.

@ashingtsai

  • The code in this repository is still being improved; many people, including myself, are contributing to it.
  • Try pulling the latest source and generating the ONNX file again.
  • Let me know if the problem persists.

I re-downloaded the latest code and tried it.
The result is the same on the Jetson NX,
but I can build it successfully on a PC with an RTX 2070 Super and TensorRT 7.0.0.
The TensorRT version on the Jetson NX is 7.1.0.

@ashingtsai
I have tried a Jetson AGX with TensorRT 7.1.0, and the error you mentioned is not reported.
It seems there are shape layers in the generated ONNX.
Shape layers are not expected to exist in ONNX generated from https://github.com/Tianxiaomo/pytorch-YOLOv4.
Are there any changes in your local YOLOv4 model?
Would you mind providing your ONNX file so that we can check what the differences are?

@ersheng

You can download my .onnx file from the link below.
It was generated from the original yolov4.cfg and yolov4.weights from [https://raw.githubusercontent.com/AlexeyAB]

The ONNX module of PyTorch 1.5 seems to behave differently from earlier PyTorch versions when dealing with constant parameters for expand operations.
Try generating the ONNX file with PyTorch 1.4 or PyTorch 1.3.

Please see the compatible PyTorch version in the TensorRT 7 release notes: https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-7.html
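
To avoid exporting with an unsuitable version by accident, the export script could check the version string first. A minimal sketch, treating only 1.3.x and 1.4.x as acceptable because those are the versions suggested in this thread; the function name torch_version_ok is hypothetical.

```python
def torch_version_ok(version):
    """Return True for PyTorch 1.3.x or 1.4.x version strings.

    Assumption: only the releases suggested in this thread are accepted.
    """
    # Drop a local build suffix such as "+cu101", then keep major.minor.
    major_minor = tuple(int(part) for part in version.split("+")[0].split(".")[:2])
    return major_minor in ((1, 3), (1, 4))


# In the export script this would be: torch_version_ok(torch.__version__)
print(torch_version_ok("1.4.0"))
print(torch_version_ok("1.5.0+cu101"))
```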

PyTorch and ONNX are evolving quickly, and we are trying our best to catch up.
Let me know if TensorRT reports an error again.

@ersheng

Yes, after switching to PyTorch 1.4 I can generate yolov4.engine from the ONNX file successfully.
How do I then run inference with yolov4.engine? With ./trtexec?
And how do I run inference with DLA?
A Python example would be best, but C++ is also fine.

@ashingtsai @jiejing_ma

We have not finalized inference methods for YOLOv4 yet.
But you can try to debug and run the following Python script alongside TianXiaomo’s repository: https://github.com/Tianxiaomo/pytorch-YOLOv4 :

python trt_demo.py <engine_file> <image_file> <input_H> <input_W>

However, this TensorRT demo still depends on TianXiaomo’s Python functions for post-processing, which would be a bottleneck for inference.

Faster post-processing for YOLOv4 is still under development.

# trt_demo.py
import sys
import os
import argparse
import numpy as np
import cv2
from PIL import Image
import common  # helper module from the TensorRT Python samples (common.py)
import tensorrt as trt

from tool.utils import *

TRT_LOGGER = trt.Logger()

def main(engine_path, image_path, image_size):
    with get_engine(engine_path) as engine, engine.create_execution_context() as context:
        buffers = common.allocate_buffers(engine)
        image_src = cv2.imread(image_path)

        detect(engine, context, buffers, image_src, image_size)


def get_engine(engine_path):
    # Deserialize a prebuilt engine from disk; this script never builds one itself.
    print("Reading engine from file {}".format(engine_path))
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())



def detect(engine, context, buffers, image_src, image_size):
    IN_IMAGE_H, IN_IMAGE_W = image_size

    # Input
    resized = cv2.resize(image_src, (IN_IMAGE_W, IN_IMAGE_H), interpolation=cv2.INTER_LINEAR)
    img_in = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    img_in = np.transpose(img_in, (2, 0, 1)).astype(np.float32)
    img_in = np.expand_dims(img_in, axis=0)
    img_in /= 255.0
    img_in = np.ascontiguousarray(img_in)
    print("Shape of the network input: ", img_in.shape)
    # print(img_in)

    inputs, outputs, bindings, stream = buffers
    print('Length of inputs: ', len(inputs))
    inputs[0].host = img_in

    trt_outputs = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

    print('Shape of outputs: ')
    print(trt_outputs[0].shape)
    print(trt_outputs[1].shape)
    print(trt_outputs[2].shape)

    trt_outputs[0] = trt_outputs[0].reshape(-1, 255, IN_IMAGE_H // 8, IN_IMAGE_W // 8)
    trt_outputs[1] = trt_outputs[1].reshape(-1, 255, IN_IMAGE_H // 16, IN_IMAGE_W // 16)
    trt_outputs[2] = trt_outputs[2].reshape(-1, 255, IN_IMAGE_H // 32, IN_IMAGE_W // 32)

    print('Shapes supposed to be: ')
    print(trt_outputs[0].shape)
    print(trt_outputs[1].shape)
    print(trt_outputs[2].shape)

    # print(outputs[2])
    num_classes = 80

    boxes = post_processing(img_in, 0.4, num_classes, 0.5, trt_outputs)

    if num_classes == 20:
        namesfile = 'data/voc.names'
    elif num_classes == 80:
        namesfile = 'data/coco.names'
    else:
        namesfile = 'data/names'

    class_names = load_class_names(namesfile)
    plot_boxes_cv2(image_src, boxes, savename='predictions_trt.jpg', class_names=class_names)



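if __name__ == '__main__':
    # The engine and image paths are required; the input size is optional.
    if len(sys.argv) < 3:
        print('Usage: python trt_demo.py <engine_file> <image_file> [<input_H> [<input_W>]]')
        sys.exit(1)
    engine_path = sys.argv[1]
    image_path = sys.argv[2]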
    
    if len(sys.argv) < 4:
        image_size = (416, 416)
    elif len(sys.argv) < 5:
        image_size = (int(sys.argv[3]), int(sys.argv[3]))
    else:
        image_size = (int(sys.argv[3]), int(sys.argv[4]))
    
    main(engine_path, image_path, image_size)

@ashingtsai @jiejing_ma

Since https://github.com/Tianxiaomo/pytorch-YOLOv4 is evolving quickly with my contributions, my previous post is already outdated.
Please pull the latest https://github.com/Tianxiaomo/pytorch-YOLOv4 and follow the guidelines in the README.
If you have problems with conversion or inference, do not hesitate to let me know.

@ersheng

I ran the command below but got an error.
python3 demo_trt.py yolov4.engine dog.jpg

Reading engine from file yolov4.engine
Shape of the network input: (1, 3, 416, 416)
Length of inputs: 1
Len of outputs: 9
Shapes supposed to be:
(8112,)
(32448,)
(648960,)
(2028,)
(8112,)
(162240,)
(507,)
(40560,)
(2028,)
Traceback (most recent call last):
  File "demo_trt.py", line 228, in <module>
    main(engine_path, image_path, image_size)
  File "demo_trt.py", line 117, in main
    detect(engine, context, buffers, image_src, image_size)
  File "demo_trt.py", line 184, in detect
    trt_outputs[1].reshape(-1, 3 * h1 * w1, 80),
ValueError: cannot reshape array of size 32448 into shape (8112,80)

@ashingtsai
Order of outputs is often randomized.
I have tried to solve this problem and you can pull the latest code.
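
The nine flat sizes in the log above are consistent with each of the three scales producing box coordinates (4 values per anchor), one confidence per anchor, and 80 class scores per anchor. That layout is an assumption inferred from the sizes and from the reshape in the traceback. The sketch below reproduces the arithmetic and shows why outputs cannot be identified by element count alone; the function names are hypothetical.

```python
from collections import Counter


def expected_output_sizes(height, width, num_classes=80, anchors_per_scale=3):
    """Map (kind, stride) to the flat element count for batch size 1."""
    sizes = {}
    for stride in (8, 16, 32):
        n = anchors_per_scale * (height // stride) * (width // stride)
        sizes[("boxes", stride)] = n * 4            # xmin, ymin, xmax, ymax
        sizes[("confidence", stride)] = n           # one score per anchor
        sizes[("classes", stride)] = n * num_classes
    return sizes


def ambiguous_sizes(sizes):
    """Return element counts that are shared by more than one output."""
    counts = Counter(sizes.values())
    return sorted(size for size, count in counts.items() if count > 1)


sizes = expected_output_sizes(416, 416)
print(sorted(sizes.values()))
print(ambiguous_sizes(sizes))
```

For a 416x416 input, 8112 and 2028 each occur twice, so matching buffers by size is not enough. Looking outputs up by their binding names is likely to be more robust than relying on position or size.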

@ersheng

Now it seems there are inf values in the boxes.

python3 demo_trt.py yolov4.engine dog.jpg
Reading engine from file yolov4.engine
Shape of the network input: (1, 3, 416, 416)
Length of inputs: 1
Len of outputs: 9
/home/ashing/pytorch-YOLOv4/tool/utils.py:39: RuntimeWarning: invalid value encountered in add
  My = max(box1[1] + box1[3] / 2.0, box2[1] + box2[3] / 2.0)
Traceback (most recent call last):
  File "demo_trt.py", line 235, in <module>
    main(engine_path, image_path, image_size)
  File "demo_trt.py", line 117, in main
    detect(engine, context, buffers, image_src, image_size)
  File "demo_trt.py", line 220, in detect
    plot_boxes_cv2(image_src, boxes, savename='predictions_trt.jpg', class_names=class_names)
  File "/home/ashing/pytorch-YOLOv4/tool/utils.py", line 490, in plot_boxes_cv2
    y1 = int((box[1] - box[3] / 2.0) * height)
OverflowError: cannot convert float infinity to integer

@ashingtsai
exp() in the yolo layer may produce infinite values.
Send me your input image (dog.jpg) so that I can reproduce this fault.
Or you can try this yourself.

Change the second-to-last line of the function yolo_forward
from

boxes = torch.cat((xmin, ymin, xmax, ymax), dim=2)

to

boxes = torch.cat((xmin, ymin, xmax, ymax), dim=2).clamp(-10.0, 10.0)
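
Independently of the clamp, a guard before plotting would avoid the OverflowError in plot_boxes_cv2. The sketch below assumes each box is a sequence whose first four entries are numeric coordinates. That layout is an assumption about tool/utils.py, and the function name is hypothetical. It only hides the symptom and does not explain where the infinite values come from.

```python
import math


def drop_non_finite_boxes(boxes):
    """Keep only boxes whose first four values are finite numbers."""
    return [
        box for box in boxes
        if all(math.isfinite(float(value)) for value in box[:4])
    ]


boxes = [
    [0.5, 0.5, 0.2, 0.3, 0.9, 0.8, 16],           # valid box
    [0.5, float("inf"), 0.1, 0.1, 0.7, 0.6, 1],   # infinite coordinate
    [float("nan"), 0.2, 0.1, 0.1, 0.7, 0.6, 2],   # NaN coordinate
]
print(len(drop_non_finite_boxes(boxes)))
```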

@ersheng

I changed the line to
boxes = torch.cat((xmin, ymin, xmax, ymax), dim=2).clamp(-10.0, 10.0)
but nothing changed; the same error still appears.
The dog.jpg is from pytorch-YOLOv4/data/dog.jpg,
and my PyTorch version is 1.4.