Get wrong infer results while testing yolov4 on deepstream 5.0

Hi ersheng,
any news about the integration of YoloV4 and DS? When will it be released?

@ersheng so does this mean that yolov5 is also not working because of DeepStream compatibility? @CJR says the incorrect results are caused by wrong execution of the CUDA kernels. Do you mind shedding some light on what the main issue is? Thanks

@gaylord @hymanzhu1983 @y14uc339 @jiejing_ma

The current Yolo implementation via CUDA kernel in DeepStream is based on old Yolo models (v2, v3), so it may not suit new Yolo models like YoloV4. Location: /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/

We are trying to embed the Yolo layer into the TensorRT engine before deploying to DeepStream, which means the Yolo CUDA kernel in DeepStream would no longer be used.

We have not officially released a YoloV4 solution for DeepStream yet, but you can try the following steps:

  1. Generate a TensorRT engine following this workflow: DarkNet or Pytorch → ONNX → TensorRT.
  2. Add the following C++ functions to objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp and rebuild.
  3. Here are configuration files for reference (you will need to adjust them slightly to suit your environment):
    config_infer_primary_yoloV4.txt (3.4 KB)
    deepstream_app_config_yoloV4.txt (3.8 KB)
static NvDsInferParseObjectInfo convertBBoxYoloV4(const float& bx1, const float& by1, const float& bx2,
                                     const float& by2, const uint& netW, const uint& netH)
{
    NvDsInferParseObjectInfo b;
    // Restore coordinates to network input resolution

    float x1 = bx1 * netW;
    float y1 = by1 * netH;
    float x2 = bx2 * netW;
    float y2 = by2 * netH;

    // Keep both corners inside the network input
    x1 = clamp(x1, 0, netW);
    y1 = clamp(y1, 0, netH);
    x2 = clamp(x2, 0, netW);
    y2 = clamp(y2, 0, netH);

    // Convert corner format (x1, y1, x2, y2) to left/top/width/height
    b.left = x1;
    b.width = clamp(x2 - x1, 0, netW);
    b.top = y1;
    b.height = clamp(y2 - y1, 0, netH);

    return b;
}

static void addBBoxProposalYoloV4(const float bx, const float by, const float bw, const float bh,
                     const uint& netW, const uint& netH, const int maxIndex,
                     const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
{
    NvDsInferParseObjectInfo bbi = convertBBoxYoloV4(bx, by, bw, bh, netW, netH);
    // Drop degenerate boxes smaller than one pixel
    if (bbi.width < 1 || bbi.height < 1) return;

    bbi.detectionConfidence = maxProb;
    bbi.classId = maxIndex;
    binfo.push_back(bbi);
}

static std::vector<NvDsInferParseObjectInfo>
decodeYoloV4Tensor(
    const float* boxes, const float* scores,
    const uint num_bboxes, NvDsInferParseDetectionParams const& detectionParams,
    const uint& netW, const uint& netH)
{
    std::vector<NvDsInferParseObjectInfo> binfo;

    uint bbox_location = 0;
    uint score_location = 0;
    for (uint b = 0; b < num_bboxes; ++b)
    {
        float bx1 = boxes[bbox_location];
        float by1 = boxes[bbox_location + 1];
        float bx2 = boxes[bbox_location + 2];
        float by2 = boxes[bbox_location + 3];

        float maxProb = 0.0f;
        int maxIndex = -1;

        // Pick the class with the highest score for this box
        for (uint c = 0; c < detectionParams.numClassesConfigured; ++c)
        {
            float prob = scores[score_location + c];
            if (prob > maxProb)
            {
                maxProb = prob;
                maxIndex = c;
            }
        }

        // maxIndex stays -1 when no score is positive; skip those boxes
        // so the threshold array is never indexed with -1
        if (maxIndex >= 0 && maxProb > detectionParams.perClassPreclusterThreshold[maxIndex])
        {
            addBBoxProposalYoloV4(bx1, by1, bx2, by2, netW, netH, maxIndex, maxProb, binfo);
        }

        bbox_location += 4;
        score_location += detectionParams.numClassesConfigured;
    }

    return binfo;
}

extern "C" bool NvDsInferParseCustomYoloV4(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    if (NUM_CLASSES_YOLO != detectionParams.numClassesConfigured)
    {
        std::cerr << "WARNING: Num classes mismatch. Configured:"
                  << detectionParams.numClassesConfigured
                  << ", detected by network: " << NUM_CLASSES_YOLO << std::endl;
    }

    std::vector<NvDsInferParseObjectInfo> objects;

    const NvDsInferLayerInfo &boxes = outputLayersInfo[0]; // num_boxes x 4
    const NvDsInferLayerInfo &scores = outputLayersInfo[1]; // num_boxes x num_classes

    // 3 dimensional: [num_boxes, 1, 4]
    assert(boxes.inferDims.numDims == 3);
    // 2 dimensional: [num_boxes, num_classes]
    assert(scores.inferDims.numDims == 2);

    // The second dimension should be num_classes
    assert(detectionParams.numClassesConfigured == scores.inferDims.d[1]);

    uint num_bboxes = boxes.inferDims.d[0];

    // std::cout << "Network Info: " << networkInfo.height << "  " << networkInfo.width << std::endl;

    std::vector<NvDsInferParseObjectInfo> outObjs =
        decodeYoloV4Tensor(
            (const float*)(boxes.buffer), (const float*)(scores.buffer), num_bboxes, detectionParams,
            networkInfo.width, networkInfo.height);

    objects.insert(objects.end(), outObjs.begin(), outObjs.end());

    objectList = objects;

    return true;
}
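
For anyone who wants to check the decoding logic without building against DeepStream, here is a minimal standalone sketch of the same idea. It assumes the two output buffers are laid out as described in the comments above (boxes as num_boxes x 4 normalized corners, scores as num_boxes x num_classes). The names `Detection`, `clampf` and `decodeBoxes` are hypothetical stand-ins for illustration and are not part of the DeepStream SDK, and a single threshold replaces the per-class thresholds.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for NvDsInferParseObjectInfo, so the sketch
// compiles without the DeepStream headers.
struct Detection {
    float left, top, width, height;
    int classId;
    float confidence;
};

static float clampf(float v, float lo, float hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// boxes:  numBoxes x 4, normalized corners [x1, y1, x2, y2]
// scores: numBoxes x numClasses
// threshold: one global threshold (assumption; the parser above uses per-class values)
std::vector<Detection> decodeBoxes(const float* boxes, const float* scores,
                                   std::size_t numBoxes, std::size_t numClasses,
                                   float threshold, float netW, float netH) {
    std::vector<Detection> out;
    for (std::size_t b = 0; b < numBoxes; ++b) {
        const float* box = boxes + b * 4;
        const float* cls = scores + b * numClasses;

        // Find the best-scoring class for this box.
        int best = -1;
        float bestProb = 0.0f;
        for (std::size_t c = 0; c < numClasses; ++c) {
            if (cls[c] > bestProb) {
                bestProb = cls[c];
                best = static_cast<int>(c);
            }
        }
        if (best < 0 || bestProb <= threshold) continue;

        // Scale normalized corners to network resolution and clamp.
        const float x1 = clampf(box[0] * netW, 0.0f, netW);
        const float y1 = clampf(box[1] * netH, 0.0f, netH);
        const float x2 = clampf(box[2] * netW, 0.0f, netW);
        const float y2 = clampf(box[3] * netH, 0.0f, netH);

        const float w = x2 - x1;
        const float h = y2 - y1;
        if (w < 1.0f || h < 1.0f) continue;  // drop degenerate boxes

        out.push_back({x1, y1, w, h, best, bestProb});
    }
    return out;
}
```

Feeding it a couple of hand-written boxes and scores is a quick way to confirm that the corner-to-left/top/width/height conversion matches what you expect before debugging the full pipeline.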


@ersheng thanks! But my question was mainly regarding yolov5 compatibility which was released recently!


YoloV5 may have similar problems too.
However, we have not yet thoroughly studied YoloV5 compatibility.
We may add YoloV5 to our agenda soon.

Hi @ersheng. DeepStream supports TensorRT, and we implemented a CUDA kernel for yolov5 that works fine in TensorRT. Why does that CUDA kernel not work in DeepStream when DS is using the same TRT? I mean, what exactly is causing the problem? @CJR says here that it should work in DS. Any thoughts on this?


The highest Yolo version that the CUDA kernel in /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/ supports is YoloV3.

We are trying to embed the Yolo layer into the TensorRT engine before deploying to DeepStream, which means the Yolo CUDA kernel in DeepStream would no longer be used. You can have a look at my previous post here: YoloV4 Solution.

YoloV5 may have a similar problem, and we will work on it by applying the same solution. You can also adapt this YoloV4 solution to solve your YoloV5 problem yourself.

@ersheng this might be a dumb question!! I understand that the highest Yolo version the CUDA kernel in /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/ can support is YoloV3. But I am not using that kernel to implement yolov5; I am using a different kernel. So are you saying that even a different CUDA kernel implementation that works for yolov5 in TRT would not work in DeepStream?

@y14uc339 @CJR

Sorry for the misunderstanding.
CJR is providing you a solution for YoloV5 in this thread, and you can continue to follow it.

However, I can suggest a different workflow:
Pytorch → ONNX → TRT
Converting to ONNX first is a more standardized way to handle YoloV5, as described on the official page:

You can choose either way to solve your problem, and I hope they do not clash with each other.

@ersheng Thanks!

@ersheng I’ll try it both ways since @CJR is busy/unavailable currently. I’ll go with Pytorch → Onnx → TRT approach. It would be great if you can help out with the custom parsing functions and config files for smooth implementation of yolov5 in TRT!

@ersheng Thanks a lot. I tried this way and it works!
But something seems wrong with the results.

It also prints a warning:

WARNING: …/nvdsinfer/nvdsinfer_func_utils.cpp:34 [TRT]: Explicit batch network detected and batch size specified, use enqueue without batch size instead.


I changed the input size to width=320, height=512.

I generated the ONNX model from Darknet, not from Pytorch, and set batchsize=1 using this command:

python yolov4.cfg yolov4.weights ./data/dog.jpg 1


trtexec --onnx=yolov4_1_3_512_320.onnx --explicitBatch --saveEngine=yolov4_1_3_320_512_fp16.engine --workspace=4096 --fp16


When I set batchsize=4, it gives errors and quits. Does the batchsize have to be 1 and the input size 320*512? Must I use the Pytorch model? Can the workflow be Darknet → ONNX → TensorRT?


For the warning

I agree that this warning is annoying, but for now you can simply ignore it.
It is a legacy issue caused by backward compatibility with Caffe and UFF models.
It will be removed in later TensorRT versions.

For the error

At which step did the program quit with an error? As far as I know, the batch size should be consistent throughout the workflow: ONNX → TRT → DS pipeline:

           batchsize=4  batchsize=4     batchsize=4
Darknet  ->   ONNX   ->   TensorRT  ->  DS pipeline

Have you configured batch size of both [streammux] and [primary-gie]?

For ratio of input

I think the model input aspect ratio should match the original image aspect ratio, or at least be close to it.
For example, if your image input is 1080 * 1920, then 320 * 512 or 320 * 608 may be a good choice;
if your image input is 1280 * 1280, then 416 * 416, 512 * 512 or 608 * 608 may be recommended for the model.
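
As a rough illustration of "close to each other", the sketch below picks, from a list of candidate model input sizes, the one whose height/width ratio is nearest to the image's. It assumes the sizes above are written as height * width. `Size` and `closestAspect` are hypothetical names used only for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Size {
    int height;
    int width;
};

// Returns the index of the candidate whose height/width ratio is closest
// to the image's ratio. Expects a non-empty candidate list.
std::size_t closestAspect(const Size& image, const std::vector<Size>& candidates) {
    const double target = static_cast<double>(image.height) / image.width;
    std::size_t best = 0;
    double bestDiff = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const double ratio =
            static_cast<double>(candidates[i].height) / candidates[i].width;
        const double diff = std::fabs(ratio - target);
        if (diff < bestDiff) {
            bestDiff = diff;
            best = i;
        }
    }
    return best;
}
```

For a 1080 * 1920 image (ratio 0.5625), 320 * 608 (about 0.526) is slightly closer than 320 * 512 (0.625), and both are far closer than any square input.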

There is an option named maintain-aspect-ratio in config_infer_primary_yoloV4.txt.
If maintain-aspect-ratio=1, the image is padded so that its ratio matches the model input; otherwise, the image is stretched vertically or horizontally whenever its ratio does not match the model input.
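
To make the difference concrete, here is a small sketch of the arithmetic behind the two modes. It only computes the scaled image size and the total amount of padding; it does not claim where DeepStream places the padding. `Resized` and `fitToNetwork` are hypothetical names for illustration, not DeepStream APIs.

```cpp
#include <algorithm>
#include <cmath>

struct Resized {
    int width;   // image width after scaling
    int height;  // image height after scaling
    int padW;    // total horizontal padding needed to reach the network width
    int padH;    // total vertical padding needed to reach the network height
};

// maintainAspect = true : scale uniformly, then pad the remainder
// maintainAspect = false: stretch to fill the network input, no padding
Resized fitToNetwork(int imgW, int imgH, int netW, int netH, bool maintainAspect) {
    if (!maintainAspect) {
        return {netW, netH, 0, 0};
    }
    // Use the smaller scale factor so the whole image fits inside the input.
    const double scale = std::min(static_cast<double>(netW) / imgW,
                                  static_cast<double>(netH) / imgH);
    const int w = std::min(netW, static_cast<int>(std::lround(imgW * scale)));
    const int h = std::min(netH, static_cast<int>(std::lround(imgH * scale)));
    return {w, h, netW - w, netH - h};
}
```

With a 1920 x 1080 frame and a 608 x 608 model, keeping the aspect ratio scales the frame to 608 x 342 and leaves 266 rows of padding, whereas stretching fills all 608 x 608 pixels and distorts objects vertically.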

DarkNet or Pytorch

Convert from Darknet to ONNX if you just want to use the official pretrained YoloV4 model.
Convert from Pytorch to ONNX if you want to use a model trained with Pytorch.


Hi @jiejing_ma @ersheng
I have implemented Yolov3 with DeepStream, but my attempt with Yolov4 failed.
Can you please share your workflow and some of the links you referred to?
I wish to reproduce the results shown in the screenshot you shared.

Please help me with a summary, workflow, or reference links.


Follow this guideline:
YoloV4: DarkNet or Pytorch → ONNX → TensorRT → DeepStream

Hi @ersheng
I followed this link [GitHub - Tianxiaomo/pytorch-YOLOv4: PyTorch, ONNX and TensorRT implementation of YOLOv4]
and generated the ONNX model from Darknet.
After that, I went into the NVIDIA TensorRT container and executed this command:

trtexec --onnx=yolov4_1_3_608_608.onnx --explicitBatch --saveEngine=yolov4_1_3_608_608_fp16.engine --workspace=4096 --fp16

I get the following error at the end:
[libprotobuf ERROR google/protobuf/] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR google/protobuf/] Error parsing text-format onnx2trt_onnx.ModelProto: 1:12: Invalid control characters encountered in text.
[libprotobuf ERROR google/protobuf/] Error parsing text-format onnx2trt_onnx.ModelProto: 1:14: Message type "onnx2trt_onnx.ModelProto" has no field named "pytorch".
Failed to parse ONNX model from fileyolov4_1_3_608_608.onnx
[07/09/2020-10:16:02] [E] [TRT] Network must have at least one output
[07/09/2020-10:16:02] [E] [TRT] Network validation failed.
[07/09/2020-10:16:02] [E] Engine creation failed
[07/09/2020-10:16:02] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # trtexec --onnx=yolov4_1_3_608_608.onnx --explicitBatch --saveEngine=yolov4_1_3_608_608_fp16.engine --workspace=4096 --fp16

Any suggestions?


What versions of Pytorch and TensorRT are you using?

I did configure the batch size of both [streammux] and [primary-gie]. I will try again and upload the error info later. Thanks

Hi, @pinktree3
Post #12 is useful. I just referred to it.
For the error you hit, check your versions of Pytorch and TensorRT.


Pytorch 1.4.0 for TensorRT 7.0 and higher
Pytorch 1.5.0 and 1.6.0 for TensorRT 7.1.2 and higher


TensorRT version Recommended: 7.0, 7.1

thanks @ersheng @jiejing_ma