Getting wrong inference results while testing yolov4 on deepstream 5.0

Thanks. I will try to generate the YOLOv4 ONNX file with Pytorch 1.4.

But I don’t understand this:

There is no suitable DS pipeline for YOLOv4 yet.

I have already implemented the YOLO layer definition and generated the engine file, and it runs well. All DeepStream needs to do is just run the engine and infer, is that right? Or does DS also do something else, like preprocessing, etc.?

@jiejing_ma

Yes, preprocessing of images is included in DS.
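
For reference, this preprocessing (scaling, mean subtraction, color format) is driven by the nvinfer configuration. A minimal excerpt, with values assumed from the stock objectDetector_Yolo sample rather than from a released YoloV4 config:

[property]
# nvinfer computes y = net-scale-factor * (x - mean); Yolo expects pixels in [0,1]
net-scale-factor=0.0039215697906911373
# 0=RGB, 1=BGR
model-color-format=0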

We recommend following the ONNX standard: convert models from other DL frameworks into ONNX first, and then convert into a TensorRT engine.

Please pull the latest source from https://github.com/Tianxiaomo/pytorch-YOLOv4 and try to follow sections 2, 3, 4 and 5 of its README.

I am now looking into the DS pipeline to check the compatibility of post-processing.

Hey there. Any news on this? We would really like to try YOLO-4 with our DS application.
Benchmarks for YOLO-4 look impressive…

cheers,
Gaylord

@gaylord

An integration solution for YoloV4 and DS is now under development.
Manuals and a new code release will be available in the near future.

Hi ersheng,
any news about the integration of YoloV4 and DS? When will it be released?
Thanks

@ersheng So does this mean that yolov5 is also not working because of DeepStream compatibility? @CJR says the reason for the incorrect results is wrong execution of the CUDA kernels. Do you mind shedding some light on what the main issue is? Thanks

@gaylord @hymanzhu1983 @y14uc339 @jiejing_ma

Current Yolo implementation via CUDA kernel in DeepStream is based on old Yolo models (v2, v3) so it may not suit new Yolo models like YoloV4. Location: /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu

We are trying to embed the Yolo layer into the TensorRT engine before deploying to DeepStream, so that the Yolo CUDA kernel in DeepStream is no longer used.

We have not officially released a YoloV4 solution for DeepStream yet, but you can try the following steps:

  1. Go to https://github.com/Tianxiaomo/pytorch-YOLOv4 and generate a TensorRT engine according to this workflow: DarkNet or Pytorch --> ONNX --> TensorRT.
  2. Add the following C++ functions into objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp and rebuild libnvdsinfer_custom_impl_Yolo.so.
  3. Here are configuration files for you as references (you will have to update them a little to suit your environment):
    config_infer_primary_yoloV4.txt (3.4 KB)
    deepstream_app_config_yoloV4.txt (3.8 KB)
static NvDsInferParseObjectInfo convertBBoxYoloV4(const float& bx1, const float& by1, const float& bx2,
                                     const float& by2, const uint& netW, const uint& netH)
{
    NvDsInferParseObjectInfo b;
    // Restore coordinates to network input resolution

    float x1 = bx1 * netW;
    float y1 = by1 * netH;
    float x2 = bx2 * netW;
    float y2 = by2 * netH;

    x1 = clamp(x1, 0, netW);
    y1 = clamp(y1, 0, netH);
    x2 = clamp(x2, 0, netW);
    y2 = clamp(y2, 0, netH);

    b.left = x1;
    b.width = clamp(x2 - x1, 0, netW);
    b.top = y1;
    b.height = clamp(y2 - y1, 0, netH);

    return b;
}

static void addBBoxProposalYoloV4(const float bx1, const float by1, const float bx2, const float by2,
                     const uint& netW, const uint& netH, const int maxIndex,
                     const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
{
    // The four floats are corner coordinates (x1, y1, x2, y2), as produced by decodeYoloV4Tensor.
    NvDsInferParseObjectInfo bbi = convertBBoxYoloV4(bx1, by1, bx2, by2, netW, netH);
    if (bbi.width < 1 || bbi.height < 1) return;

    bbi.detectionConfidence = maxProb;
    bbi.classId = maxIndex;
    binfo.push_back(bbi);
}

static std::vector<NvDsInferParseObjectInfo>
decodeYoloV4Tensor(
    const float* boxes, const float* scores,
    const uint num_bboxes, NvDsInferParseDetectionParams const& detectionParams,
    const uint& netW, const uint& netH)
{
    std::vector<NvDsInferParseObjectInfo> binfo;

    uint bbox_location = 0;
    uint score_location = 0;
    for (uint b = 0; b < num_bboxes; ++b)
    {
        float bx1 = boxes[bbox_location];
        float by1 = boxes[bbox_location + 1];
        float bx2 = boxes[bbox_location + 2];
        float by2 = boxes[bbox_location + 3];

        float maxProb = 0.0f;
        int maxIndex = -1;

        for (uint c = 0; c < detectionParams.numClassesConfigured; ++c)
        {
            float prob = scores[score_location + c];
            if (prob > maxProb)
            {
                maxProb = prob;
                maxIndex = c;
            }
        }

        // Guard against maxIndex == -1 (no class above zero probability) before indexing.
        if ((maxIndex >= 0) && (maxProb > detectionParams.perClassPreclusterThreshold[maxIndex]))
        {
            addBBoxProposalYoloV4(bx1, by1, bx2, by2, netW, netH, maxIndex, maxProb, binfo);
        }

        bbox_location += 4;
        score_location += detectionParams.numClassesConfigured;
    }

    return binfo;
}

static bool NvDsInferParseYoloV4(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    if (NUM_CLASSES_YOLO != detectionParams.numClassesConfigured)
    {
        std::cerr << "WARNING: Num classes mismatch. Configured:"
                  << detectionParams.numClassesConfigured
                  << ", detected by network: " << NUM_CLASSES_YOLO << std::endl;
    }

    std::vector<NvDsInferParseObjectInfo> objects;

    const NvDsInferLayerInfo &boxes = outputLayersInfo[0]; // num_boxes x 4
    const NvDsInferLayerInfo &scores = outputLayersInfo[1]; // num_boxes x num_classes

    // 3 dimensional: [num_boxes, 1, 4]
    assert(boxes.inferDims.numDims == 3);
    // 2 dimensional: [num_boxes, num_classes]
    assert(scores.inferDims.numDims == 2);

    // The second dimension should be num_classes
    assert(detectionParams.numClassesConfigured == scores.inferDims.d[1]);
    
    uint num_bboxes = boxes.inferDims.d[0];

    // std::cout << "Network Info: " << networkInfo.height << "  " << networkInfo.width << std::endl;

    std::vector<NvDsInferParseObjectInfo> outObjs =
        decodeYoloV4Tensor(
            (const float*)(boxes.buffer), (const float*)(scores.buffer), num_bboxes, detectionParams,
            networkInfo.width, networkInfo.height);

    objects.insert(objects.end(), outObjs.begin(), outObjs.end());

    objectList = objects;

    return true;
}
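
For completeness: the static parser above still needs an exported entry point that DeepStream can resolve through parse-bbox-func-name in the nvinfer config. The following is a minimal sketch following the usual custom-parser conventions from nvdsinfer_custom_impl.h; the name NvDsInferParseCustomYoloV4 is illustrative, not from an official release.

extern "C" bool NvDsInferParseCustomYoloV4(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    // Delegate to the static implementation above.
    return NvDsInferParseYoloV4(outputLayersInfo, networkInfo, detectionParams, objectList);
}

/* Compile-time check that the prototype matches what nvinfer expects. */
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV4);

The config would then reference it with parse-bbox-func-name=NvDsInferParseCustomYoloV4 and custom-lib-path pointing at the rebuilt libnvdsinfer_custom_impl_Yolo.so.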


@ersheng thanks! But my question was mainly regarding yolov5, which was released recently!

@y14uc339

YoloV5 may have similar problems too.
However, we have not thoroughly studied YoloV5’s compatibility yet.
We may add YoloV5 to our agenda soon.


Hi @ersheng. DeepStream supports TensorRT, and we implemented the CUDA kernel for yolov5, which works fine in TensorRT. So why is that CUDA kernel not working in DeepStream when DS is using the same TRT? I mean, what exactly is causing the problem? Here @CJR says that it should work in DS. Any thoughts on this?
Thanks!!

@y14uc339

The highest Yolo version that the CUDA kernel in /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/ can support is YoloV3.

We are trying to embed the Yolo layer into the TensorRT engine before deploying to DeepStream, so that the Yolo CUDA kernel in DeepStream is no longer used. You can have a look at my previous post here: YoloV4 Solution.

YoloV5 may have a similar problem, and we will work on it by applying the same solution. But you can also imitate this YoloV4 solution to solve your YoloV5 problem yourself.


@ersheng this might be a dumb question!! I understand that the highest Yolo version the CUDA kernel in /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/ can support is YoloV3. But I am not using that kernel to implement yolov5; I am using a different kernel. So even a different implementation of the CUDA kernel that works for yolov5 in TRT would not work in DeepStream, is that what you are trying to say?

@y14uc339 @CJR

Sorry for the misunderstanding.
CJR is providing you a solution for the YoloV5 from https://github.com/wang-xinyu/tensorrtx in this thread, and you can continue to follow it.

However, I can give you my suggestion, which follows a different workflow:
Pytorch --> ONNX --> TRT
Converting to ONNX first is a more standardized way to handle YoloV5 from the official page: https://github.com/ultralytics/yolov5.

You can choose either way to solve your problem and I hope they do not clash with each other.
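
For the Pytorch --> ONNX step of that workflow, here is a sketch assuming the export script shipped in the ultralytics repo at the time of writing (models/export.py; check the repo for the current entry point), followed by the same trtexec invocation used elsewhere in this thread:

# Pytorch checkpoint -> ONNX (script name and flags assumed from the ultralytics repo)
python models/export.py --weights yolov5s.pt --img 640 --batch 1

# ONNX -> TensorRT engine
trtexec --onnx=yolov5s.onnx --explicitBatch --saveEngine=yolov5s_fp16.engine --workspace=4096 --fp16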

@ersheng Thanks!

@ersheng I’ll try it both ways since @CJR is busy/unavailable currently. I’ll go with the Pytorch -> ONNX -> TRT approach. It would be great if you could help out with the custom parsing functions and config files for a smooth implementation of yolov5 in TRT!
Thanks

@ersheng Thanks a lot. I tried this way and it works!
But there seems to be something wrong with the results.

And it returns a warning:

WARNING: …/nvdsinfer/nvdsinfer_func_utils.cpp:34 [TRT]: Explicit batch network detected and batch size specified, use enqueue without batch size instead.


Implementation

I changed the input size to width=320, height=512.

I generated the ONNX from Darknet (not Pytorch) and set batchsize=1 using this command:

python demo_darknet2onnx.py yolov4.cfg yolov4.weights ./data/dog.jpg 1

ONNX to TensorRT

trtexec --onnx=yolov4_1_3_512_320.onnx --explicitBatch --saveEngine=yolov4_1_3_320_512_fp16.engine --workspace=4096 --fp16

Questions

When I set batchsize=4, it gives errors and quits. Does the batchsize have to be 1 and the input size 320*512? Must I use the Pytorch model? Can the workflow be darknet -> ONNX -> TensorRT?

@jiejing_ma

For the warning

I agree that this warning is annoying, but for now you can simply ignore it.
It is a historical leftover caused by backward compatibility with Caffe and UFF models.
It will be removed in later TensorRT versions.

For the error

In which step did the program quit with an error? As far as I know, the batch size should be consistent across the workflow: ONNX -> TRT -> DS pipeline:

           batchsize=4  batchsize=4     batchsize=4
Darknet  ->   ONNX   ->   TensorRT  ->  DS pipeline

Have you configured batch size of both [streammux] and [primary-gie]?
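
For example, with batchsize=4 the two groups in deepstream_app_config_yoloV4.txt would need to agree (an assumed excerpt following the standard deepstream-app config layout):

[streammux]
batch-size=4

[primary-gie]
batch-size=4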

For ratio of input

I think the model input aspect ratio should agree with the original image ratio, or at least be close to it.
For example, if your image input is 1080 * 1920, then 320 * 512 or 320 * 608 may be a good ratio;
if your image input is 1280 * 1280, then 416 * 416, 512 * 512 or 608 * 608 may be recommended for the model.

There is an argument named maintain-aspect-ratio in config_infer_primary_yoloV4.txt.
If maintain-aspect-ratio=1, the image is padded so that its aspect ratio matches the model input; otherwise, the image is stretched vertically or horizontally when its ratio does not match the model input.
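
To illustrate what maintain-aspect-ratio=1 does, here is a sketch of the letterbox arithmetic for the 1080 * 1920 example above (illustrative only, not DeepStream's actual code):

#include <algorithm>
#include <cstdio>

int main()
{
    const float srcW = 1920.f, srcH = 1080.f;  // camera frame
    const float netW = 512.f,  netH = 320.f;   // model input

    // Scale by the limiting dimension so the whole frame fits inside the model input.
    const float scale = std::min(netW / srcW, netH / srcH);  // min(0.267, 0.296) = 0.267
    const float scaledW = srcW * scale;                      // 512
    const float scaledH = srcH * scale;                      // 288
    // The remaining 320 - 288 = 32 rows are filled with padding.
    std::printf("scaled: %.0f x %.0f, pad: %.0f x %.0f\n",
                scaledW, scaledH, netW - scaledW, netH - scaledH);
    return 0;
}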

DarkNet or Pytorch

Convert from darknet to onnx if you just want to use the official YoloV4 pretrained model.
Convert from pytorch to onnx if you want to use a model trained with Pytorch.
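
For reference, the corresponding conversion commands in the pytorch-YOLOv4 repo look like this (the demo_pytorch2onnx.py arguments are assumed from that repo's README; check it for exact usage):

# DarkNet weights -> ONNX (batchsize 1), as used earlier in this thread
python demo_darknet2onnx.py yolov4.cfg yolov4.weights ./data/dog.jpg 1

# Pytorch checkpoint -> ONNX: <weights> <image> <batchsize> <n_classes> <height> <width>
python demo_pytorch2onnx.py yolov4.pth ./data/dog.jpg 1 80 416 416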


Hi @jiejing_ma @ersheng
I have implemented Yolov3 with deepstream, but I had a failed attempt with Yolov4.
Can you please share your workflow and any links you referred to?
I wish to reproduce the results you obtained in the screenshot you shared. Please help me with a summary, a workflow, or reference links.
Thanks

@pinktree3

Follow this guideline:
YoloV4: DarkNet or Pytorch -> ONNX -> TensorRT -> DeepStream


Hi @ersheng
I followed this link: https://github.com/Tianxiaomo/pytorch-YOLOv4
I generated the ONNX from Darknet.
After that, I went into the Nvidia TensorRT container and executed the command:

trtexec --onnx=yolov4_1_3_608_608.onnx --explicitBatch --saveEngine=yolov4_1_3_608_608_fp16.engine --workspace=4096 --fp16

I get the following error at the end:
.
.
.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:12: Invalid control characters encountered in text.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:14: Message type “onnx2trt_onnx.ModelProto” has no field named “pytorch”.
Failed to parse ONNX model from fileyolov4_1_3_608_608.onnx
[07/09/2020-10:16:02] [E] [TRT] Network must have at least one output
[07/09/2020-10:16:02] [E] [TRT] Network validation failed.
[07/09/2020-10:16:02] [E] Engine creation failed
[07/09/2020-10:16:02] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # trtexec --onnx=yolov4_1_3_608_608.onnx --explicitBatch --saveEngine=yolov4_1_3_608_608_fp16.engine --workspace=4096 --fp16

Any suggestions?