Trouble converting a non-square grid YOLO network to TensorRT via DeepStream

Thank you @eh-steve. You saved my day :)

Hi,
How can I merge your code with the default DeepStream code?
Could you explain, please?

Thanks

You just need to save the patch as a file on your system, then, from within the /opt/nvidia/deepstream/deepstream-5.0/sources directory, run git apply /path/to/where/you/saved/the.patch

Then rebuild.

Hi,

I got the following errors:

warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu has type 100644, expected 100755
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp has type 100644, expected 100755
error: patch failed: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp:29
error: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp: patch does not apply
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.cpp has type 100644, expected 100755
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.h has type 100644, expected 100755
error: patch failed: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.h:46
error: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.h: patch does not apply
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp has type 100644, expected 100755
error: patch failed: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp:173
error: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp: patch does not apply
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.h has type 100644, expected 100755
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.cpp has type 100644, expected 100755
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.h has type 100644, expected 100755

What do I need to do?

I'm getting the same errors ron1232 reported when applying the patch on DS 5.0 (NGC 20.07).

Hey @horacio.vico and @ron1232

I've updated the patch for the sources of DS 5.0 GA (and removed permission bits from the patch).

Just run git apply from /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo:

    Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu
    ===================================================================
    --- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu	(revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
    +++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu	(revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
    @@ -17,20 +17,20 @@
     
     inline __device__ float sigmoidGPU(const float& x) { return 1.0f / (1.0f + __expf(-x)); }
     
    -__global__ void gpuYoloLayerV3(const float* input, float* output, const uint gridSize, const uint numOutputClasses,
    +__global__ void gpuYoloLayerV3(const float* input, float* output, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses,
                                    const uint numBBoxes)
     {
         uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
         uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
         uint z_id = blockIdx.z * blockDim.z + threadIdx.z;
     
    -    if ((x_id >= gridSize) || (y_id >= gridSize) || (z_id >= numBBoxes))
    +    if ((x_id >= gridSizeX) || (y_id >= gridSizeY) || (z_id >= numBBoxes))
         {
             return;
         }
     
    -    const int numGridCells = gridSize * gridSize;
    -    const int bbindex = y_id * gridSize + x_id;
    +    const int numGridCells = gridSizeX * gridSizeY;
    +    const int bbindex = y_id * gridSizeX + x_id;
     
         output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]
             = sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]);
    @@ -54,23 +54,23 @@
         }
     }
     
    -cudaError_t cudaYoloLayerV3(const void* input, void* output, const uint& batchSize, const uint& gridSize,
    +cudaError_t cudaYoloLayerV3(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
                                 const uint& numOutputClasses, const uint& numBBoxes,
                                 uint64_t outputSize, cudaStream_t stream);
     
    -cudaError_t cudaYoloLayerV3(const void* input, void* output, const uint& batchSize, const uint& gridSize,
    +cudaError_t cudaYoloLayerV3(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
                                 const uint& numOutputClasses, const uint& numBBoxes,
                                 uint64_t outputSize, cudaStream_t stream)
     {
         dim3 threads_per_block(16, 16, 4);
    -    dim3 number_of_blocks((gridSize / threads_per_block.x) + 1,
    -                          (gridSize / threads_per_block.y) + 1,
    +    dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1,
    +                          (gridSizeY / threads_per_block.y) + 1,
                               (numBBoxes / threads_per_block.z) + 1);
         for (unsigned int batch = 0; batch < batchSize; ++batch)
         {
             gpuYoloLayerV3<<<number_of_blocks, threads_per_block, 0, stream>>>(
                 reinterpret_cast<const float*>(input) + (batch * outputSize),
    -            reinterpret_cast<float*>(output) + (batch * outputSize), gridSize, numOutputClasses,
    +            reinterpret_cast<float*>(output) + (batch * outputSize), gridSizeX, gridSizeY, numOutputClasses,
                 numBBoxes);
         }
         return cudaGetLastError();
    Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
    ===================================================================
    --- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp	(revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
    +++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp	(revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
    @@ -30,7 +30,7 @@
     #include "nvdsinfer_custom_impl.h"
     #include "trt_utils.h"
     
    -static const int NUM_CLASSES_YOLO = 80;
    +static const int NUM_CLASSES_YOLO = 13;
     
     extern "C" bool NvDsInferParseCustomYoloV3(
         std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    @@ -300,8 +300,8 @@
             10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319};
         static const std::vector<std::vector<int>> kMASKS = {
             {3, 4, 5},
    -        //{0, 1, 2}}; // as per output result, select {1,2,3}
    -        {1, 2, 3}};
    +        {0, 1, 2}}; // as per output result, select {1,2,3}
    +//        {1, 2, 3}};
     
         return NvDsInferParseYoloV3 (
             outputLayersInfo, networkInfo, detectionParams, objectList,
    Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.cpp
    ===================================================================
    --- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.cpp	(revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
    +++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.cpp	(revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
    @@ -373,19 +373,19 @@
         assert(block.at("type") == "upsample");
         nvinfer1::Dims inpDims = input->getDimensions();
         assert(inpDims.nbDims == 3);
    -    assert(inpDims.d[1] == inpDims.d[2]);
    +//    assert(inpDims.d[1] == inpDims.d[2]);
         int h = inpDims.d[1];
         int w = inpDims.d[2];
         int stride = std::stoi(block.at("stride"));
         // add pre multiply matrix as a constant
         nvinfer1::Dims preDims{3,
    -                           {1, stride * h, w},
    +                           {1, stride * h, h},
                                {nvinfer1::DimensionType::kCHANNEL, nvinfer1::DimensionType::kSPATIAL,
                                 nvinfer1::DimensionType::kSPATIAL}};
    -    int size = stride * h * w;
    +    int size = stride * h * h;
         nvinfer1::Weights preMul{nvinfer1::DataType::kFLOAT, nullptr, size};
         float* preWt = new float[size];
    -    /* (2*h * w)
    +    /* (2*h * h)
         [ [1, 0, ..., 0],
           [1, 0, ..., 0],
           [0, 1, ..., 0],
    @@ -397,12 +397,9 @@
         */
         for (int i = 0, idx = 0; i < h; ++i)
         {
    -        for (int s = 0; s < stride; ++s)
    +        for (int j = 0; j < h * stride; ++j, ++idx)
             {
    -            for (int j = 0; j < w; ++j, ++idx)
    -            {
    -                preWt[idx] = (i == j) ? 1.0 : 0.0;
    -            }
    +            preWt[idx] = (i == j) ? 1.0 : 0.0;
             }
         }
         preMul.values = preWt;
    @@ -413,20 +410,20 @@
         preM->setName(preLayerName.c_str());
         // add post multiply matrix as a constant
         nvinfer1::Dims postDims{3,
    -                            {1, h, stride * w},
    +                            {1, w, stride * w},
                                 {nvinfer1::DimensionType::kCHANNEL, nvinfer1::DimensionType::kSPATIAL,
                                  nvinfer1::DimensionType::kSPATIAL}};
    -    size = stride * h * w;
    +    size = stride * w * w;
         nvinfer1::Weights postMul{nvinfer1::DataType::kFLOAT, nullptr, size};
         float* postWt = new float[size];
    -    /* (h * 2*w)
    +    /* (w * 2*w)
         [ [1, 1, 0, 0, ..., 0, 0],
           [0, 0, 1, 1, ..., 0, 0],
           ...,
           ...,
           [0, 0, 0, 0, ..., 1, 1] ]
         */
    -    for (int i = 0, idx = 0; i < h; ++i)
    +    for (int i = 0, idx = 0; i < w; ++i)
         {
             for (int j = 0; j < stride * w; ++j, ++idx)
             {
    Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp
    ===================================================================
    --- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp	(revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
    +++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp	(revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
    @@ -161,19 +161,21 @@
                 printLayerInfo(layerIndex, "skip", inputVol, outputVol, "    -");
             } else if (m_ConfigBlocks.at(i).at("type") == "yolo") {
                 nvinfer1::Dims prevTensorDims = previous->getDimensions();
    -            assert(prevTensorDims.d[1] == prevTensorDims.d[2]);
    +//            assert(prevTensorDims.d[1] == prevTensorDims.d[2]);
                 TensorInfo& curYoloTensor = m_OutputTensors.at(outputTensorCount);
    -            curYoloTensor.gridSize = prevTensorDims.d[1];
    -            curYoloTensor.stride = m_InputW / curYoloTensor.gridSize;
    -            m_OutputTensors.at(outputTensorCount).volume = curYoloTensor.gridSize
    -                * curYoloTensor.gridSize
    +            curYoloTensor.gridSizeY = prevTensorDims.d[1];
    +            curYoloTensor.gridSizeX = prevTensorDims.d[2];
    +            curYoloTensor.stride = m_InputH / curYoloTensor.gridSizeY;
    +            m_OutputTensors.at(outputTensorCount).volume = curYoloTensor.gridSizeY
    +                * curYoloTensor.gridSizeX
                     * (curYoloTensor.numBBoxes * (5 + curYoloTensor.numClasses));
                 std::string layerName = "yolo_" + std::to_string(i);
                 curYoloTensor.blobName = layerName;
                 nvinfer1::IPluginV2* yoloPlugin
                     = new YoloLayerV3(m_OutputTensors.at(outputTensorCount).numBBoxes,
                                       m_OutputTensors.at(outputTensorCount).numClasses,
    -                                  m_OutputTensors.at(outputTensorCount).gridSize);
    +                                  m_OutputTensors.at(outputTensorCount).gridSizeX,
    +                                  m_OutputTensors.at(outputTensorCount).gridSizeY);
                 assert(yoloPlugin != nullptr);
                 nvinfer1::IPluginV2Layer* yolo =
                     network.addPluginV2(&previous, 1, *yoloPlugin);
    @@ -193,10 +195,11 @@
                 nvinfer1::Dims prevTensorDims = previous->getDimensions();
                 assert(prevTensorDims.d[1] == prevTensorDims.d[2]);
                 TensorInfo& curRegionTensor = m_OutputTensors.at(outputTensorCount);
    -            curRegionTensor.gridSize = prevTensorDims.d[1];
    -            curRegionTensor.stride = m_InputW / curRegionTensor.gridSize;
    -            m_OutputTensors.at(outputTensorCount).volume = curRegionTensor.gridSize
    -                * curRegionTensor.gridSize
    +            curRegionTensor.gridSizeY = prevTensorDims.d[1];
    +            curRegionTensor.gridSizeX = prevTensorDims.d[2];
    +            curRegionTensor.stride = m_InputW / curRegionTensor.gridSizeX;
    +            m_OutputTensors.at(outputTensorCount).volume = curRegionTensor.gridSizeX
    +                * curRegionTensor.gridSizeY
                     * (curRegionTensor.numBBoxes * (5 + curRegionTensor.numClasses));
                 std::string layerName = "region_" + std::to_string(i);
                 curRegionTensor.blobName = layerName;
    @@ -376,7 +379,7 @@
                 m_InputH = std::stoul(block.at("height"));
                 m_InputW = std::stoul(block.at("width"));
                 m_InputC = std::stoul(block.at("channels"));
    -            assert(m_InputW == m_InputH);
    +//            assert(m_InputW == m_InputH);
                 m_InputSize = m_InputC * m_InputH * m_InputW;
             }
             else if ((block.at("type") == "region") || (block.at("type") == "yolo"))
    Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.h
    ===================================================================
    --- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.h	(revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
    +++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.h	(revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
    @@ -52,7 +52,8 @@
     {
         std::string blobName;
         uint stride{0};
    -    uint gridSize{0};
    +    uint gridSizeY{0};
    +    uint gridSizeX{0};
         uint numClasses{0};
         uint numBBoxes{0};
         uint64_t volume{0};
    Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.cpp
    ===================================================================
    --- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.cpp	(revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
    +++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.cpp	(revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
    @@ -45,7 +45,7 @@
     // Forward declaration of cuda kernels
     cudaError_t cudaYoloLayerV3 (
         const void* input, void* output, const uint& batchSize,
    -    const uint& gridSize, const uint& numOutputClasses,
    +    const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
         const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream);
     
     YoloLayerV3::YoloLayerV3 (const void* data, size_t length)
    @@ -53,20 +53,23 @@
         const char *d = static_cast<const char*>(data);
         read(d, m_NumBoxes);
         read(d, m_NumClasses);
    -    read(d, m_GridSize);
    +    read(d, m_GridSizeX);
    +    read(d, m_GridSizeY);
         read(d, m_OutputSize);
     };
     
     YoloLayerV3::YoloLayerV3 (
    -    const uint& numBoxes, const uint& numClasses, const uint& gridSize) :
    +    const uint& numBoxes, const uint& numClasses, const uint& gridSizeX, const uint& gridSizeY) :
         m_NumBoxes(numBoxes),
         m_NumClasses(numClasses),
    -    m_GridSize(gridSize)
    +    m_GridSizeX(gridSizeX),
    +    m_GridSizeY(gridSizeY)
     {
         assert(m_NumBoxes > 0);
         assert(m_NumClasses > 0);
    -    assert(m_GridSize > 0);
    -    m_OutputSize = m_GridSize * m_GridSize * (m_NumBoxes * (4 + 1 + m_NumClasses));
    +    assert(m_GridSizeX > 0);
    +    assert(m_GridSizeY > 0);
    +    m_OutputSize = m_GridSizeX * m_GridSizeY * (m_NumBoxes * (4 + 1 + m_NumClasses));
     };
     
     nvinfer1::Dims
    @@ -100,14 +103,14 @@
         cudaStream_t stream)
     {
         CHECK(cudaYoloLayerV3(
    -              inputs[0], outputs[0], batchSize, m_GridSize, m_NumClasses, m_NumBoxes,
    +              inputs[0], outputs[0], batchSize, m_GridSizeX, m_GridSizeY, m_NumClasses, m_NumBoxes,
                   m_OutputSize, stream));
         return 0;
     }
     
     size_t YoloLayerV3::getSerializationSize() const
     {
    -    return sizeof(m_NumBoxes) + sizeof(m_NumClasses) + sizeof(m_GridSize) + sizeof(m_OutputSize);
    +    return sizeof(m_NumBoxes) + sizeof(m_NumClasses) + sizeof(m_GridSizeX) + sizeof(m_GridSizeY) + sizeof(m_OutputSize);
     }
     
     void YoloLayerV3::serialize(void* buffer) const
    @@ -115,13 +118,14 @@
         char *d = static_cast<char*>(buffer);
         write(d, m_NumBoxes);
         write(d, m_NumClasses);
    -    write(d, m_GridSize);
    +    write(d, m_GridSizeX);
    +    write(d, m_GridSizeY);
         write(d, m_OutputSize);
     }
     
     nvinfer1::IPluginV2* YoloLayerV3::clone() const
     {
    -    return new YoloLayerV3 (m_NumBoxes, m_NumClasses, m_GridSize);
    +    return new YoloLayerV3 (m_NumBoxes, m_NumClasses, m_GridSizeX, m_GridSizeY);
     }
     
     REGISTER_TENSORRT_PLUGIN(YoloLayerV3PluginCreator);
    Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.h
    ===================================================================
    --- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.h	(revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
    +++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.h	(revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
    @@ -51,7 +51,7 @@
     {
     public:
         YoloLayerV3 (const void* data, size_t length);
    -    YoloLayerV3 (const uint& numBoxes, const uint& numClasses, const uint& gridSize);
    +    YoloLayerV3 (const uint& numBoxes, const uint& numClasses, const uint& gridSizeX, const uint& gridSizeY);
         const char* getPluginType () const override { return YOLOV3LAYER_PLUGIN_NAME; }
         const char* getPluginVersion () const override { return YOLOV3LAYER_PLUGIN_VERSION; }
         int getNbOutputs () const override { return 1; }
    @@ -89,7 +89,8 @@
     private:
         uint m_NumBoxes {0};
         uint m_NumClasses {0};
    -    uint m_GridSize {0};
    +    uint m_GridSizeX {0};
    +    uint m_GridSizeY {0};
         uint64_t m_OutputSize {0};
         std::string m_Namespace {""};
     };
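
For readers following the patch, here is a minimal host-side sketch of the indexing it introduces: separate X and Y grid sizes, a row-major cell index, and the per-layer output volume. The names `GridDims`, `gridFor`, `cellIndex`, `outputOffset` and `outputVolume` are hypothetical helpers for illustration and are not part of the DeepStream sources; the sketch also assumes the input width and height are both multiples of the stride.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not in the DeepStream sources).
struct GridDims {
    uint32_t x;  // number of grid cells along the input width
    uint32_t y;  // number of grid cells along the input height
};

// Grid dimensions for an inputW x inputH network at the given stride.
// Assumption: both input dimensions are exact multiples of the stride.
inline GridDims gridFor(uint32_t inputW, uint32_t inputH, uint32_t stride) {
    assert(stride > 0 && inputW % stride == 0 && inputH % stride == 0);
    return GridDims{inputW / stride, inputH / stride};
}

// Row-major cell index, mirroring `y_id * gridSizeX + x_id` in the patched kernel.
inline uint32_t cellIndex(const GridDims& g, uint32_t xId, uint32_t yId) {
    assert(xId < g.x && yId < g.y);
    return yId * g.x + xId;
}

// Flat offset of attribute `attr` (0 .. 4 + numClasses) of box `box` in cell
// (xId, yId), mirroring `bbindex + numGridCells * (z_id * (5 + numOutputClasses) + attr)`.
inline uint64_t outputOffset(const GridDims& g, uint32_t xId, uint32_t yId,
                             uint32_t box, uint32_t attr, uint32_t numClasses) {
    const uint64_t numGridCells = uint64_t(g.x) * g.y;
    return cellIndex(g, xId, yId)
        + numGridCells * (uint64_t(box) * (5 + numClasses) + attr);
}

// Number of floats one YOLO output layer produces per image.
inline uint64_t outputVolume(const GridDims& g, uint32_t numBBoxes, uint32_t numClasses) {
    return uint64_t(g.x) * g.y * numBBoxes * (5 + numClasses);
}
```

As an illustration, a 608x352 input at stride 32 gives a 19x11 grid, and the last attribute of the last box in the last cell sits at offset `outputVolume(...) - 1`, which is a quick sanity check that X and Y have not been swapped.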

Thank you very much!

Any findings on dealing with this?

Also, how do anchor indices differ between pjreddie's and AlexeyAB's implementations?

Thank you for all your help, Steve!

Hi,

I'm using custom-trained asymmetric models which actually seem to perform better under TRT than under darknet, so I've had no reason to dig into this (as I don't use the 416x416 COCO models).

As for the anchor boxes, the DeepStream sources hardcode the anchor indices to be {0,1,2}, {3,4,5} or {1,2,3}, {3,4,5}. You need to change these to match whichever indices the model was trained with.
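
To make the effect of those indices concrete, here is a small sketch of how a mask picks (width, height) pairs out of the flat anchor list. `anchorsForMask` is a hypothetical helper written for illustration, not a function from the DeepStream sources; the anchor values are the ones that appear in the patch context above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: select the (width, height) anchor pairs for one
// output layer. `anchors` is the flat list w0,h0,w1,h1,... and `mask`
// holds the anchor indices that layer was trained with.
inline std::vector<std::pair<int, int>> anchorsForMask(
    const std::vector<int>& anchors, const std::vector<int>& mask) {
    std::vector<std::pair<int, int>> selected;
    selected.reserve(mask.size());
    for (int m : mask) {
        // Each index addresses two consecutive entries of the flat list.
        assert(m >= 0 && static_cast<std::size_t>(2 * m + 1) < anchors.size());
        selected.emplace_back(anchors[2 * m], anchors[2 * m + 1]);
    }
    return selected;
}
```

With the anchors `10,14, 23,27, 37,58, 81,82, 135,169, 344,319`, the mask {1,2,3} selects (23,27), (37,58), (81,82), while {0,1,2} selects (10,14), (23,27), (37,58). A mismatch with the training mask therefore changes the scale of every decoded box on that layer.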

I've since made changes to the DeepStream sources to remove any hardcoded anchors and instead retrieve them from the model.cfg file. If Nvidia had a GitHub repo for DS, I would open a PR there...
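
Those changes are not posted in this thread, so the following is only a rough sketch of the idea, not the actual implementation: read the `mask` and `anchors` lines from darknet-style cfg text. `parseIntList`, `YoloAnchors` and `readYoloAnchors` are hypothetical names, and the sketch assumes every [yolo] section repeats the same `anchors` line, as the standard darknet cfg files do.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a "key = a, b, c" line into integers.
// Returns an empty vector when the line does not define `key`.
inline std::vector<int> parseIntList(const std::string& line, const std::string& key) {
    std::vector<int> values;
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos) return values;

    // Trim whitespace around the key before comparing.
    const std::string lhs = line.substr(0, eq);
    const std::size_t first = lhs.find_first_not_of(" \t");
    const std::size_t last = lhs.find_last_not_of(" \t");
    if (first == std::string::npos || lhs.substr(first, last - first + 1) != key) {
        return values;
    }

    std::stringstream rhs(line.substr(eq + 1));
    std::string token;
    while (std::getline(rhs, token, ',')) {
        if (token.find_first_not_of(" \t\r") == std::string::npos) continue;  // blank token
        values.push_back(std::stoi(token));  // stoi skips leading whitespace; throws on non-numeric input
    }
    return values;
}

// Hypothetical container for what the bounding-box parser needs.
struct YoloAnchors {
    std::vector<int> anchors;             // flat list w0,h0,w1,h1,...
    std::vector<std::vector<int>> masks;  // one entry per [yolo] section, in file order
};

// Collect every `mask` line and the (last) `anchors` line from cfg text.
inline YoloAnchors readYoloAnchors(const std::string& cfgText) {
    YoloAnchors result;
    std::istringstream in(cfgText);
    std::string line;
    while (std::getline(in, line)) {
        std::vector<int> mask = parseIntList(line, "mask");
        if (!mask.empty()) result.masks.push_back(mask);
        std::vector<int> anchors = parseIntList(line, "anchors");
        if (!anchors.empty()) result.anchors = anchors;
    }
    return result;
}
```

The masks come back in file order, so they would need to be matched to the output layers in the same order the network builder creates them.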


@eh-steve I'm using models exactly like that, so I'd be very surprised if I could actually make them perform better on TRT than on darknet (they were originally developed with AlexeyAB's fork of darknet in mind).

Yeah, it'd be nice if Nvidia open-sourced at least the customizable parts of DeepStream like this, but I suppose that for a lot of reasons that's not gonna happen.

@marcoslucianops recently released his non-square version after I opened an issue on his custom deepstream-yolo implementation (I think he even followed some of your patch's guidelines to make it work), but as he mentioned above, the accuracy wasn't better but rather worse after applying his custom implementation. I'd suggest (and kindly ask) that you take a look and feel free to contribute to his non-square branch here (the repo has been referenced on darknet's readme for a month or two already): https://github.com/marcoslucianops/DeepStream-Yolo/tree/master/non_square/nvdsinfer_custom_impl_Yolo

I think more people like you could help support this, especially now that Nvidia has included yolov4 in the recent DeepStream 5.1 update but still doesn't support custom dimensions...

Thanks again for all your help and insight on this

DeepStream for YOLO models (v3, v4) with support for non square models and INT8 calibration:
