Thank you @eh-steve. You saved my day :)
Hi,
How can I merge your code with the default DeepStream code?
Could you explain, please?
Thanks
You just need to save the patch as a file on your system. Then, from within the /opt/nvidia/deepstream/deepstream-5.0/sources directory, run git apply /path/to/where/you/saved/the.patch
Then rebuild.
Hi,
I got this error:
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu has type 100644, expected 100755
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp has type 100644, expected 100755
error: patch failed: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp:29
error: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp: patch does not apply
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.cpp has type 100644, expected 100755
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.h has type 100644, expected 100755
error: patch failed: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.h:46
error: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.h: patch does not apply
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp has type 100644, expected 100755
error: patch failed: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp:173
error: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp: patch does not apply
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.h has type 100644, expected 100755
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.cpp has type 100644, expected 100755
warning: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.h has type 100644, expected 100755
What do I need to do?
I'm getting the same errors ron1232 reported when applying the patch on DS 5.0 (NGC 20.07).
Hey @horacio.vico and @ron1232
I've updated the patch for the sources of DS 5.0 GA (and removed permission bits from the patch).
Just git apply it from /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo:
Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu
===================================================================
--- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu (revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
+++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu (revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
@@ -17,20 +17,20 @@
inline __device__ float sigmoidGPU(const float& x) { return 1.0f / (1.0f + __expf(-x)); }
-__global__ void gpuYoloLayerV3(const float* input, float* output, const uint gridSize, const uint numOutputClasses,
+__global__ void gpuYoloLayerV3(const float* input, float* output, const uint gridSizeX, const uint gridSizeY, const uint numOutputClasses,
const uint numBBoxes)
{
uint x_id = blockIdx.x * blockDim.x + threadIdx.x;
uint y_id = blockIdx.y * blockDim.y + threadIdx.y;
uint z_id = blockIdx.z * blockDim.z + threadIdx.z;
- if ((x_id >= gridSize) || (y_id >= gridSize) || (z_id >= numBBoxes))
+ if ((x_id >= gridSizeX) || (y_id >= gridSizeY) || (z_id >= numBBoxes))
{
return;
}
- const int numGridCells = gridSize * gridSize;
- const int bbindex = y_id * gridSize + x_id;
+ const int numGridCells = gridSizeX * gridSizeY;
+ const int bbindex = y_id * gridSizeX + x_id;
output[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]
= sigmoidGPU(input[bbindex + numGridCells * (z_id * (5 + numOutputClasses) + 0)]);
@@ -54,23 +54,23 @@
}
}
-cudaError_t cudaYoloLayerV3(const void* input, void* output, const uint& batchSize, const uint& gridSize,
+cudaError_t cudaYoloLayerV3(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
const uint& numOutputClasses, const uint& numBBoxes,
uint64_t outputSize, cudaStream_t stream);
-cudaError_t cudaYoloLayerV3(const void* input, void* output, const uint& batchSize, const uint& gridSize,
+cudaError_t cudaYoloLayerV3(const void* input, void* output, const uint& batchSize, const uint& gridSizeX, const uint& gridSizeY,
const uint& numOutputClasses, const uint& numBBoxes,
uint64_t outputSize, cudaStream_t stream)
{
dim3 threads_per_block(16, 16, 4);
- dim3 number_of_blocks((gridSize / threads_per_block.x) + 1,
- (gridSize / threads_per_block.y) + 1,
+ dim3 number_of_blocks((gridSizeX / threads_per_block.x) + 1,
+ (gridSizeY / threads_per_block.y) + 1,
(numBBoxes / threads_per_block.z) + 1);
for (unsigned int batch = 0; batch < batchSize; ++batch)
{
gpuYoloLayerV3<<<number_of_blocks, threads_per_block, 0, stream>>>(
reinterpret_cast<const float*>(input) + (batch * outputSize),
- reinterpret_cast<float*>(output) + (batch * outputSize), gridSize, numOutputClasses,
+ reinterpret_cast<float*>(output) + (batch * outputSize), gridSizeX, gridSizeY, numOutputClasses,
numBBoxes);
}
return cudaGetLastError();
Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
===================================================================
--- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp (revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
+++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp (revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
@@ -30,7 +30,7 @@
#include "nvdsinfer_custom_impl.h"
#include "trt_utils.h"
-static const int NUM_CLASSES_YOLO = 80;
+static const int NUM_CLASSES_YOLO = 13;
extern "C" bool NvDsInferParseCustomYoloV3(
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
@@ -300,8 +300,8 @@
10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319};
static const std::vector<std::vector<int>> kMASKS = {
{3, 4, 5},
- //{0, 1, 2}}; // as per output result, select {1,2,3}
- {1, 2, 3}};
+ {0, 1, 2}}; // as per output result, select {1,2,3}
+// {1, 2, 3}};
return NvDsInferParseYoloV3 (
outputLayersInfo, networkInfo, detectionParams, objectList,
Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.cpp
===================================================================
--- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.cpp (revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
+++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/trt_utils.cpp (revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
@@ -373,19 +373,19 @@
assert(block.at("type") == "upsample");
nvinfer1::Dims inpDims = input->getDimensions();
assert(inpDims.nbDims == 3);
- assert(inpDims.d[1] == inpDims.d[2]);
+// assert(inpDims.d[1] == inpDims.d[2]);
int h = inpDims.d[1];
int w = inpDims.d[2];
int stride = std::stoi(block.at("stride"));
// add pre multiply matrix as a constant
nvinfer1::Dims preDims{3,
- {1, stride * h, w},
+ {1, stride * h, h},
{nvinfer1::DimensionType::kCHANNEL, nvinfer1::DimensionType::kSPATIAL,
nvinfer1::DimensionType::kSPATIAL}};
- int size = stride * h * w;
+ int size = stride * h * h;
nvinfer1::Weights preMul{nvinfer1::DataType::kFLOAT, nullptr, size};
float* preWt = new float[size];
- /* (2*h * w)
+ /* (2*h * h)
[ [1, 0, ..., 0],
[1, 0, ..., 0],
[0, 1, ..., 0],
@@ -397,12 +397,9 @@
*/
for (int i = 0, idx = 0; i < h; ++i)
{
- for (int s = 0; s < stride; ++s)
+ for (int j = 0; j < h * stride; ++j, ++idx)
{
- for (int j = 0; j < w; ++j, ++idx)
- {
- preWt[idx] = (i == j) ? 1.0 : 0.0;
- }
+ preWt[idx] = (i == j) ? 1.0 : 0.0;
}
}
preMul.values = preWt;
@@ -413,20 +410,20 @@
preM->setName(preLayerName.c_str());
// add post multiply matrix as a constant
nvinfer1::Dims postDims{3,
- {1, h, stride * w},
+ {1, w, stride * w},
{nvinfer1::DimensionType::kCHANNEL, nvinfer1::DimensionType::kSPATIAL,
nvinfer1::DimensionType::kSPATIAL}};
- size = stride * h * w;
+ size = stride * w * w;
nvinfer1::Weights postMul{nvinfer1::DataType::kFLOAT, nullptr, size};
float* postWt = new float[size];
- /* (h * 2*w)
+ /* (w * 2*w)
[ [1, 1, 0, 0, ..., 0, 0],
[0, 0, 1, 1, ..., 0, 0],
...,
...,
[0, 0, 0, 0, ..., 1, 1] ]
*/
- for (int i = 0, idx = 0; i < h; ++i)
+ for (int i = 0, idx = 0; i < w; ++i)
{
for (int j = 0; j < stride * w; ++j, ++idx)
{
Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp
===================================================================
--- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp (revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
+++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.cpp (revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
@@ -161,19 +161,21 @@
printLayerInfo(layerIndex, "skip", inputVol, outputVol, " -");
} else if (m_ConfigBlocks.at(i).at("type") == "yolo") {
nvinfer1::Dims prevTensorDims = previous->getDimensions();
- assert(prevTensorDims.d[1] == prevTensorDims.d[2]);
+// assert(prevTensorDims.d[1] == prevTensorDims.d[2]);
TensorInfo& curYoloTensor = m_OutputTensors.at(outputTensorCount);
- curYoloTensor.gridSize = prevTensorDims.d[1];
- curYoloTensor.stride = m_InputW / curYoloTensor.gridSize;
- m_OutputTensors.at(outputTensorCount).volume = curYoloTensor.gridSize
- * curYoloTensor.gridSize
+ curYoloTensor.gridSizeY = prevTensorDims.d[1];
+ curYoloTensor.gridSizeX = prevTensorDims.d[2];
+ curYoloTensor.stride = m_InputH / curYoloTensor.gridSizeY;
+ m_OutputTensors.at(outputTensorCount).volume = curYoloTensor.gridSizeY
+ * curYoloTensor.gridSizeX
* (curYoloTensor.numBBoxes * (5 + curYoloTensor.numClasses));
std::string layerName = "yolo_" + std::to_string(i);
curYoloTensor.blobName = layerName;
nvinfer1::IPluginV2* yoloPlugin
= new YoloLayerV3(m_OutputTensors.at(outputTensorCount).numBBoxes,
m_OutputTensors.at(outputTensorCount).numClasses,
- m_OutputTensors.at(outputTensorCount).gridSize);
+ m_OutputTensors.at(outputTensorCount).gridSizeX,
+ m_OutputTensors.at(outputTensorCount).gridSizeY);
assert(yoloPlugin != nullptr);
nvinfer1::IPluginV2Layer* yolo =
network.addPluginV2(&previous, 1, *yoloPlugin);
@@ -193,10 +195,11 @@
nvinfer1::Dims prevTensorDims = previous->getDimensions();
assert(prevTensorDims.d[1] == prevTensorDims.d[2]);
TensorInfo& curRegionTensor = m_OutputTensors.at(outputTensorCount);
- curRegionTensor.gridSize = prevTensorDims.d[1];
- curRegionTensor.stride = m_InputW / curRegionTensor.gridSize;
- m_OutputTensors.at(outputTensorCount).volume = curRegionTensor.gridSize
- * curRegionTensor.gridSize
+ curRegionTensor.gridSizeY = prevTensorDims.d[1];
+ curRegionTensor.gridSizeX = prevTensorDims.d[2];
+ curRegionTensor.stride = m_InputW / curRegionTensor.gridSizeX;
+ m_OutputTensors.at(outputTensorCount).volume = curRegionTensor.gridSizeX
+ * curRegionTensor.gridSizeY
* (curRegionTensor.numBBoxes * (5 + curRegionTensor.numClasses));
std::string layerName = "region_" + std::to_string(i);
curRegionTensor.blobName = layerName;
@@ -376,7 +379,7 @@
m_InputH = std::stoul(block.at("height"));
m_InputW = std::stoul(block.at("width"));
m_InputC = std::stoul(block.at("channels"));
- assert(m_InputW == m_InputH);
+// assert(m_InputW == m_InputH);
m_InputSize = m_InputC * m_InputH * m_InputW;
}
else if ((block.at("type") == "region") || (block.at("type") == "yolo"))
Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.h
===================================================================
--- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.h (revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
+++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yolo.h (revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
@@ -52,7 +52,8 @@
{
std::string blobName;
uint stride{0};
- uint gridSize{0};
+ uint gridSizeY{0};
+ uint gridSizeX{0};
uint numClasses{0};
uint numBBoxes{0};
uint64_t volume{0};
Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.cpp
===================================================================
--- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.cpp (revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
+++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.cpp (revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
@@ -45,7 +45,7 @@
// Forward declaration of cuda kernels
cudaError_t cudaYoloLayerV3 (
const void* input, void* output, const uint& batchSize,
- const uint& gridSize, const uint& numOutputClasses,
+ const uint& gridSizeX, const uint& gridSizeY, const uint& numOutputClasses,
const uint& numBBoxes, uint64_t outputSize, cudaStream_t stream);
YoloLayerV3::YoloLayerV3 (const void* data, size_t length)
@@ -53,20 +53,23 @@
const char *d = static_cast<const char*>(data);
read(d, m_NumBoxes);
read(d, m_NumClasses);
- read(d, m_GridSize);
+ read(d, m_GridSizeX);
+ read(d, m_GridSizeY);
read(d, m_OutputSize);
};
YoloLayerV3::YoloLayerV3 (
- const uint& numBoxes, const uint& numClasses, const uint& gridSize) :
+ const uint& numBoxes, const uint& numClasses, const uint& gridSizeX, const uint& gridSizeY) :
m_NumBoxes(numBoxes),
m_NumClasses(numClasses),
- m_GridSize(gridSize)
+ m_GridSizeX(gridSizeX),
+ m_GridSizeY(gridSizeY)
{
assert(m_NumBoxes > 0);
assert(m_NumClasses > 0);
- assert(m_GridSize > 0);
- m_OutputSize = m_GridSize * m_GridSize * (m_NumBoxes * (4 + 1 + m_NumClasses));
+ assert(m_GridSizeX > 0);
+ assert(m_GridSizeY > 0);
+ m_OutputSize = m_GridSizeX * m_GridSizeY * (m_NumBoxes * (4 + 1 + m_NumClasses));
};
nvinfer1::Dims
@@ -100,14 +103,14 @@
cudaStream_t stream)
{
CHECK(cudaYoloLayerV3(
- inputs[0], outputs[0], batchSize, m_GridSize, m_NumClasses, m_NumBoxes,
+ inputs[0], outputs[0], batchSize, m_GridSizeX, m_GridSizeY, m_NumClasses, m_NumBoxes,
m_OutputSize, stream));
return 0;
}
size_t YoloLayerV3::getSerializationSize() const
{
- return sizeof(m_NumBoxes) + sizeof(m_NumClasses) + sizeof(m_GridSize) + sizeof(m_OutputSize);
+ return sizeof(m_NumBoxes) + sizeof(m_NumClasses) + sizeof(m_GridSizeX) + sizeof(m_GridSizeY) + sizeof(m_OutputSize);
}
void YoloLayerV3::serialize(void* buffer) const
@@ -115,13 +118,14 @@
char *d = static_cast<char*>(buffer);
write(d, m_NumBoxes);
write(d, m_NumClasses);
- write(d, m_GridSize);
+ write(d, m_GridSizeX);
+ write(d, m_GridSizeY);
write(d, m_OutputSize);
}
nvinfer1::IPluginV2* YoloLayerV3::clone() const
{
- return new YoloLayerV3 (m_NumBoxes, m_NumClasses, m_GridSize);
+ return new YoloLayerV3 (m_NumBoxes, m_NumClasses, m_GridSizeX, m_GridSizeY);
}
REGISTER_TENSORRT_PLUGIN(YoloLayerV3PluginCreator);
Index: objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.h
===================================================================
--- objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.h (revision 997117c37973bc002080dadd6d605b2ec9a1dfe3)
+++ objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/yoloPlugins.h (revision b58e25ca9603312687a4cc2b1febfa2f4204a89c)
@@ -51,7 +51,7 @@
{
public:
YoloLayerV3 (const void* data, size_t length);
- YoloLayerV3 (const uint& numBoxes, const uint& numClasses, const uint& gridSize);
+ YoloLayerV3 (const uint& numBoxes, const uint& numClasses, const uint& gridSizeX, const uint& gridSizeY);
const char* getPluginType () const override { return YOLOV3LAYER_PLUGIN_NAME; }
const char* getPluginVersion () const override { return YOLOV3LAYER_PLUGIN_VERSION; }
int getNbOutputs () const override { return 1; }
@@ -89,7 +89,8 @@
private:
uint m_NumBoxes {0};
uint m_NumClasses {0};
- uint m_GridSize {0};
+ uint m_GridSizeX {0};
+ uint m_GridSizeY {0};
uint64_t m_OutputSize {0};
std::string m_Namespace {""};
};
Thank you very much!
Any findings on dealing with this?
Also, how do anchor indices differ between pjreddie's and AlexeyAB's implementations?
Thank you for all your help, Steve!
Hi,
I'm using custom trained asymmetric models which actually seem to perform better under TRT than under darknet, so I've had no reason to dig into this (as I don't use the 416x416 COCO models).
As for the anchor boxes, the DeepStream sources hardcode the anchor indices to be {0,1,2}, {3,4,5} or {1,2,3}, {3,4,5}. You need to change these to match whichever indices the model was trained with.
I've since made changes to the DeepStream sources to remove any hardcoded anchors and instead retrieve them from the model.cfg file. If Nvidia had a GitHub repo for DS, I would open a PR there…
@eh-steve I'm using models exactly like that, so I'd be very surprised if I could actually make them perform better on TRT than on darknet (they were originally developed with AlexeyAB's fork of darknet in mind).
Yeah, it'd be nice if Nvidia open-sourced at least the customizable parts of DeepStream like this, but I suppose that for a lot of reasons that's not gonna happen.
@marcoslucianops recently released his non-square version after I opened an issue on his custom deepstream-yolo implementation. I think he even followed some of your patch's guidelines to make it work, but as he mentioned above, the accuracy was worse rather than better after applying his custom implementation. I'd suggest (and kindly ask) that you take a look and feel free to contribute to his non-square branch here (the repo has been referenced in darknet's README for a month or two already): https://github.com/marcoslucianops/DeepStream-Yolo/tree/master/non_square/nvdsinfer_custom_impl_Yolo
I think more people like you could help to support this, especially now that Nvidia has included YOLOv4 in the recent DeepStream 5.1 update but still doesn't support custom dimensions…
Thanks again for all your help and insight on this
DeepStream for YOLO models (v3, v4) with support for non square models and INT8 calibration: