• Hardware Platform (Jetson / GPU) GTX 1080 Ti
• DeepStream Version 4.0.2
• TensorRT Version 5.1.5
• NVIDIA GPU Driver Version (valid for GPU only) 410.104
I use the TensorRT Python API to run inference with the DeepStream primary detector (ResNet-10) model, and I was able to get the output bbox coordinates and confidence scores for every grid cell. Now I would like to parse and visualize these boxes.
I tried to convert (from C++ to Python) the output parser code of the DeepStream primary detector (ResNet-10), found in /deepstream_sdk_v4.0.2_jetson/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp, but I don't understand the logic behind some parts of the code.
My questions regarding this code:
1. Two normalization parameters are defined (bboxNormX = 35.0, bboxNormY = 35.0) -> why are these particular numbers used?
2. Grid cell "centers" are calculated by the following code, but they actually mark the left and top coordinates of the grid cells plus 0.5. Why are they called gcCenters?
for (int i = 0; i < gridW; i++)
{
    gcCentersX[i] = (float)(i * strideX + 0.5);
    gcCentersX[i] /= (float)bboxNormX;
}
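In my Python conversion that loop looks like the sketch below (the grid size and stride values are my assumptions for the sample ResNet-10 model, not something I verified from the config):

```python
import numpy as np

# Assumed values: 640x368 network input with 16x downsampling -> 40x23 grid
grid_w, grid_h = 40, 23
stride_x, stride_y = 16.0, 16.0
bbox_norm_x, bbox_norm_y = 35.0, 35.0

# Mirrors the C++ loop: (i * stride + 0.5) / bboxNorm for each grid index.
# Note these land at the top-left corner of each cell plus 0.5, not the center.
gc_centers_x = (np.arange(grid_w) * stride_x + 0.5) / bbox_norm_x
gc_centers_y = (np.arange(grid_h) * stride_y + 0.5) / bbox_norm_y
```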
3. What are the four output values of the detector in a grid cell (taking a single class)?
Based on these lines, I thought they were the left, top, right, bottom offsets:
rectX1f = (outputX1[w + h * gridW] - gcCentersX[w]) * -bboxNormX;
rectY1f = (outputY1[w + h * gridW] - gcCentersY[h]) * -bboxNormY;
rectX2f = (outputX2[w + h * gridW] + gcCentersX[w]) * bboxNormX;
rectY2f = (outputY2[w + h * gridW] + gcCentersY[h]) * bboxNormY;
But afterwards they are used as left, top, width, height:
object.left = CLIP(rectX1f, 0, networkInfo.width - 1);
object.top = CLIP(rectY1f, 0, networkInfo.height - 1);
object.width = CLIP(rectX2f, 0, networkInfo.width - 1) - object.left + 1;
object.height = CLIP(rectY2f, 0, networkInfo.height - 1) - object.top + 1;
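For what it's worth, here is my current Python port of the whole decode step, vectorized with NumPy. The signs follow the C++ lines above (x1/y1 offsets are negated, x2/y2 offsets are added); the network size, stride, and the parse_bboxes name are my own assumptions, and conf/threshold stand in for the confidence filtering the real parser does:

```python
import numpy as np

def parse_bboxes(out_x1, out_y1, out_x2, out_y2, conf,
                 net_w=640, net_h=368, stride=16.0,
                 bbox_norm=35.0, threshold=0.2):
    """Decode the 4 per-cell regression outputs (shape (grid_h, grid_w))
    into (left, top, width, height) boxes in network pixel coordinates."""
    grid_h, grid_w = out_x1.shape
    gc_x = (np.arange(grid_w) * stride + 0.5) / bbox_norm
    gc_y = (np.arange(grid_h) * stride + 0.5) / bbox_norm

    # Undo the normalization, same signs as the C++ parser:
    # rectX1f = (outputX1 - gcCentersX) * -bboxNorm, etc.
    x1 = -(out_x1 - gc_x[None, :]) * bbox_norm
    y1 = -(out_y1 - gc_y[:, None]) * bbox_norm
    x2 = (out_x2 + gc_x[None, :]) * bbox_norm
    y2 = (out_y2 + gc_y[:, None]) * bbox_norm

    boxes = []
    for h in range(grid_h):
        for w in range(grid_w):
            if conf[h, w] < threshold:
                continue
            left = np.clip(x1[h, w], 0, net_w - 1)
            top = np.clip(y1[h, w], 0, net_h - 1)
            width = np.clip(x2[h, w], 0, net_w - 1) - left + 1
            height = np.clip(y2[h, w], 0, net_h - 1) - top + 1
            boxes.append((left, top, width, height))
    return boxes
```

So it seems the four outputs really are normalized corner offsets that decode to an (x1, y1, x2, y2) box, which is then converted to left/top/width/height at the end, which would explain the apparent mismatch in question 3.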
Thanks in advance,