DeepStream primary detector (ResNet10) output parsing

• Hardware Platform (Jetson / GPU) GTX 1080 Ti
• DeepStream Version 4.0.2
• TensorRT Version 5.1.5
• NVIDIA GPU Driver Version (valid for GPU only) 410.104

I used the TensorRT Python API to run inference with the DeepStream primary detector (ResNet-10) model, and I was able to get the output bbox coordinates and the confidence scores for every grid cell. I would like to parse and visualize these boxes.

I am trying to port (from C++ to Python) the output parser code of the DeepStream primary detector (ResNet10) found in /deepstream_sdk_v4.0.2_jetson/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp, but I don’t understand the logic behind some parts of the code.

My questions about this code:
1. Two normalization parameters are defined (bboxNormX = 35.0, bboxNormY = 35.0). Why are these numbers used?

2. Grid cell centers are calculated by the following code, but they actually mark the left and top coordinates + 0.5 of the grid cells. Why are they called gcCenters?
for (int i = 0; i < gridW; i++)
{
    gcCentersX[i] = (float)(i * strideX + 0.5);
    gcCentersX[i] /= (float)bboxNormX;
}


3. What are the four output values of the detector in a grid cell (if we take one class)?
Based on these lines, I thought that they were the left, top, right, bottom offsets:
rectX1f = (outputX1[w + h * gridW] - gcCentersX[w]) * -bboxNormX;
rectY1f = (outputY1[w + h * gridW] - gcCentersY[h]) * -bboxNormY;
rectX2f = (outputX2[w + h * gridW] + gcCentersX[w]) * bboxNormX;
rectY2f = (outputY2[w + h * gridW] + gcCentersY[h]) * bboxNormY;

But after that they are used as left, top, width, height:
object.left = CLIP(rectX1f, 0, networkInfo.width - 1);
object.top = CLIP(rectY1f, 0, networkInfo.height - 1);
object.width = CLIP(rectX2f, 0, networkInfo.width - 1) - object.left + 1;
object.height = CLIP(rectY2f, 0, networkInfo.height - 1) - object.top + 1;

Thanks in advance,


1. This is the normalization value used in training.

2. The 0.5 offset indicates the position between two center pixels.

3. This is the output format of resnet-10.
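
To make points 1 and 2 concrete, here is a small Python sketch of the grid-center loop quoted in the question. The function name `grid_centers`, the stride of 16, and the grid width of 4 are assumptions chosen only for illustration; use the stride and grid size of your own network.

```python
def grid_centers(grid_size, stride, bbox_norm=35.0):
    """Mirror of the quoted C++ loop: (i * stride + 0.5) / bbox_norm."""
    return [(i * stride + 0.5) / bbox_norm for i in range(grid_size)]


# Assumed example values: 4 grid cells along X, 16 pixels per cell.
centers = grid_centers(grid_size=4, stride=16)

# Multiplying by the norm again gives the reference points in pixels,
# i.e. the left edge of each cell plus 0.5 (about 0.5, 16.5, 32.5, 48.5).
pixels = [c * 35.0 for c in centers]
print(pixels)
```

So the stored values are pixel positions divided by the same 35.0 that is later multiplied back in when the box corners are computed.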

The bounding box parser can differ depending on the model you use.
This parser is designed around how we trained ResNet-10.

You can check some examples in /opt/nvidia/deepstream/deepstream/sources/objectDetector_*.
The parser varies with the detector model.
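
For the Python port, the lines quoted in the question could be sketched roughly as below. This is not the SDK code, only a plain-Python reading of the quoted snippets. The function names, the argument layout (four flat per-class sequences plus a coverage sequence), and the coverage threshold filter are assumptions for illustration. Expanding the arithmetic, rectX1f = (i * strideX + 0.5) - outputX1 * bboxNormX and rectX2f = (i * strideX + 0.5) + outputX2 * bboxNormX, so the four outputs act as normalized distances from the grid reference point to the left, top, right, and bottom edges.

```python
def clip(value, low, high):
    """Equivalent of the CLIP macro used in the quoted C++ code."""
    return max(low, min(value, high))


def decode_boxes(out_x1, out_y1, out_x2, out_y2, coverage,
                 grid_w, grid_h, stride_x, stride_y, net_w, net_h,
                 threshold=0.5, norm_x=35.0, norm_y=35.0):
    """Decode one class. Inputs are flat sequences indexed by w + h * grid_w.

    The threshold filter on `coverage` is an assumption, not part of the
    quoted snippets.
    """
    gc_x = [(i * stride_x + 0.5) / norm_x for i in range(grid_w)]
    gc_y = [(i * stride_y + 0.5) / norm_y for i in range(grid_h)]
    boxes = []
    for h in range(grid_h):
        for w in range(grid_w):
            i = w + h * grid_w
            if coverage[i] < threshold:
                continue
            # Absolute corner coordinates in network-input pixels.
            x1 = (out_x1[i] - gc_x[w]) * -norm_x
            y1 = (out_y1[i] - gc_y[h]) * -norm_y
            x2 = (out_x2[i] + gc_x[w]) * norm_x
            y2 = (out_y2[i] + gc_y[h]) * norm_y
            # Corners converted to left/top/width/height, as in the parser.
            left = clip(x1, 0, net_w - 1)
            top = clip(y1, 0, net_h - 1)
            width = clip(x2, 0, net_w - 1) - left + 1
            height = clip(y2, 0, net_h - 1) - top + 1
            boxes.append((left, top, width, height, coverage[i]))
    return boxes


# Made-up 2x2 grid on a 32x32 input, with one confident cell at w=1, h=0.
n = 35.0
boxes = decode_boxes(
    out_x1=[0, 10 / n, 0, 0], out_y1=[0, 0, 0, 0],
    out_x2=[0, 5 / n, 0, 0], out_y2=[0, 20 / n, 0, 0],
    coverage=[0.0, 0.9, 0.0, 0.0],
    grid_w=2, grid_h=2, stride_x=16, stride_y=16, net_w=32, net_h=32)
print(boxes)
```

Read this way, rectX1f to rectY2f are corner coordinates, and the final step only converts them to left, top, width, and height. The coordinates are relative to the network input size, so they still need scaling to the original image before drawing.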

If you are looking for the architecture of ResNet-10, here is a similar model for your reference:



Can I train ResNet10 object detection on a custom dataset using DIGITS 4?

Thank you.

Hi Hodu,

Please open a new topic with more details. Thanks.
