Deepstream ssd parser python example coordinate and output

Please provide complete information as applicable to your setup.

• DeepStream Version
DS 6.1.1 nvcr.io/nvidia/deepstream:6.1.1-triton
• Issue Type( questions, new requirements, bugs)
questions

Hi All,

I am following deepstream python ssd example to understand deploying a custom model with triton inference server and parsing the result with python.

Ref: https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-ssd-parser

From the SSD bounding box format, the output in the box layer buffer should follow
[x1, y1, x2, y2]
where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner. Is this correct?

But when I follow the code inside ssd_parser.py, I am confused by the x-y coordinate assignments.

rect_x1_f = clip_1d_elm(0)
rect_y1_f = clip_1d_elm(1)
rect_x2_f = clip_1d_elm(2)
rect_y2_f = clip_1d_elm(3)
res.left = rect_y1_f
res.top = rect_x1_f
res.width = rect_y2_f - rect_y1_f
res.height = rect_x2_f - rect_x1_f

where res is NvDsInferObjectDetectionInfo()
https://docs.nvidia.com/metropolis/deepstream/python-api/PYTHON_API/NvDsInfer/NvDsInferObjectDetectionInfo.html

For the properties res.left and res.top, I wonder why the horizontal offset uses the y-axis output from the model, and vice versa for the vertical offset.

Please kindly advise.
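To make the confusion concrete, here is a minimal, self-contained sketch (dummy values; `clip_1d_elm` replaced by plain list indexing, which is only an assumption for illustration) of what the snippet above computes if the buffer really were laid out as [x1, y1, x2, y2]:

```python
# Hypothetical box in [x1, y1, x2, y2] order (made-up normalized values).
box = [0.125, 0.25, 0.5, 0.75]

rect_x1_f = box[0]
rect_y1_f = box[1]
rect_x2_f = box[2]
rect_y2_f = box[3]

# The assignments from ssd_parser.py: note the apparent x/y swap.
left = rect_y1_f                 # the *y* value becomes the horizontal offset
top = rect_x1_f                  # the *x* value becomes the vertical offset
width = rect_y2_f - rect_y1_f
height = rect_x2_f - rect_x1_f

print(left, top, width, height)  # 0.25 0.125 0.5 0.375
```

Under the [x1, y1, x2, y2] assumption, this mapping looks swapped, which is exactly the question.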


• Hardware Platform (Jetson / GPU)
RTX 3090
• DeepStream Version
DS 6.1.1 nvcr.io/nvidia/deepstream:6.1.1-triton
• JetPack Version (valid for Jetson only)
DS 6.1.1 nvcr.io/nvidia/deepstream:6.1.1-triton
• TensorRT Version
DS 6.1.1 nvcr.io/nvidia/deepstream:6.1.1-triton
• NVIDIA GPU Driver Version (valid for GPU only)
515.86.01
• Issue Type( questions, new requirements, bugs)
questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-ssd-parser
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
sample application: https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-ssd-parser

Sorry for the late reply. Yes, you are right; please refer to NvDsInferParseCustomTfSSD in the DeepStream SDK.

Currently, I found that the logic of the Python code differs from the C code. Please refer to NvDsInferParseCustomTfSSD in the DeepStream SDK, at /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp.
Will continue to check.

The code's result is right, but the variable names are not.
Here is the original code:
rect_x1_f = clip_1d_elm(0)
rect_y1_f = clip_1d_elm(1)
rect_x2_f = clip_1d_elm(2)
rect_y2_f = clip_1d_elm(3)
res.left = rect_y1_f
res.top = rect_x1_f
res.width = rect_y2_f - rect_y1_f
res.height = rect_x2_f - rect_x1_f

Here is the corrected code:
rect_y1_f = clip_1d_elm(0) #y1
rect_x1_f = clip_1d_elm(1) #x1
rect_y2_f = clip_1d_elm(2) #y2
rect_x2_f = clip_1d_elm(3) #x2
res.left = rect_x1_f
res.top = rect_y1_f
res.width = rect_x2_f - rect_x1_f
res.height = rect_y2_f - rect_y1_f
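A runnable sketch of the corrected mapping, with a stand-in for pyds.get_detections so it can be tested outside DeepStream (the buffer values are made up; the clipping to [0, 1] mirrors what the clip helper is meant to do):

```python
def parse_box(buffer, index):
    """Mimics the corrected parser: buffer holds [y1, x1, y2, x2] per box."""
    def clip_1d_elm(index2):
        # Stand-in for pyds.get_detections(box_layer.buffer, index * 4 + index2),
        # clipped to the normalized range [0, 1].
        return min(max(buffer[index * 4 + index2], 0.0), 1.0)

    rect_y1_f = clip_1d_elm(0)  # y1
    rect_x1_f = clip_1d_elm(1)  # x1
    rect_y2_f = clip_1d_elm(2)  # y2
    rect_x2_f = clip_1d_elm(3)  # x2

    left = rect_x1_f
    top = rect_y1_f
    width = rect_x2_f - rect_x1_f
    height = rect_y2_f - rect_y1_f
    return left, top, width, height

print(parse_box([0.25, 0.125, 0.75, 0.5], 0))  # (0.125, 0.25, 0.375, 0.5)
```

With the [y1, x1, y2, x2] layout, left/top now come from the x1/y1 elements as expected.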

The clip_1d_elm function uses pyds.get_detections(box_layer.buffer, index * 4 + index2) to read the result from the box_layer buffer. So does this mean the SSD bounding box format follows [y1, x1, y2, x2]?

From the corresponding parsing code (C/Python), the model's output format is [y1, x1, y2, x2].
Here is the C code; the logic is clear.
enum {y1, x1, y2, x2};
float rectX1f, rectY1f, rectX2f, rectY2f;
rectX1f = ((float*)boxLayer->buffer)[i * 4 + 1] * networkInfo.width;
rectY1f = ((float*)boxLayer->buffer)[i * 4 + 0] * networkInfo.height;
rectX2f = ((float*)boxLayer->buffer)[i * 4 + 3] * networkInfo.width;
rectY2f = ((float*)boxLayer->buffer)[i * 4 + 2] * networkInfo.height;
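The same indexing can be sketched in Python (assumed 300x300 network input, which is typical for SSD but not stated in the thread, and a dummy one-box buffer) to show how the [y1, x1, y2, x2] elements are scaled to pixel coordinates:

```python
net_w, net_h = 300, 300              # assumed network input size
buf = [0.25, 0.125, 0.75, 0.5]       # one box, [y1, x1, y2, x2], normalized
i = 0                                # box index

rect_x1 = buf[i * 4 + 1] * net_w     # x1 comes from index 1
rect_y1 = buf[i * 4 + 0] * net_h     # y1 comes from index 0
rect_x2 = buf[i * 4 + 3] * net_w     # x2 comes from index 3
rect_y2 = buf[i * 4 + 2] * net_h     # y2 comes from index 2

print(rect_x1, rect_y1, rect_x2, rect_y2)  # 37.5 75.0 150.0 225.0
```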


Thanks!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.