Python vs C++ Parser Conversion Issues in DeepStream with YOLOv8

• Hardware Platform (Jetson / GPU): dGPU (A40).
• DeepStream Version: 6.4-triton-multiarch.
• TensorRT Version: 8.6.2.3.
• NVIDIA GPU Driver Version (valid for GPU only): 525.147.05.
• Issue Type (questions, new requirements, bugs): questions.
• How to reproduce the issue? (This is for bugs. Include which sample app is being used, the configuration file contents, the command line used, and other details for reproducing.)

I am encountering an issue using NVIDIA DeepStream with the YOLOv8 model. I have written a parser in C++ which works quite well. The input to the YOLOv8 model is 640x640.

When I configure streammux with width and height set either to 640x640 or to the original image size of 1920x1080, my C++ parser draws the bounding boxes correctly. However, when I switch to writing the parser in Python (using the output-tensor-meta=1 setting), I can only get accurate bounding boxes when streammux is configured with width and height of 640x640.
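A minimal sketch of the two configurations being compared, using standard GStreamer/pyds property setup (the element names, batch settings, and config file path here are placeholders, not the actual application):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# nvstreammux: the two configurations being compared, 1920x1080 (the original
# video size) versus 640x640 (the YOLOv8 input size).
streammux = Gst.ElementFactory.make("nvstreammux", "streammux")
streammux.set_property("width", 1920)   # or 640
streammux.set_property("height", 1080)  # or 640
streammux.set_property("batch-size", 1)

# nvinfer (pgie): output-tensor-meta=True attaches the raw output tensors so
# the postprocessing can be done in a Python probe. The config file name is a
# placeholder.
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
pgie.set_property("config-file-path", "config_infer_yolov8.txt")
pgie.set_property("output-tensor-meta", True)  # same as output-tensor-meta=1
```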

I have several questions regarding this issue:

  1. Conversion of Bounding Boxes in C++ vs Python: I suspect there is a conversion step that maps the bounding boxes from the C++ parser's output to the streammux resolution. Is there any function in DeepStream to perform this conversion when using a Python parser?
  2. Preprocessing for SGIE: Does DeepStream convert bounding box coordinates to the streammux resolution before cropping the image for SGIE processing? In other words, is the cropping for SGIE based on bounding box coordinates that have already been adjusted to match the streammux resolution?
  3. Using NMS in Python: In the C++ configuration, I can use NMS through [class-attrs-all]. In Python, do I need to rewrite the NMS logic, or is there a way to apply a similar configuration from the C++ config file?

Any help and guidance would be greatly appreciated. Thank you!

Do you mean the preprocess parser or the postprocess parser? Could you attach your whole pipeline?

1. There is a conversion in nvstreammux: it scales the original video to the width and height you configured.
2. There is a conversion in nvinfer: it scales that output to the model's input width and height.
3. No. You can still configure NMS with [class-attrs-all], or you can implement it entirely yourself in Python (see the sketch below).
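If the NMS ends up being implemented in Python, as mentioned in point 3, a minimal greedy, per-class IoU NMS sketch looks like this (plain Python, no DeepStream dependency; the 0.45 threshold is a placeholder):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.45):
    """Greedy per-class NMS. detections: list of (box, score, class_id)."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, cls = det
        if all(k[2] != cls or iou(box, k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept
```

On the config-file side, the usual keys are cluster-mode=2 under [property] and nms-iou-threshold / pre-cluster-threshold under [class-attrs-all].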

I am talking about a postprocess parser for the model. My current pipeline is uridecodebin → streammux (1920x1080) → pgie (640x640) → nvvideoconvert → nvdsosd → nvvideoconvert → capsfilter → nvv4l2h264enc → h264parse → filesink.

It seems you misunderstood my point. For my first question: I wrote a postprocess parser for the model in C++, which works correctly for both streammux sizes, 1920x1080 and 640x640. However, when I use output-tensor-meta=1 and write the parser in Python from the pgie output, the bounding boxes are drawn incorrectly if the streammux size is 1920x1080. This leads me to suspect that when I use parse-bbox-func-name, there is a step that converts the bounding box coordinates from the model's 640x640 size back to 1920x1080 so that they are drawn correctly, and that step is missing in my Python parser.

My second question is: if I add an SGIE, will its input be cropped from the 640x640 image resized for the pgie, or from the 1920x1080 streammux output?

For my third question, are you suggesting that I can still use NMS from [class-attrs-all] for a postprocess parser written in Python? I am using DeepStream Python.
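For context on where such a Python postprocess usually runs in this pipeline, here is a skeleton of a pad probe on the pgie src pad that locates the tensor output attached by output-tensor-meta=1 (assuming the standard pyds bindings; the actual YOLOv8 decoding is omitted):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds


def pgie_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)

        # With output-tensor-meta=1 the raw output tensors are attached as
        # frame-level user meta of type NVDSINFER_TENSOR_OUTPUT_META.
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                # ... decode the YOLOv8 output here (coordinates are in the
                # 640x640 network space), apply NMS, scale to the streammux
                # resolution, then attach NvDsObjectMeta to frame_meta.
            l_user = l_user.next
        l_frame = l_frame.next

    return Gst.PadProbeReturn.OK
```

The probe would be attached with `pgie.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, pgie_src_pad_buffer_probe, 0)`.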

1. Yes. If you implement the postprocessing yourself in Python, you need to handle the coordinate conversion yourself (a sketch of that conversion follows below).
2. The input video size depends on the output of the previous plugin in the pipeline.
3. If you want to use your own algorithm in Python, you need to implement it yourself.
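On point 1, here is a sketch of that coordinate conversion and of attaching the result as object metadata, meant to run inside a probe like the skeleton above. It assumes maintain-aspect-ratio=0 (plain resize) in the nvinfer config; with letterbox preprocessing the padding would have to be removed first. All names and values are placeholders:

```python
import pyds

NET_W, NET_H = 640, 640      # YOLOv8 network input size
MUX_W, MUX_H = 1920, 1080    # nvstreammux width/height


def net_to_streammux(x1, y1, x2, y2):
    """Scale a box from 640x640 network space to the streammux resolution.

    Assumes maintain-aspect-ratio=0 (plain resize). With letterbox
    preprocessing, subtract the padding and divide by the scale instead.
    """
    sx, sy = MUX_W / NET_W, MUX_H / NET_H
    return x1 * sx, y1 * sy, x2 * sx, y2 * sy


def add_object(batch_meta, frame_meta, box, score, class_id):
    """Attach one detection (already in streammux coordinates) as NvDsObjectMeta."""
    left, top, right, bottom = box
    obj_meta = pyds.nvds_acquire_obj_meta_from_pool(batch_meta)
    obj_meta.class_id = class_id
    obj_meta.confidence = score
    rect = obj_meta.rect_params
    rect.left, rect.top = left, top
    rect.width, rect.height = right - left, bottom - top
    rect.border_width = 2
    rect.border_color.set(1.0, 0.0, 0.0, 1.0)  # red box for visibility
    pyds.nvds_add_obj_meta_to_frame(frame_meta, obj_meta, None)
```

nvdsosd draws rect_params in the resolution of the buffer it receives, which in this pipeline is the streammux resolution, so once the boxes are in that space no further conversion should be needed.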
