Please provide complete information as applicable to your setup.
• DeepStream Version: v6.1.1
• TensorRT Version: 8.4.3
• Issue Type (questions, new requirements, bugs): Bug
**• How to reproduce the issue?**
When using gst-nvinfer with the ‘maintain-aspect-ratio=1’ option, there are situations in which valid bounding boxes will be omitted by gst-nvinfer.
Example:
The input video is 1920x1080 and the network dimensions are 640x640. To maintain the aspect ratio, nvinfer resizes the image to 640x360. The parse-bbox-func receives only the network dimensions, not the image dimensions, so it can clamp a box only to the network dimensions. As a result, a parse-bbox-func may return boxes that are not completely contained in the image, and nvinfer discards these boxes (gstnvinfer_meta_utils.cpp, lines 76-80, v6.1.1).
This is a big problem when the app needs to be able to handle varying input aspect-ratios (so the image dimensions cannot be hard coded in parse-bbox-func).
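To make the numbers in the example concrete, here is a minimal sketch of the aspect-preserving scale computation. `ScaledRoi` and `scaled_roi` are hypothetical names, not part of the DeepStream API, and the rounding used inside nvinfer may differ from the rounding assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type for illustration; not a DeepStream type.
struct ScaledRoi {
    int width;
    int height;
};

// Scale a (src_w x src_h) frame so that it fits inside a (net_w x net_h)
// network input while keeping the aspect ratio. The smaller of the two
// per-axis scale factors is applied to both axes.
inline ScaledRoi scaled_roi(int src_w, int src_h, int net_w, int net_h) {
    assert(src_w > 0 && src_h > 0 && net_w > 0 && net_h > 0);
    const double scale = std::min(static_cast<double>(net_w) / src_w,
                                  static_cast<double>(net_h) / src_h);
    // Assumption: round to the nearest pixel.
    return {static_cast<int>(std::lround(src_w * scale)),
            static_cast<int>(std::lround(src_h * scale))};
}
```

For a 1920x1080 frame and a 640x640 network, `scaled_roi(1920, 1080, 640, 640)` yields 640x360, so only part of the network input's height holds image content. A different source aspect ratio gives a different ROI, which is why the value cannot be hard coded.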
About “nvinfer will resize the image to 640 x 360 to maintain the aspect ratio”: please refer to get_converted_buffer() in the nvinfer plugin. nvinfer pads the image, so the final resolution is 640x640; that is, nvinfer converts the input resolution to the model’s input dimensions.
Yes, my formulation was imprecise. With maintain-aspect-ratio=1, nvinfer scales the image to a 640x360 ROI inside the 640x640 network input. The problem is that a parse-bbox-func is not aware of this ROI. This leads to the problem I describe: the network might produce bounding boxes that exceed the 640x360 ROI, and the parse-bbox-func cannot clamp them to the image ROI.
A function implementing the parse-bbox-func signature only has information about the network dimensions (via networkInfo), but no information about the ROI of the padded image in the network input.
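The gap can be illustrated with a small sketch. `Box`, `clamp_box`, and `fits_inside` are hypothetical stand-ins written for this post, not DeepStream types or functions, and the sketch assumes the scaled image is anchored at the top-left corner of the network input.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a detection box in network-input coordinates.
struct Box {
    float left, top, width, height;
};

// Clamp a box to the rectangle [0, max_w] x [0, max_h].
inline Box clamp_box(const Box& b, float max_w, float max_h) {
    assert(max_w > 0 && max_h > 0);
    const float x1 = std::max(0.0f, std::min(b.left, max_w));
    const float y1 = std::max(0.0f, std::min(b.top, max_h));
    const float x2 = std::max(0.0f, std::min(b.left + b.width, max_w));
    const float y2 = std::max(0.0f, std::min(b.top + b.height, max_h));
    return {x1, y1, x2 - x1, y2 - y1};
}

// True if the box lies completely inside [0, max_w] x [0, max_h].
inline bool fits_inside(const Box& b, float max_w, float max_h) {
    return b.left >= 0.0f && b.top >= 0.0f &&
           b.left + b.width <= max_w && b.top + b.height <= max_h;
}
```

Take a box with top 200 and height 170, so its bottom edge is at 370. Clamping to the 640x640 network dimensions, which is all a parse-bbox-func can do today, leaves it unchanged, and it still extends past the 640x360 ROI. Clamping to 640x360 would shrink the height to 160 and keep the box inside the image, but that requires ROI information the function does not receive.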
Reproduction:
1. Download the attached video (a still video of a slightly cropped car from Wikipedia).
2. In sources/objectDetector_Yolo, run deepstream-app -c deepstream_app_config_yoloV3.txt on the video.
3. Observe that deepstream-app does not draw a bounding box around the slightly cropped car in the foreground.

To remedy this, one can either set maintain-aspect-ratio=0 in config_infer_primary_yoloV3.txt, or comment out the lines mentioned in my first post (sources/gst-plugins/gst-nvinfer/gstnvinfer_meta_utils.cpp, lines 76-80) and recompile the plugin.