Missing bounding boxes when using 'maintain-aspect-ratio=1' in nvinfer

tojerop853 · September 9, 2022, 8:32am

• DeepStream Version
v6.1.1
• TensorRT Version
8.4.3*
• Issue Type( questions, new requirements, bugs)
Bug
• How to reproduce the issue?

When gst-nvinfer runs with the ‘maintain-aspect-ratio=1’ option, it can omit valid bounding boxes.
Example:
The input video is 1920x1080 and the network dimensions are 640x640. To maintain the aspect ratio, nvinfer resizes the image to 640x360. The parse-bbox-func receives only the network dimensions, not the image dimensions, so it can clamp a box only to the network dimensions. It may therefore return boxes that are not completely contained in the image, and nvinfer discards these boxes (gstnvinfer_meta_utils.cpp, lines 76-80, v6.1.1).

This is a big problem when the app needs to be able to handle varying input aspect-ratios (so the image dimensions cannot be hard coded in parse-bbox-func).
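
To make the arithmetic in the example concrete, here is a minimal sketch of how the scaled size follows from the input and network dimensions. The `SizeSketch` type and the `scaledRoiSize` helper are hypothetical names for illustration, not part of the DeepStream API, and the rounding nvinfer actually applies may differ.

```cpp
#include <algorithm>

// Hypothetical helper type for illustration; not a DeepStream type.
struct SizeSketch {
    unsigned width;
    unsigned height;
};

// Largest size that keeps the source aspect ratio and still fits inside
// the network input. Rounds to the nearest integer (an assumption).
SizeSketch scaledRoiSize(SizeSketch src, SizeSketch net) {
    const double scale =
        std::min(static_cast<double>(net.width) / src.width,
                 static_cast<double>(net.height) / src.height);
    return SizeSketch{static_cast<unsigned>(src.width * scale + 0.5),
                      static_cast<unsigned>(src.height * scale + 0.5)};
}
```

For a 1920x1080 input and a 640x640 network the limiting factor is the width (scale 1/3), which gives the 640x360 region from the example. A parse-bbox-func would need exactly this region to clamp correctly, but it has no way to compute it without the input dimensions.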

fanzh · September 13, 2022, 5:30am

Regarding “nvinfer will resize the image to 640 x 360 to maintain the aspect ratio”: please refer to get_converted_buffer() in the nvinfer plugin. nvinfer pads the scaled image, so the final resolution is 640x640; that is, nvinfer converts the input resolution to the model’s input dimensions.

tojerop853 · September 13, 2022, 7:11am

Yes, my wording was imprecise. With maintain-aspect-ratio=1, nvinfer scales the image to a 640x360 ROI inside the 640x640 network input. The parse-bbox-func is not aware of this ROI, which leads to the problem I describe: the network may produce bounding boxes that exceed the 640x360 ROI, and the parse-bbox-func cannot clamp them to the image ROI.

fanzh · September 13, 2022, 7:27am

Regarding “the parse-bbox-func cannot clamp to the image ROI”: could you share the relevant code line?
How can this issue be reproduced?

tojerop853 · September 13, 2022, 8:41am

This is the prototype of the parse-bbox-func (from nvdsinfer_custom_impl.h):

typedef bool (* NvDsInferParseCustomFunc) (
        std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
        NvDsInferNetworkInfo  const &networkInfo,
        NvDsInferParseDetectionParams const &detectionParams,
        std::vector<NvDsInferObjectDetectionInfo> &objectList);

A function implementing this signature only has information about the network dimensions (via networkInfo), but no information about the ROI of the padded image in the network input.
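
As a rough illustration of that limitation, the sketch below clamps boxes using only the network dimensions, which is all a parser with this signature can do. `NetInfoSketch` and `DetectionSketch` are simplified stand-ins I made up for NvDsInferNetworkInfo and NvDsInferObjectDetectionInfo, reduced to the fields used here; `clampToNetwork` is a hypothetical helper, not DeepStream code.

```cpp
#include <algorithm>
#include <vector>

// Simplified stand-ins (assumption), not the real DeepStream structs.
struct NetInfoSketch {
    unsigned width;
    unsigned height;
};
struct DetectionSketch {
    float left;
    float top;
    float width;
    float height;
};

// Clamp every box to [0, net.width] x [0, net.height]. The padded image ROI
// inside the network input is unknown here, so it cannot be taken into account.
void clampToNetwork(std::vector<DetectionSketch>& objects, const NetInfoSketch& net) {
    const float maxX = static_cast<float>(net.width);
    const float maxY = static_cast<float>(net.height);
    for (DetectionSketch& o : objects) {
        const float x1 = std::max(0.0f, std::min(o.left, maxX));
        const float y1 = std::max(0.0f, std::min(o.top, maxY));
        const float x2 = std::max(0.0f, std::min(o.left + o.width, maxX));
        const float y2 = std::max(0.0f, std::min(o.top + o.height, maxY));
        o.left = x1;
        o.top = y1;
        o.width = x2 - x1;
        o.height = y2 - y1;
    }
}
```

With a 640x640 network, a box with top=300 and height=100 passes through unchanged, although its bottom edge (400) lies below a 640x360 image region. That is the kind of box that is later discarded.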

Reproduction:

1. Download the attached video (a still video of a slightly cropped car, from Wikipedia).
2. In sources/objectDetector_Yolo, run deepstream-app -c deepstream_app_config_yoloV3.txt on the video.
3. Observe that deepstream-app does not draw a bounding box around the slightly cropped car in the foreground.

To remedy this, one can either set maintain-aspect-ratio=0 in config_infer_primary_yoloV3.txt, or comment out the lines mentioned in my first post (sources/gst-plugins/gst-nvinfer/gstnvinfer_meta_utils.cpp, lines 76-80) and recompile the plugin.

fanzh · September 19, 2022, 8:22am

Please check whether the input data needs to maintain its aspect ratio; that depends on your model.
Please refer to this patch:
if (obj.top + obj.height >
    (frame.input_surf_params->height - filter_params.roiBottomOffset))
{
    // Clamp the box height to the bottom of the ROI instead of skipping the object.
    obj.height = frame.input_surf_params->height - filter_params.roiBottomOffset - obj.top;
    // continue;
}

tojerop853 · September 19, 2022, 2:26pm

Thanks! The patch works for this particular case, but I believe it is incomplete.
Open cases:

• Boxes that extend past the right edge (e.g. the net dimension is 640x640, but the image ROI is 360x640).
• The ROI may be padded symmetrically, in which case boxes can also extend past the top or left edge.
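
A more general fix would clamp on all four sides against the actual ROI. The following is only a sketch under stated assumptions: `RoiSketch`, `BoxSketch`, `paddedRoi`, and `clampToRoi` are hypothetical names, the coordinates are taken to be in the same space as the ROI, and how nvinfer places the ROI when padding symmetrically is assumed to be simple centering.

```cpp
#include <algorithm>

// Hypothetical types for illustration; not DeepStream structs.
struct RoiSketch {
    float left, top, width, height;
};
struct BoxSketch {
    float left, top, width, height;
};

// ROI of the scaled image inside the network input. With symmetric padding
// the image is assumed to be centered; otherwise it sits at the top-left.
RoiSketch paddedRoi(float netW, float netH, float roiW, float roiH, bool symmetric) {
    const float offX = symmetric ? (netW - roiW) / 2.0f : 0.0f;
    const float offY = symmetric ? (netH - roiH) / 2.0f : 0.0f;
    return RoiSketch{offX, offY, roiW, roiH};
}

// Clamp the box to the ROI on all four sides. Returns false when nothing
// of the box is left inside the ROI, so the caller can drop it.
bool clampToRoi(BoxSketch& box, const RoiSketch& roi) {
    const float l = std::max(box.left, roi.left);
    const float t = std::max(box.top, roi.top);
    const float r = std::min(box.left + box.width, roi.left + roi.width);
    const float b = std::min(box.top + box.height, roi.top + roi.height);
    if (r <= l || b <= t) {
        return false;
    }
    box = BoxSketch{l, t, r - l, b - t};
    return true;
}
```

This covers both open cases: a 360x640 ROI clips boxes on the right, and a centered 640x360 ROI (top offset 140) clips them at the top and bottom. Boxes that only partially leave the ROI are kept and trimmed rather than dropped.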

fanzh · September 23, 2022, 8:45am

Thanks for sharing. The nvinfer plugin is open source, so you can modify it as needed; we will try to optimize this.
