Missing bounding boxes when using 'maintain-aspect-ratio=1' in nvinfer

Please provide complete information as applicable to your setup.

• DeepStream Version
v6.1.1
• TensorRT Version
8.4.3*
• Issue Type( questions, new requirements, bugs)
Bug
• How to reproduce the issue?

When using gst-nvinfer with the ‘maintain-aspect-ratio=1’ option, there are situations in which valid bounding boxes will be omitted by gst-nvinfer.
Example:
Input video size is 1920x1080 and the network dimensions are 640x640. nvinfer will resize the image to 640x360 to maintain the aspect ratio. The parse-bbox-func receives only the network dimensions, not the image dimensions, so it can clamp boxes only to the network dimensions. As a result, a parse-bbox-func may return boxes that are not completely contained in the image. nvinfer discards these boxes (gstnvinfer_meta_utils.cpp, lines 76-80, v6.1.1).

This is a big problem when the app needs to be able to handle varying input aspect-ratios (so the image dimensions cannot be hard coded in parse-bbox-func).

Regarding “nvinfer will resize the image to 640 x 360 to maintain the aspect ratio”: please refer to get_converted_buffer() in the nvinfer plugin. nvinfer pads the scaled image, so the final resolution is 640x640; that is, nvinfer converts the input resolution to the model’s input dimensions.

Yes, my formulation was imprecise. When maintain-aspect-ratio=1, nvinfer scales the image to a 640x360 ROI inside the 640x640 network input. The problem is that parse-bbox-funcs are not aware of this ROI. This leads to the problem I describe: the network might produce bounding boxes that exceed the 640x360 ROI, and the parse-bbox-func cannot clamp them to the image ROI.
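To make the geometry concrete, here is a minimal sketch of how such an ROI could be computed. It assumes uniform scaling by the smaller of the two ratios and padding either entirely at the right/bottom or split evenly; `LetterboxRoi` and `compute_letterbox_roi` are hypothetical names for illustration, not part of nvinfer, and the plugin's actual rounding may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type: where the scaled frame lands inside the network input.
struct LetterboxRoi {
    double scale;       // frame -> network scale factor (same for x and y)
    int left, top;      // offset of the scaled frame inside the network input
    int width, height;  // size of the scaled frame inside the network input
};

// Assumption: the frame is scaled uniformly so that it fits the network input,
// and the remainder is padded (right/bottom only, or evenly when symmetric).
inline LetterboxRoi compute_letterbox_roi(int frame_w, int frame_h,
                                          int net_w, int net_h,
                                          bool symmetric) {
    assert(frame_w > 0 && frame_h > 0 && net_w > 0 && net_h > 0);
    const double scale = std::min(static_cast<double>(net_w) / frame_w,
                                  static_cast<double>(net_h) / frame_h);
    const int w = static_cast<int>(std::lround(frame_w * scale));
    const int h = static_cast<int>(std::lround(frame_h * scale));
    const int left = symmetric ? (net_w - w) / 2 : 0;
    const int top = symmetric ? (net_h - h) / 2 : 0;
    return {scale, left, top, w, h};
}
```

For a 1920x1080 frame and a 640x640 network this yields a 640x360 ROI at offset (0, 0), which is exactly the information a parse-bbox-func would need but does not receive.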

  1. Regarding “the parse-bbox-func cannot clamp to the image ROI”: could you share the relevant code line?
  2. How can this issue be reproduced?

  1. This is the prototype of the parse-bbox-func (from nvdsinfer_custom_impl.h):
typedef bool (* NvDsInferParseCustomFunc) (
        std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
        NvDsInferNetworkInfo  const &networkInfo,
        NvDsInferParseDetectionParams const &detectionParams,
        std::vector<NvDsInferObjectDetectionInfo> &objectList);

A function implementing this signature only has information about the network dimensions (via networkInfo), but no information about the ROI of the padded image in the network input.
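The effect can be illustrated with a small standalone sketch. `Box`, `clamp_to_network`, and `bottom_inside_frame` are hypothetical names, and the containment check is only an assumed model of what nvinfer does after scaling boxes back to frame coordinates; it is not copied from the plugin source.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a detection box (left, top, width, height).
struct Box { float left, top, width, height; };

// What a parse-bbox-func can do with the information it has: clamp the box
// to the network input, the only size it is told about.
inline Box clamp_to_network(Box b, float net_w, float net_h) {
    const float x1 = std::max(0.0f, std::min(b.left, net_w));
    const float y1 = std::max(0.0f, std::min(b.top, net_h));
    const float x2 = std::max(0.0f, std::min(b.left + b.width, net_w));
    const float y2 = std::max(0.0f, std::min(b.top + b.height, net_h));
    return {x1, y1, x2 - x1, y2 - y1};
}

// Assumed model of the later check: scale the box back to frame coordinates
// and test whether its bottom edge still lies inside the frame.
inline bool bottom_inside_frame(const Box& b, float scale, float frame_h) {
    assert(scale > 0.0f);
    return (b.top + b.height) / scale <= frame_h;
}
```

With a 1920x1080 frame (scale 1/3), a box whose bottom edge sits at y = 370 in network coordinates is untouched by `clamp_to_network(box, 640, 640)`, yet it maps to y = 1110 in the frame, below the 1080-pixel frame height, so the check fails.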

Reproduction:

  1. Download the attached video (a still video of a slightly cropped car, from Wikipedia).
  2. In sources/objectDetector_Yolo, run deepstream-app -c deepstream_app_config_yoloV3.txt on the video.
  3. Observe that deepstream-app does not draw a bounding box around the slightly cropped car in the foreground.

To remedy this, one can either set maintain-aspect-ratio=0 in config_infer_primary_yoloV3.txt, or comment out the lines mentioned in my first post (sources/gst-plugins/gst-nvinfer/gstnvinfer_meta_utils.cpp, lines 76-80) and recompile the plugin.

  1. Please check whether your input data needs to maintain the aspect ratio; this depends on your model.
  2. Please refer to this patch:
    if (obj.top + obj.height >
        (frame.input_surf_params->height - filter_params.roiBottomOffset))
    {
      // Clamp the box height to the bottom boundary instead of skipping the box.
      obj.height = frame.input_surf_params->height - filter_params.roiBottomOffset - obj.top;
      // continue;
    }

Thanks! The patch works for this particular case, but I believe it is incomplete.
Open cases:

  1. Boxes that extend past the right edge (e.g. the net dimension is 640x640, but the image ROI is 360x640)
  2. The ROI may be padded symmetrically, so boxes can also extend past the left or top edge
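A more general clamp covering these cases might look like the sketch below. It is an illustration under stated assumptions, not the plugin's code: `Box` and `to_frame_and_clamp` are hypothetical names, and `pad_left`/`pad_top` stand for the offset of the scaled image inside the network input (zero with right/bottom padding, non-zero with symmetric padding).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a detection box (left, top, width, height).
struct Box { float left, top, width, height; };

// Maps a box from network-input coordinates to frame coordinates and clamps
// all four edges to the frame, instead of discarding a partially outside box.
inline Box to_frame_and_clamp(Box b, float scale,
                              float pad_left, float pad_top,
                              float frame_w, float frame_h) {
    assert(scale > 0.0f && frame_w > 0.0f && frame_h > 0.0f);
    // Remove the padding offset first, then undo the uniform scaling.
    float x1 = (b.left - pad_left) / scale;
    float y1 = (b.top - pad_top) / scale;
    float x2 = (b.left + b.width - pad_left) / scale;
    float y2 = (b.top + b.height - pad_top) / scale;
    // Clamp every edge to the frame rectangle.
    x1 = std::max(0.0f, std::min(x1, frame_w));
    x2 = std::max(0.0f, std::min(x2, frame_w));
    y1 = std::max(0.0f, std::min(y1, frame_h));
    y2 = std::max(0.0f, std::min(y2, frame_h));
    return {x1, y1, x2 - x1, y2 - y1};
}
```

As an example, take a 720x1280 frame scaled by 0.5 into a 360x640 ROI that is centred horizontally in a 640x640 network input (`pad_left` = 140). A network-space box (130, 600, 400, 60) overshoots the ROI on the left, right, and bottom; the sketch maps it to (0, 1200, 720, 80) in frame coordinates. A box lying entirely in the padding collapses to zero width or height, which a caller would still want to filter out.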

Thanks for sharing. The nvinfer plugin is open source, so you can modify it as needed; we will try to optimize this.