Scaling bounding boxes in second object detection model

I’m trying to run two object detection models (both as nvinferserver) where the second detector takes the crops from the first detector as its input. The config for the second model specifies process_mode: PROCESS_MODE_CLIP_OBJECTS and operate_on_gie_id: 1, and the config for the first model specifies unique_id: 1. Both models have their parsing/postprocessing done in Python in a probe function, and the probe for the first model adds the detection metadata to the frame so that the crops are used by the second model.

Where I’m currently stuck is that the postprocessing for the second model requires the original crop size from the first model so that the bounding boxes can be scaled back to the original video size. Right now I’m only seeing 1 output per frame from the second model regardless of the number of detections by the first model. Here’s my second probe function, showing how I convert the model outputs to numpy arrays for postprocessing and my debugging efforts so far to figure out how this metadata aligns.

import ctypes

import gi
import numpy as np
import pyds

gi.require_version("Gst", "1.0")
from gi.repository import Gst

# BBOX_PRED_SHAPE and CLS_SCORE_SHAPE are defined elsewhere (model output shapes).


def sgie_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        # Count the objects (first-model detections) and any per-object user meta.
        l_obj = frame_meta.obj_meta_list
        len_obj_meta = 0
        len_obj_user_meta = 0
        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break
            len_obj_meta += 1
            l_user = obj_meta.obj_user_meta_list
            while l_user is not None:
                user_meta = pyds.NvDsUserMeta.cast(l_user.data)
                len_obj_user_meta += 1
                try:
                    l_user = l_user.next
                except StopIteration:
                    break
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        # Read the second model's output tensors from the frame-level user meta.
        l_user = frame_meta.frame_user_meta_list
        len_user_meta = 0
        while l_user is not None:
            len_user_meta += 1
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
            layers_info = []
            for i in range(tensor_meta.num_output_layers):
                layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
                layers_info.append(layer)
            cls_scores = next(layer for layer in layers_info if layer.layerName == 'cls_scores')
            bbox_preds = next(layer for layer in layers_info if layer.layerName == 'bbox_preds')
            # Wrap the raw output buffers as numpy arrays (no copy).
            ptr = ctypes.cast(pyds.get_ptr(bbox_preds.buffer), ctypes.POINTER(ctypes.c_float))
            bbox_preds = np.ctypeslib.as_array(ptr, shape=BBOX_PRED_SHAPE)
            ptr = ctypes.cast(pyds.get_ptr(cls_scores.buffer), ctypes.POINTER(ctypes.c_float))
            cls_scores = np.ctypeslib.as_array(ptr, shape=CLS_SCORE_SHAPE)
            print(bbox_preds.shape, cls_scores.shape)
            try:
                l_user = l_user.next
            except StopIteration:
                break
        print(len_obj_meta, len_user_meta, len_obj_user_meta)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

len_obj_meta seems to track the number of detections from the first model, and I can get the crop sizes from obj_meta.rect_params. However, I’m only getting one output from the second model regardless of how many detections/crops came from the first model, i.e. len_user_meta is always 1. I also tried checking obj_meta.obj_user_meta_list for the model outputs, but this is always empty.
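As a side note on the probe above, the repeated while/try/except traversal can be condensed into a small generator. This is only a sketch: iter_meta_list is a hypothetical helper name, not part of pyds, and it assumes the list nodes expose .data and .next and may raise StopIteration at the end of the list, as the probe above already guards against.

```python
def iter_meta_list(node, cast):
    """Yield cast(node.data) for every node in a pyds-style linked list.

    Hypothetical helper, not part of pyds. Assumes each node exposes
    `.data` and `.next`, and that either access may raise StopIteration
    once the end of the list is reached.
    """
    while node is not None:
        try:
            item = cast(node.data)
        except StopIteration:
            return
        yield item
        try:
            node = node.next
        except StopIteration:
            return
```

With this, counting the objects in a frame would reduce to something like sum(1 for _ in iter_meta_list(frame_meta.obj_meta_list, pyds.NvDsObjectMeta.cast)), which makes it easier to compare the object count against the number of tensor outputs.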

Is it possible the second model is only getting 1 crop as its input or what might be causing this mismatch? Thanks in advance.

Can you describe the specific functions of your two models? If you don’t crop the image yourself, will you detect more than one?

They’re both object detection models and they both can detect more than one object. The second model detects a specific small part of the object detected by the first model.

I’m not sure what you mean by “if you don’t crop the image yourself”. My understanding is that by specifying unique_id: 1 in the first config, and process_mode: PROCESS_MODE_CLIP_OBJECTS and operate_on_gie_id: 1 in the second config, DeepStream will handle the cropping based on the object metadata that I add to the frame in the probe function for the first model.

We have a similar demo for this scenario, back-to-back-detectors. You can refer to it to set up your project.

I need to do this entirely in Python, so that example unfortunately doesn’t help in this case.

You can refer to the config file and the logic of the code. Focus on SECOND_DETECTOR_IS_SECONDARY to learn how to make the 2nd detector act as a primary (full-frame) detector.

That’s not what I’m asking about. I want the second detector to operate on crops from the first detector, as explained in my initial post, and I believe that part is working.

What I’m asking about is how to scale/translate the bounding boxes of the second detector when doing the postprocessing in Python. Since the second detector is operating on crops, I need to know the crop size and location of each input in order to scale/translate its outputs back to the original image dimensions. This is common postprocessing logic for object detectors. The only thing I’m trying to do differently is to implement it in Python, because I have some other requirements that necessitate that.

For each output of the second model, I need to match it to a crop size from the first model, and currently those don’t line up at all.
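To make the scale/translate step concrete, here is a minimal sketch of mapping one box from the second detector back into frame coordinates. It rests on several assumptions that are not confirmed in this thread: the crop is stretched to fill the network input (no aspect-ratio padding), the second model's boxes are corner coordinates in network-input pixels, and the crop rectangle comes from the matching obj_meta.rect_params (left, top, width, height). Rect, crop_box_to_frame, and net_size are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Crop rectangle in frame pixels (mirrors left/top/width/height of rect_params)."""
    left: float
    top: float
    width: float
    height: float


def crop_box_to_frame(
    box: Tuple[float, float, float, float],
    crop: Rect,
    net_size: Tuple[int, int],
) -> Tuple[float, float, float, float]:
    """Map (x1, y1, x2, y2) from network-input pixels to frame pixels.

    Assumes the crop was stretched to net_size = (width, height) without
    letterboxing; a padded resize would need an extra offset term.
    """
    net_w, net_h = net_size
    sx = crop.width / net_w   # frame pixels per network pixel, horizontally
    sy = crop.height / net_h  # frame pixels per network pixel, vertically
    x1, y1, x2, y2 = box
    return (
        crop.left + x1 * sx,
        crop.top + y1 * sy,
        crop.left + x2 * sx,
        crop.top + y2 * sy,
    )


# Example: a 200x100 crop at (100, 50), fed to a 400x400 network input.
frame_box = crop_box_to_frame((100, 100, 300, 200), Rect(100, 50, 200, 100), (400, 400))
```

The same network-space box lands in a different place in the frame for every crop, which is why each output tensor has to be paired with the rect_params of the object it was computed from.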

In theory, as long as your parameters are configured correctly, these steps require no extra processing on your part. Our plugin scales the input internally to the input size of the model.

You can scale the bounding boxes yourself in the probe function. But you need to attach the result to the gstbuffer, otherwise the rest of the pipeline will still process the data in the gstbuffer. We do not currently have a demonstration of this type of processing in Python.

We still recommend that all post-processing be done through the postprocess plugin.

This is specifically what I’m asking about. How do I do this? Can you please re-read my initial question?

As I mentioned before, this cannot be implemented in the probe function. If you want to scale the object for the next inference, you can scale the whole image with nvstreammux at the beginning.

source->nvstreammux(scale by setting the width and height)->...

I’m not asking about image resizing. The question pertains to rescaling the bounding box coordinates. This is a standard part of object detection postprocessing.

I understand what you mean. I’m just pointing out that there’s no point in scaling the image yourself. DeepStream will internally scale the image uniformly to the size of the model and rescale to the original image. Too much scaling can reduce the accuracy.

The original image in the gstbuffer does not change anywhere in the pipeline. If you want to change it, you can only do so upstream in the pipeline.

No, you’re still not understanding the question because I’m not talking about scaling an image.

If you run two detectors where the second detector operates on crops from the first detector, you need to know where each crop came from to know where a detection from the second detector actually is in the original image. The output from the second detector is relative to the location of its input. DeepStream doesn’t handle this automatically if you’re writing custom postprocessing.

But I’ve got it working now, so I’ll close this.