Cropping at Original Frame Resolution Using Gst-nvmultiurisrcbin

• Hardware Platform: GPU
• DeepStream Version: 7.1
• TensorRT Version: 10.9.0.34
• NVIDIA GPU Driver Version (valid for GPU only): 550.144.03

Hi everyone, I hope you’re doing well!

I’m currently working on a DeepStream pipeline using the Python bindings, and I’d like to know whether it’s possible to perform object cropping at the original resolution when using the Gst-nvmultiurisrcbin element.

The general idea is to use a queue and a tee before the GstNvStreamMux, so we can preserve the original decoded frame. After inference and tracking are completed, we would then crop the detected object at the original resolution, possibly to perform further analysis or inference.
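
To make that concrete, here is a minimal Python sketch of the topology we have in mind. Element names, the placeholder URI, and the assumption that nvurisrcbin exposes its decoded output on a dynamic "vsrc"-prefixed pad are illustrative, not verified:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

pipeline = Gst.Pipeline.new("split-pipeline")
source = Gst.ElementFactory.make("nvurisrcbin", "src-0")
source.set_property("uri", "rtsp://...")  # placeholder URI
tee = Gst.ElementFactory.make("tee", "pre-mux-tee")
q_infer = Gst.ElementFactory.make("queue", "q-infer")
q_orig = Gst.ElementFactory.make("queue", "q-orig")
streammux = Gst.ElementFactory.make("nvstreammux", "mux")
streammux.set_property("width", 1280)
streammux.set_property("height", 720)
streammux.set_property("batch-size", 1)

for e in (source, tee, q_infer, q_orig, streammux):
    pipeline.add(e)

# nvurisrcbin creates its decoded output pad dynamically, so the link
# to the tee has to happen in a pad-added callback.
def on_pad_added(_bin, pad):
    if pad.get_name().startswith("vsrc"):
        pad.link(tee.get_static_pad("sink"))

source.connect("pad-added", on_pad_added)

# Branch 1: into the muxer and on to nvinfer/nvtracker (omitted here).
tee.request_pad_simple("src_%u").link(q_infer.get_static_pad("sink"))
q_infer.get_static_pad("src").link(streammux.request_pad_simple("sink_0"))

# Branch 2: untouched full-resolution frames, held for later cropping.
tee.request_pad_simple("src_%u").link(q_orig.get_static_pad("sink"))
# ... terminate this branch with an appsink or a pad probe that keeps the buffers.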

At the moment, our cropping happens at the resolution scaled down by nvstreammux (e.g., 720p), using a probe function at the end of the pipeline. This leads to suboptimal crops, especially for high-res streams like 4K.
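
Even though the pixels are already downscaled at that point, the detection coordinates can at least be mapped back to the source resolution, which is what any crop-on-the-original-frame approach would rely on. A rough probe sketch, assuming nvstreammux populates frame_meta.source_frame_width/source_frame_height and that MUX_WIDTH/MUX_HEIGHT are placeholders for our muxer settings:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

MUX_WIDTH, MUX_HEIGHT = 1280, 720  # our nvstreammux output resolution

def rescale_probe(pad, info, user_data):
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        sx = frame_meta.source_frame_width / MUX_WIDTH    # e.g. 3840 / 1280
        sy = frame_meta.source_frame_height / MUX_HEIGHT  # e.g. 2160 / 720
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj = pyds.NvDsObjectMeta.cast(l_obj.data)
            r = obj.rect_params
            # Crop rectangle in original-resolution pixel coordinates.
            crop = (int(r.left * sx), int(r.top * sy),
                    int(r.width * sx), int(r.height * sy))
            # ... hand `crop` to whatever holds the original frame.
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK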

We’re considering a few approaches:

  1. Manually building an equivalent to nvmultiurisrcbin in our pipeline, keeping its reconnection logic and support for both H.264 and H.265, but also exposing a second output pad with the original-resolution frames.
  2. Injecting custom elements into nvmultiurisrcbin, though this seems unlikely since it’s a compiled, closed-source bin.
  3. Somehow pushing nvinfer and nvtracker metadata upstream, to access it before the nvstreammux scaling — possibly even trickier.
  4. Running two independent pipelines, one for inference and tracking (720p), and another to receive original frames and perform cropping based on shared metadata.

We’re also unsure how to synchronize the metadata from the inference branch with the original frames from the crop branch, if we do manage to duplicate the frame before scaling. That might be a topic on its own — a whole can of worms.

Has anyone attempted something similar, or is there an official or recommended approach for this use case?

Thanks in advance for any insights — and keep up the great work!

pipeline.pdf (41.1 KB)

Would there be any issues if the width and height of nvstreammux were set to the same values as the resolution of your original video in your scenario?

Other than performance issues, no.

Our goal is to run around 200 streams of different resolutions.

For streams already in 720p this is not a problem, but in resolutions of 1080p, 2K or 4K, the crops lose details that would be interesting to keep for future inferences.

We decoupled the inference to maintain the performance of real-time video processing.

We have already considered increasing the inference resolution to try to maintain some details, but since we use Gst-nvvideoconvert to convert NV12 to RGB for cropping, we have a very large performance penalty.
In this scenario, we also considered cropping directly in NV12, to avoid the cost of converting to RGB, but we were unable to find a way to do it.

First of all, you need to ensure that your device is capable of decoding 200 video streams simultaneously.

Currently, the only way to achieve this is through the “tee” plugin you mentioned. You can count the video frames yourself in the tee branch and match them against the frames coming out of the inference process.
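
For illustration, a minimal matching scheme might look like the sketch below. All names are illustrative; instead of a bare counter, the buffer PTS can serve as the join key, though whether frame_meta.buf_pts on the inference side matches the pre-mux buffer PTS exactly is worth verifying, and a real implementation would need to evict entries for dropped frames:

import threading

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

originals = {}  # (stream_id, pts) -> full-resolution Gst.Buffer
originals_lock = threading.Lock()

def pre_mux_probe(pad, info, stream_id):
    # Attached to the original-image tee branch; holding the Python
    # reference to the buffer keeps it alive.
    buf = info.get_buffer()
    with originals_lock:
        originals[(stream_id, buf.pts)] = buf
    return Gst.PadProbeReturn.OK

def lookup_original(frame_meta):
    # Called from a probe after inference; pad_index identifies the stream.
    key = (frame_meta.pad_index, frame_meta.buf_pts)
    with originals_lock:
        return originals.pop(key, None)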

We’re currently running a DeepStream pipeline on an A100-SXM4-40GB and are able to decode around 200 streams[1], reaching a total of ~3000 FPS — even when using nvvideoconvert (which caused some performance drop). The setup is very efficient so far.

We’re now exploring a way to crop detections at the original resolution, which leads us to the idea of duplicating the frame before it gets scaled down by GstNvStreamMux.

Since nvmultiurisrcbin is a binary/closed element, is there any way to traverse or intercept its internal elements and insert a queue or a tee before the frames hit the GstNvStreamMux?

If not, then I suppose I would need to mimic the nvmultiurisrcbin behavior manually, right?
What concerns me most is how to replicate the robust stream reconnection mechanisms that nvmultiurisrcbin provides — it’s been extremely stable in our long-running production tests with RTSP sources.

I’m also assuming that such a custom implementation would need to be done in C++, because from what I understand, pipeline control (play/pause, reset, etc.) is less reliable in Python, especially when dealing with reconnection, dynamic pads, or error recovery.

Also, while digging through GStreamer plugins, I came across two interesting ones:

originalbuffersave: Saves a reference to the buffer in a meta

originalbufferrestore: Restores the original buffer from meta

Would these plugins help in retaining a copy of the original frame before muxing?
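
From what we can tell, these elements appear to come from gst-plugins-rs, and their generic usage is save, transform, restore, as in the sketch below. Whether the attached meta would survive nvstreammux's batching is exactly the open question here:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
# Save the original buffer into a meta, scale, then restore the original.
pipeline = Gst.parse_launch(
    "videotestsrc num-buffers=10 ! originalbuffersave "
    "! videoscale ! video/x-raw,width=320,height=240 "
    "! originalbufferrestore ! fakesink"
)
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)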

Any recommendations, insights, or best practices from those who’ve built similar setups would be very welcome!


  1. Forgot to mention that we split the work across multiple processes of 32 streams each. ↩︎

Actually, exploring the deepstream folder, I’ve found the source code in deepstream/sources/gst-plugins/gst-nvmultiurisrcbin. Is it better to tee multiple GstDsNvUriSrcBin elements, or link them to a queue before the tee?

It’s open source. Please refer to the source code below.

deepstream\sources\libs\gstnvdscustomhelper\gst-nvmultiurisrcbincreator.cpp
deepstream\sources\gst-plugins\gst-nvmultiurisrcbin
deepstream\sources\gst-plugins\gst-nvurisrcbin

The pipeline you need to implement inside nvmultiurisrcbin is roughly as follows:

nvurisrcbin -> tee -> original-image branch (count the video frames yourself)
                   -> inference branch (match its frames against the original frame count)

If we succeed in using the tee and saving the frames for later cropping, we will still have to run Gst-nvvideoconvert, right?

Is there any documentation on whether there is a performance penalty when the batched images are not all the same size?

We know that conversion is necessary to perform cropping because the format must be RGBA for the get_nvds_buf_surface function to work.

So far in our research we have not found a way to crop directly on NV12 unfortunately.



For Python, yes. You can also try to use OpenCV to crop the image from the probe function, as in deepstream_imagedata-multistream.py.
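
A rough sketch of that pattern, assuming an upstream nvvideoconvert plus a capsfilter forcing video/x-raw(memory:NVMM),format=RGBA (and, on dGPU, nvbuf-memory-type set to CUDA unified memory) so that get_nvds_buf_surface can map the frame:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import numpy as np
import pyds

def crop_objects_probe(pad, info, user_data):
    buf = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buf))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # RGBA view of the mapped frame: an H x W x 4 numpy array.
        frame = pyds.get_nvds_buf_surface(hash(buf), frame_meta.batch_id)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj = pyds.NvDsObjectMeta.cast(l_obj.data)
            r = obj.rect_params
            x, y = int(r.left), int(r.top)
            w, h = int(r.width), int(r.height)
            # Copy so the crop outlives the mapped buffer.
            crop = np.array(frame[y:y + h, x:x + w], copy=True)
            # ... cv2 processing / saving of `crop` goes here.
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK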

You can simply measure that from the FPS of the pipeline.

For Python, yes. We only support the RGBA/RGB color formats in Python.

That’s great, thanks for the clarification @yuweiw!

However, is there a way to avoid Gst-nvvideoconvert, even if it means stepping outside of Python?

Is there a more straightforward way in C++ to crop bounding boxes on the GPU without converting first?

You can refer to our sample sources\apps\sample_apps\deepstream-image-meta-test\deepstream_image_meta_test.c. It directly uses the following API to save the images with hardware encoding.

static GstPadProbeReturn
pgie_src_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info, gpointer ctx)
{
...
      /* Conditions that user needs to set to encode the detected objects of
       * interest. Here, by default all the detected objects are encoded.
       * For demonstration, we will encode the first object in the frame. */
      if ((obj_meta->class_id == PGIE_CLASS_ID_PERSON
              || obj_meta->class_id == PGIE_CLASS_ID_VEHICLE)
          && num_rects == 1) {
        NvDsObjEncUsrArgs objData = { 0 };
        /* To be set by user */
        objData.saveImg = save_img;
        objData.attachUsrMeta = attach_user_meta;
        /* Set if Image scaling Required */
        objData.scaleImg = FALSE;
        objData.scaledWidth = 0;
        objData.scaledHeight = 0;
        /* Preset */
        objData.objNum = num_rects;
        /* Quality */
        objData.quality = 80;
        /* Set to calculate time taken to encode JPG image. */
        if (calc_enc) {
          objData.calcEncodeTime = 1;
        }
        /*Main Function Call */
        nvds_obj_enc_process (ctx, &objData, ip_surf, obj_meta, frame_meta);
      }
...
}
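
(For completeness: in that sample, the ctx handle is created up front with nvds_obj_enc_create_context() and the pending encodes are flushed with nvds_obj_enc_finalize() at the end of the probe; the context is later released with nvds_obj_enc_destroy_context().)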

Most of our pipeline logic is written in Python. Is it feasible to turn our probe function into a custom plugin and use nvds_obj_enc_process, as in the sample app, to crop directly in NV12?

No. If you are using Python, please refer to our deepstream_imagedata-multistream.py to crop with OpenCV.

Unfortunately, this example also uses “get_nvds_buf_surface” :c

So, in Python, there’s no way to bypass the need for RGB transformation, right?

Yes. Only RGBA format is supported in the get_nvds_buf_surface API.

Thanks for the clarification!

We’re currently implementing the tee inside nvmultiurisrcbin, and if we encounter any issues, I’ll post them here.

In the meantime, we modified an open-source plugin for the first time and compiled it into a .so file to meet our needs, and the results were positive.

If it works and anyone is interested, I’ll post the patch file here.

Thanks. You can post your patch here and make a simple explanation to facilitate everyone’s reference.

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.