Passing Transformed Image Frames down the Pipeline

Hello,

I was wondering if achieving the following results is possible via a conventional DeepStream pipeline.

We want to add a watermark to the processed video stream prior to inference. In other words, if the original data frame is X and the image frame with the added watermark is X’, we would like to pass X’ down the pipeline for inference.

Suppose we would like to perform object detection using X’ and we have a filesink; we want DeepStream to output a video stream with the bounding boxes drawn onto X’ instead of the original image frame X.

Would this be possible using DeepStream? If so, could I get an explanation detailing how to achieve this?

Although this is a general question, I will add some basic information about my development environment below, just in case.

• Hardware Platform: dGPU server (DeepStream Docker container nvcr.io/nvidia/deepstream:5.1-21.02-devel)
• DeepStream Version: 5.1
• TensorRT Version: 7.2.2.1

Thank you and best regards,
Jay

What kind of watermark do you need?

Thank you for the response. This could be a company logo.
But the main idea would be to transform the original image frame passed down the pipeline so that the SGIE receives the transformed image (image with watermark).

If the PGIE is an object detector and the sink element is a filesink, then the output video file should be the watermarked video with bounding boxes drawn for the detected objects.

We don’t provide any mixer or blender for videos in the DeepStream SDK. Please refer to the open-source GStreamer plugins (https://gstreamer.freedesktop.org/) for related functionality.
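For illustration, a minimal C sketch of that idea, assuming the open-source gdkpixbufoverlay element from gst-plugins-good fits your need; the file names and offsets here are hypothetical, and the DeepStream elements are omitted:

/* Sketch: watermark frames with gdkpixbufoverlay (gst-plugins-good)
 * before any inference element sees them. "input.mp4" and "logo.png"
 * are hypothetical file names. */
#include <gst/gst.h>

int main (int argc, char *argv[])
{
  gst_init (&argc, &argv);

  /* The overlay sits upstream of the (not shown) DeepStream elements,
   * so a downstream nvinfer would receive the watermarked frame X'. */
  GstElement *pipeline = gst_parse_launch (
      "filesrc location=input.mp4 ! decodebin ! videoconvert ! "
      "gdkpixbufoverlay location=logo.png offset-x=20 offset-y=20 ! "
      "videoconvert ! autovideosink",
      NULL);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  /* Block until EOS or an error, then tear down. */
  GstBus *bus = gst_element_get_bus (pipeline);
  GstMessage *msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
      GST_MESSAGE_EOS | GST_MESSAGE_ERROR);
  if (msg)
    gst_message_unref (msg);
  gst_object_unref (bus);
  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (pipeline);
  return 0;
}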

Thank you for the response. I will take a look at the GStreamer plugins to see whether a plugin for updating image frames down the pipeline is available.

In that case, I was wondering: would something like the following be possible?

Suppose that we have an autoencoder A which takes in an image X and applies a transformation A(X) to create X’, a tensor with the same dimensions as X, the original input frame.

Would the following scenario be possible? If it is the same scenario as the one I described in my previous question, please let me know.

The main idea is that, after PGIE inference, X is updated with X’ so that the filesink outputs X’ instead of X. The PGIE performs inference on X, and the subsequent SGIEs receive X’ instead of X as input.

Thank you in advance!

DeepStream only supports video/image/audio inferencing now; general tensor-data inferencing is not supported.

We already have some conversion plugins with HW acceleration in DeepStream. So the answer to your question depends on what kind of transformation “A(X)” is. Please specify your requirements.
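For example, nvvideoconvert already provides hardware-accelerated color-format conversion, scaling, and cropping, so simple transformations may not need a custom element at all.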

Let’s say A is a model, such as an autoencoder. We build the model A and serialize it into a TensorRT engine (.engine) file.

X here is an image frame from a video; in other words, A is a model that performs video inferencing.
We can think of X as a 4-dimensional tensor with dimensions (B, C, H, W), where B is the mini-batch size, C is the number of channels (3 for RGB images), H is the image height, and W is the image width.

The primary goals are as follows:

Given: An image frame X.

Do the following:

  1. Inference on X using autoencoder A, producing A(X) = X’. Here, X’ is the output from the autoencoder, which has dimensions (B, C, H, W), the same as the original input frame.
  2. Replace or embed X’ into the pipeline so that an object detection model O (this would be the SGIE) infers on X’ instead of X, outputting the top-k bounding boxes and class confidence scores.
  3. Output video from the filesink with bounding boxes drawn on X’ instead of X.

Thank you for taking the time to read through this.

The current nvinfer plugin (Gst-nvinfer — DeepStream 6.1.1 Release documentation) can handle this case with the “output-tensor-meta=1” and “network-type=100” settings.
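For example, the relevant part of a hypothetical nvinfer configuration file for the autoencoder PGIE might look like this (the engine file name is an assumption):

[property]
# Hypothetical serialized autoencoder engine
model-engine-file=autoencoder.engine
# 100 = NvDsInferNetworkType_Other: run the network, do not parse its output
network-type=100
# Attach the raw output tensors as metadata for use downstream
output-tensor-meta=1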

The only thing you need to do is to overwrite the NvBufSurface in the nvinfer src pad probe function. There is a sample of replacing the NvBufSurface in “Deepstream sample code snippet” (Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums).

Please make sure you are familiar with basic GStreamer concepts and coding before you start with DeepStream: https://gstreamer.freedesktop.org/
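Putting this together, the element chain would look roughly like: source → nvstreammux → nvinfer (autoencoder PGIE, network-type=100, output-tensor-meta=1) → [src pad probe: overwrite NvBufSurface with X’] → nvinfer (detector SGIE) → nvdsosd → encoder → filesink.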

> The current nvinfer plugin (Gst-nvinfer — DeepStream 5.1 Release documentation) can handle this case with the “output-tensor-meta=1” and “network-type=100” settings.

Thank you for the answer. I have a query about “network-type=100”. The documentation specifies the following values:

0: Detector
1: Classifier
2: Segmentation
3: Instance Segmentation

Can network-type=100 be interpreted as classifier, detector, detector (i.e., the three digits 1, 0, 0 standing for three models)? Or does the value 100 carry some other meaning?

> The only thing you need to do is to overwrite the NvBufSurface in the nvinfer src pad probe function.

Thank you for the answer. I am guessing that we can overwrite the buffer directly using the GStreamer Python bindings in the nvinfer src pad probe function. Please correct me if I am wrong.

Assuming that I am on the right track, I was wondering if overwriting the image directly inside the probe function is the right approach and, if so, whether there is a way to ensure that the overwritten data persists down the pipeline.

I am guessing that the code would be structured similarly to the following if this can be achieved using the Python GStreamer bindings. For example:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def pgie_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    # Iterate over the frames in the batch
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        # Retrieve the NvBufSurface containing this frame's image data
        nvds_buf_surface = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)

        # How would we overwrite the original image frame with the inference
        # output from the PGIE?
        # e.g. autoencoder_output = get_autoencoder_output()
        # Replace the contents of nvds_buf_surface with autoencoder_output
        # and persist it down the pipeline.

        try:
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK

pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
...
# Get the nvinfer src pad and attach the probe function
pgie_src_pad = pgie.get_static_pad("src")
pgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, pgie_src_pad_buffer_probe, 0)

Thank you very much for your time and patience once again.

nvinfer is open source; you can find that “NvDsInferNetworkType_Other” is 100 in the header file /opt/nvidia/deepstream/deepstream/sources/includes/nvdsinfer_context.h and in https://docs.nvidia.com/metropolis/deepstream/sdk-api/Infer/Infer.html. So 100 is a single enum value meaning “other”, not three separate digits; with this setting nvinfer runs the network but does not parse its output.
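The relevant part of that header looks roughly like this (paraphrased excerpt, comments abbreviated):

/* Paraphrased excerpt from nvdsinfer_context.h */
typedef enum
{
  NvDsInferNetworkType_Detector,             /* 0 */
  NvDsInferNetworkType_Classifier,           /* 1 */
  NvDsInferNetworkType_Segmentation,         /* 2 */
  NvDsInferNetworkType_InstanceSegmentation, /* 3 */
  NvDsInferNetworkType_Other = 100           /* output left unparsed */
} NvDsInferNetworkType;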

No. Python memory management is different from C/C++, so you cannot do the same thing in Python. Please use C/C++ for your special case.
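For reference, a rough C sketch of such a probe, assuming the buffer carries a batched NvBufSurface; actually filling it with the autoencoder output X’ is only indicated by comments, and pitch, color format, and memory type must be handled as in the linked sample:

/* Sketch of a C probe on the nvinfer src pad that maps the batched
 * NvBufSurface for read/write access. */
#include <gst/gst.h>
#include "nvbufsurface.h"

static GstPadProbeReturn
pgie_src_pad_buffer_probe (GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  GstMapInfo map_info;
  (void) pad;
  (void) u_data;

  if (!gst_buffer_map (buf, &map_info, GST_MAP_READWRITE))
    return GST_PAD_PROBE_OK;

  /* In DeepStream, the mapped data is the batched NvBufSurface. */
  NvBufSurface *surface = (NvBufSurface *) map_info.data;

  for (guint i = 0; i < surface->numFilled; i++) {
    /* surface->surfaceList[i] describes frame i of the batch:
     * dataPtr, pitch, width, height, colorFormat.
     * Overwrite its pixels with X' here (e.g. via cudaMemcpy2D),
     * respecting pitch and the surface's memory type. */
  }

  gst_buffer_unmap (buf, &map_info);
  return GST_PAD_PROBE_OK;
}

/* Attached like:
 * gst_pad_add_probe (pgie_src_pad, GST_PAD_PROBE_TYPE_BUFFER,
 *     pgie_src_pad_buffer_probe, NULL, NULL);
 */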

> nvinfer is open source; you can find that “NvDsInferNetworkType_Other” is 100 in the header file /opt/nvidia/deepstream/deepstream/sources/includes/nvdsinfer_context.h and in NvDsInfer API — DeepStream 5.1 documentation.

Thank you for the clarification.

> No. Python memory management is different from C/C++, so you cannot do the same thing in Python. Please use C/C++ for your special case.

Okay, that makes perfect sense. Thank you very much for the prompt and valuable responses.