Efficiently Extracting Processed Frames After Inference in NVIDIA DeepStream

• Hardware Platform (Jetson / GPU) : NVIDIA Jetson AGX Orin
• DeepStream Version : 7.1
• JetPack Version (valid for Jetson only) : 6.1
• TensorRT Version : 8.6.2.3
• Issue Type( questions, new requirements, bugs) : question

Hello,

I have a DeepStream pipeline performing inference with a classifier model. In my nvinfer configuration file I specified infer-dims=3;224;224. The input image to the network has shape (3, 1440, 1440), and I expect the nvinfer element to resize it to (3, 224, 224) before inference. The input is 1440x1440 because, upstream of nvinfer, an nvvideoconvert element uses src-crop to crop the 4K image to 1440x1440. However, when extracting frames from the nvinfer output in my pad probe function (attached to the src pad), I retrieve frames of shape (1440, 1440, 4) instead of (224, 224, 4).
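
For context, a minimal sketch of how the crop and the classifier are wired up in my pipeline-building code (the element names, crop coordinates and config file name below are placeholders, not my exact values):

# nvvideoconvert crops the 4K input; src-crop takes "left:top:width:height"
nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "crop-convert")
nvvidconv.set_property("src-crop", "1240:0:1440:1440")

# nvinfer runs the classifier; its config file contains, among other keys:
#   infer-dims=3;224;224
classifier = Gst.ElementFactory.make("nvinfer", "classifier")
classifier.set_property("config-file-path", "classifier_config.txt")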

Here is the function I use to extract frames:

def get_frame(gst_buffer: Gst.Buffer, batch_id: int) -> np.ndarray:
    """
    Get the frame from the gst_buffer
    """
    # Get the NvBufSurface from the gst_buffer (Surface containing image data)
    n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), batch_id)

    # Convert to numpy array
    frame_image = np.array(n_frame, copy=True, order="C")

    # Unmap the buffer surface once the data has been copied
    pyds.unmap_nvds_buf_surface(hash(gst_buffer), batch_id)
    return frame_image

Since the extracted frame retains the original 1440x1440 resolution, my subsequent processing becomes inefficient. For example, one of my later operations adds an alpha channel:

alpha = np.full((img.shape[0], img.shape[1]), 255, dtype=np.uint8)
img = np.dstack((img, alpha))

This operation takes 2-3 ms for an image of 224x224, but 40 ms for 1440x1440, making it a major bottleneck.
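
(For reference, a variant that avoids the extra full-frame copy made by np.dstack is to preallocate the RGBA buffer and fill it in place, but this does not address the underlying resolution mismatch:)

# Preallocate the 4-channel output once and write into it, instead of letting
# np.dstack allocate and copy the whole frame again (assumes img is an (H, W, 3) uint8 array).
rgba = np.empty((img.shape[0], img.shape[1], 4), dtype=np.uint8)
rgba[..., :3] = img
rgba[..., 3] = 255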

Questions:

  1. How can I efficiently extract the resized 224x224 frame that the inference model already operates on?
  2. Is there a way to access the preprocessed image directly instead of the original input size?
  3. Are there any special DeepStream/CUDA functions that would make it efficient to operate on these arrays, instead of using NumPy, which I believe runs on the CPU rather than the GPU?

Does anyone have an idea how this might be solved? I found an implementation that extracts the frame as a NumPy array, but it does the same thing I already do: it extracts the data in its pre-inference shape (1440x1440), not the post-inference shape (224x224).

You can only get that from the nvinfer plugin source code. The image from the probe function is always the original dimension.

You can learn how to use cv-cuda to process the image data.
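
As a rough illustration of keeping the per-frame array work on the GPU (this sketch uses CuPy rather than CV-CUDA itself, simply because it maps one-to-one onto the NumPy code above; frame is assumed to be the array returned by get_frame):

import cupy as cp

# Copy the probe-extracted frame to the GPU once, then do the array work there.
gpu_frame = cp.asarray(frame[..., :3])  # assumes an (H, W, >=3) uint8 array
rgba = cp.empty((gpu_frame.shape[0], gpu_frame.shape[1], 4), dtype=cp.uint8)
rgba[..., :3] = gpu_frame                # fill colour channels
rgba[..., 3] = 255                       # constant alpha channel
result = cp.asnumpy(rgba)                # copy back only if a CPU array is needed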

@yuweiw Thank you for your response!

Could you provide an example of how to extract a frame from the nvinfer plugin? Would it be possible to retrieve the frame using a pad probe function attached to the src pad of nvinfer, by adding some parameter to the nvinfer configuration, or will I have to modify the nvinfer plugin?

Additionally, I came across some code utilizing pyds.NvDsObjEncOutParams, such as in this example: deepstream_python_apps/tests/integration/test.py at cb7fd9c8aa012178527e0cb84f91d1f5a0ad37ff · NVIDIA-AI-IOT/deepstream_python_apps · GitHub
However, this implementation relies on frame_user_meta_list.

In my case, my classifier model stores data inside classifier_meta_list. Here’s my relevant code snippet:

class_meta_list = obj_meta.classifier_meta_list
while class_meta_list is not None:
    try:
        classifier_meta = pyds.NvDsClassifierMeta.cast(class_meta_list.data)
    except StopIteration:
        break

    label_info_list = classifier_meta.label_info_list
    while label_info_list is not None:
        try:
            label_info = pyds.NvDsLabelInfo.cast(label_info_list.data)
        except StopIteration:
            break

        fenc_output = pyds.NvDsObjEncOutParams.cast(label_info_list.data)
        foutput = fenc_output.outBuffer()
        assert foutput is not None
        foutput.tofile(f"temp/{frame_num}224.jpg")

        try:
            label_info_list = label_info_list.next
        except StopIteration:
            break
    try:
        class_meta_list = class_meta_list.next
    except StopIteration:
        break

When printing fenc_output, it does not appear to be a processed frame. Its shape is (6,) and it looks like this: [128 209 1 12 255 255], which is not even an original frame.

Is this because NvDsObjEncOutParams should only be used with frame_user_meta_list? If so, what would be the recommended approach to extract the processed frame from nvinfer when using a classifier model?

You can refer to this FAQ to learn how to dump the image in nvinfer.
We recommend that you first read the source code of nvinfer briefly; you can dump the image data after any processing step. The diagram of the source code is here: diagram.

No. It’s bound to the NVDS_CROP_IMAGE_META metadata. You can refer to our C/C++ sample deepstream-image-meta-test.
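
In Python, the lookup would follow roughly the pattern below. This is only a sketch: it assumes the object encoder (nvds_obj_enc_process, as used in deepstream-image-meta-test) has already attached the encoded crop upstream, and the exact enum name may differ between pyds versions:

# Sketch: reading the encoded crop attached by the object encoder.
# Assumptions: NVDS_CROP_IMAGE_META is exposed via pyds.NvDsMetaType, and the
# encoder actually ran upstream; otherwise no such user meta is present.
l_user = obj_meta.obj_user_meta_list
while l_user is not None:
    user_meta = pyds.NvDsUserMeta.cast(l_user.data)
    if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDS_CROP_IMAGE_META:
        enc_output = pyds.NvDsObjEncOutParams.cast(user_meta.user_meta_data)
        jpeg_data = enc_output.outBuffer()
        jpeg_data.tofile(f"temp/object_{obj_meta.object_id}.jpg")
    l_user = l_user.next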

@yuweiw Thank you for your response.

Could you clarify whether extracting frames requires modifying the nvinfer plugin itself, or if it can be done purely through metadata handling?

I am implementing my DeepStream pipeline in Python and would like to confirm if adding output-tensor-meta=1 to my model config file is necessary. Currently, when I retrieve obj_user_meta_list and process it, the output from:

enc_output = pyds.NvDsObjEncOutParams.cast(user_meta.user_meta_data)

is always None, even though obj_meta_list contains data, and classifier_meta.label_info_list correctly holds classification results.

For a classification model, is setting output-tensor-meta=1 required, or is it unnecessary?

Additionally, would you recommend switching to C++ for handling this, or is it fully achievable in Python?

Hi @yuweiw,
I wanted to check if there are any updates regarding my previous question. Would appreciate any insights you can share. Thanks again!

There are 3 ways to extract frames:

  1. Modify the nvinfer plugin itself; you can refer to #6.
  2. Extract the frame from the probe function; you can refer to our sample deepstream_imagedata-multistream.py.
  3. Use the metadata; you can refer to our C/C++ sample deepstream-image-meta-test first.

The output-tensor-meta option outputs the tensor data directly after inference. You need to parse the tensor data yourself if you set it to true.
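
For reference, when output-tensor-meta=1 is set, the raw tensor is attached as user meta of type NVDSINFER_TENSOR_OUTPUT_META (typically on the object for an object-mode classifier), and parsing it looks roughly like the sketch below; note that it contains only the model outputs (e.g. class scores), not the resized 224x224 image:

# Sketch of reading the raw tensor meta attached when output-tensor-meta=1.
l_user = obj_meta.obj_user_meta_list
while l_user is not None:
    user_meta = pyds.NvDsUserMeta.cast(l_user.data)
    if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
        tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
        for i in range(tensor_meta.num_output_layers):
            layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
            # layer.layerName identifies the output; layer.buffer holds the raw
            # values and has to be parsed manually (e.g. via ctypes / numpy).
            print(layer.layerName)
    l_user = l_user.next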

We recommend that you try C/C++. That way, you can refer to our deepstream-image-meta-test sample directly and use the hardware encoder to extract the picture, which is more efficient.

Hi @yuweiw,

Thank you for your response.

My goal is to extract and save the image immediately after inference, ideally within nvinfer, as resizing with OpenCV in the pad probe function introduces additional processing overhead.

Current Challenge

I attempted to dump the frame inside nvinfer but faced an issue: I need to save the frame with the exact timestamp when the inference occurred. However, when accessing the frame inside the pad probe function in Python, the extracted timestamp differs from the one recorded in nvinfer.

After reviewing the DeepStream pipeline diagram, I found that gst_nvinfer_output_loop seems to be the best place to save the frame after inference. However, the example provided in the forum saves the image before inference, which is not what I need.

Additional Post-Processing in the Pad Probe Function

Currently, in my pad probe function, I perform the following steps:

  1. Extract metadata (frame and object).

  2. Retrieve classifier results.

  3. Resize the frame using OpenCV (cv2.resize).

  4. Save the resized frame with an NTP timestamp.

  5. Store metadata in Redis.

Here’s the relevant code snippet:

    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break

            class_meta_list = obj_meta.classifier_meta_list
            while class_meta_list is not None:
                try:
                    classifier_meta = pyds.NvDsClassifierMeta.cast(class_meta_list.data)
                except StopIteration:
                    break

                label_info_list = classifier_meta.label_info_list
                while label_info_list is not None:
                    try:
                        label_info = pyds.NvDsLabelInfo.cast(label_info_list.data)
                    except StopIteration:
                        break

                    prob = label_info.result_prob
                    ntp_timestamp = int(frame_meta.ntp_timestamp / 1_000_000)
                    frame = get_frame(gst_buffer, frame_meta.batch_id)
                    frame_resized = cv2.resize(frame, (224, 224))
                    output_file_path = f"output/{ntp_timestamp}-224.rgb"
                    frame_resized.tofile(output_file_path)

                    redis_data = {
                        "timestamp": ntp_timestamp,
                        "probability": prob,
                        "files": {"res224": output_file_path},
                    }

                    redis_client.zadd(
                        "detection", {json.dumps(redis_data): ntp_timestamp}
                    )
                    try:
                        label_info_list = label_info_list.next
                    except StopIteration:
                        break
                try:
                    class_meta_list = class_meta_list.next
                except StopIteration:
                    break
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        frame_meta.bInferDone = True
        try:
            l_frame = l_frame.next
        except StopIteration:
            break

Question

How can I extract a resized frame within nvinfer so that I don’t have to perform additional resizing in the pad probe function? Since I need to retrieve the resized frame inside the pad probe function connected to nvinfer, is there a way to store and access it directly?

Any guidance on optimizing this process would be greatly appreciated!

You can only get the frame at its original dimensions from the probe function.
There is no frame data in the output tensor after inference. So if you want the frame at the model’s input dimensions, you can only grab it before inference inside nvinfer, or get the original frame in the probe function and resize it yourself.
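
Whichever way you obtain the frame, one general optimization is to keep the pad probe itself short and push the resize/save/Redis work to a worker thread, so the streaming thread is not blocked. A minimal sketch (queue size and worker logic are illustrative; cv2, get_frame and redis_client are the ones from your existing code):

import queue
import threading

work_queue = queue.Queue(maxsize=64)

def worker():
    # Runs outside the streaming thread: resize, save and push to Redis here.
    while True:
        frame, ntp_timestamp = work_queue.get()
        frame_resized = cv2.resize(frame, (224, 224))
        frame_resized.tofile(f"output/{ntp_timestamp}-224.rgb")
        # ... redis_client.zadd(...) as before ...
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# Inside the pad probe, only copy the frame and enqueue it:
#     work_queue.put((get_frame(gst_buffer, frame_meta.batch_id), ntp_timestamp))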
