Efficiently Extracting Processed Frames After Inference in NVIDIA DeepStream

• Hardware Platform (Jetson / GPU) : NVIDIA Jetson AGX Orin
• DeepStream Version : 7.1
• JetPack Version (valid for Jetson only) : 6.1
• TensorRT Version : 8.6.2.3
• Issue Type( questions, new requirements, bugs) : question

Hello,

I have a DeepStream pipeline performing inference with a classifier model. In my nvinfer configuration file I specified infer-dims=3;224;224. The input image to the network has shape (3, 1440, 1440), and I expect the nvinfer element to resize it to (3, 224, 224) before inference. The input is 1440x1440 because, before the nvinfer plugin, an nvvideoconvert element uses src-crop to crop the 4K image down to 1440x1440. However, when extracting frames from the nvinfer output in my pad probe function (attached to the src pad), I retrieve frames of shape (1440, 1440, 4) instead of (224, 224, 4).
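For context, the relevant segment of my pipeline is built roughly like this (a simplified sketch; the crop offsets and file name below are placeholders, not my exact values):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# nvvideoconvert crops the 4K frame to 1440x1440 before inference
nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "crop-convert")
nvvidconv.set_property("src-crop", "1200:360:1440:1440")  # left:top:width:height (placeholder offsets)

# nvinfer runs the classifier; its config file contains infer-dims=3;224;224
nvinfer = Gst.ElementFactory.make("nvinfer", "classifier")
nvinfer.set_property("config-file-path", "config_infer_classifier.txt")  # placeholder path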

Here is the function I use to extract frames:

import numpy as np
import pyds
from gi.repository import Gst


def get_frame(gst_buffer: Gst.Buffer, batch_id: int) -> np.ndarray:
    """
    Get the frame from the gst_buffer
    """
    # Get the NvBufSurface from the gst_buffer (Surface containing image data)
    n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), batch_id)

    # Convert to numpy array
    frame_image = np.array(n_frame, copy=True, order="C")

    # Free up the memory associated with the buffer
    pyds.unmap_nvds_buf_surface(hash(gst_buffer), batch_id)
    return frame_image
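The function is called from a probe on the src pad of nvinfer, roughly like this (simplified; error handling omitted):

def src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    # Walk the batch metadata and extract the pixels of every frame
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        frame = get_frame(gst_buffer, frame_meta.batch_id)  # comes back as 1440x1440x4
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK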

Since the extracted frame retains the original 1440x1440 resolution, my subsequent processing becomes inefficient. For example, one of my later operations adds an alpha channel:

alpha = np.full((img.shape[0], img.shape[1]), 255, dtype=np.uint8)
img = np.dstack((img, alpha))

This operation takes 2-3 ms for a 224x224 image but about 40 ms for 1440x1440, making it a major bottleneck.
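(As an aside: preallocating the RGBA buffer avoids the extra copy that np.dstack makes, assuming img still has three channels at this point, but the cost still scales with the 1440x1440 size, so it does not solve the real problem:)

rgba = np.empty((img.shape[0], img.shape[1], 4), dtype=np.uint8)
rgba[..., :3] = img   # copy the colour channels once
rgba[..., 3] = 255    # opaque alpha, no intermediate array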

Questions:

  1. How can I efficiently extract the resized 224x224 frame that the inference model already operates on?
  2. Is there a way to access the preprocessed image directly instead of the original input size?
  3. Are there any DeepStream/CUDA functions that make it efficient to operate on these arrays directly, instead of using NumPy, which I believe runs on the CPU rather than the GPU?

Does anyone have an idea how this might be solved? I found an implementation of extracting the frame as a NumPy array, but it does the same thing I already do: it extracts the data in its pre-inference shape (1440x1440), not the post-preprocessing shape, which should be 224x224.

You can only get that from the nvinfer plugin source code. The image from the probe function always has the original dimensions.

You can learn how to use cv-cuda to process the image data.
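For example, a resize done entirely on the GPU with the cvcuda Python bindings looks roughly like this (a rough sketch only: it assumes the frame has already been uploaded as a PyTorch CUDA tensor, and the operator signatures may differ between CV-CUDA releases):

import cvcuda
import torch

def gpu_resize_rgba(frame_np, out_h=224, out_w=224):
    """Resize an HxWxC uint8 frame on the GPU with CV-CUDA."""
    # Upload once, then keep all processing on the device
    src_torch = torch.from_numpy(frame_np).cuda().unsqueeze(0)  # NHWC with N=1
    src = cvcuda.as_tensor(src_torch, "NHWC")
    dst = cvcuda.resize(src, (1, out_h, out_w, frame_np.shape[2]), cvcuda.Interp.LINEAR)
    # Wrap the result back as a torch tensor; the data stays in GPU memory
    return torch.as_tensor(dst.cuda(), device="cuda")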

@yuweiw Thank you for your response!

Could you provide an example of how to extract a frame from the nvinfer plugin? Would it be possible to retrieve the frame using a pad probe attached to the src pad of nvinfer, or by adding some parameter to the nvinfer configuration, or will I have to modify the nvinfer plugin itself?

Additionally, I came across some code utilizing pyds.NvDsObjEncOutParams, such as in this example: deepstream_python_apps/tests/integration/test.py at cb7fd9c8aa012178527e0cb84f91d1f5a0ad37ff · NVIDIA-AI-IOT/deepstream_python_apps · GitHub
However, this implementation relies on frame_user_meta_list.

In my case, my classifier model stores data inside classifier_meta_list. Here’s my relevant code snippet:

class_meta_list = obj_meta.classifier_meta_list
while class_meta_list is not None:
    try:
        classifier_meta = pyds.NvDsClassifierMeta.cast(class_meta_list.data)
    except StopIteration:
        break

    label_info_list = classifier_meta.label_info_list
    while label_info_list is not None:
        try:
            label_info = pyds.NvDsLabelInfo.cast(label_info_list.data)
        except StopIteration:
            break

        fenc_output = pyds.NvDsObjEncOutParams.cast(label_info_list.data)
        foutput = fenc_output.outBuffer()
        assert foutput is not None
        foutput.tofile(f"temp/{frame_num}224.jpg")

        label_info_list = label_info_list.next
    class_meta_list = class_meta_list.next

When printing fenc_output, it does not appear to be a processed frame. Its shape is (6,) and it looks like this: [128 209 1 12 255 255], which is not even an original frame.

Is this because NvDsObjEncOutParams should only be used with frame_user_meta_list? If so, what would be the recommended approach to extract the processed frame from nvinfer when using a classifier model?

You can refer to this FAQ to learn how to dump the image in the nvinfer plugin.
We recommend that you first skim the nvinfer source code; you can dump the image data after any processing step. A diagram of the source code is here: diagram.

No. It’s bound to the NVDS_CROP_IMAGE_META metadata. You can refer to our C/C++ sample deepstream-image-meta-test.
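For reference, the reading side in Python looks roughly like this (a sketch; it only works if something upstream, such as the object encoder used in deepstream-image-meta-test, has attached the encoded crops, and the enum and output path below should be checked against your pyds version):

def dump_encoded_objects(obj_meta, frame_num):
    """Write any JPEG crops attached to this object as NVDS_CROP_IMAGE_META."""
    l_user = obj_meta.obj_user_meta_list
    idx = 0
    while l_user is not None:
        user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        # Only user meta of type NVDS_CROP_IMAGE_META carries NvDsObjEncOutParams
        if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDS_CROP_IMAGE_META:
            enc_out = pyds.NvDsObjEncOutParams.cast(user_meta.user_meta_data)
            jpeg_buf = enc_out.outBuffer()  # uint8 numpy array holding the JPEG bytes
            if jpeg_buf is not None:
                jpeg_buf.tofile(f"temp/frame{frame_num}_obj{idx}.jpg")  # placeholder path
                idx += 1
        l_user = l_user.next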

@yuweiw Thank you for your response.

Could you clarify whether extracting frames requires modifying the nvinfer plugin itself, or if it can be done purely through metadata handling?

I am implementing my DeepStream pipeline in Python and would like to confirm whether adding output-tensor-meta=1 to my model config file is necessary. Currently, when I retrieve obj_user_meta_list and process it, the output from:

enc_output = pyds.NvDsObjEncOutParams.cast(user_meta.user_meta_data)

is always None, even though obj_meta_list contains data, and classifier_meta.label_info_list correctly holds classification results.

For a classification model, is setting output-tensor-meta=1 required, or is it unnecessary?

Additionally, would you recommend switching to C++ for handling this, or is it fully achievable in Python?