[deepstream-python] how to blur object using nvbufsurface

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.2

The code structure is roughly as shown below.
I want to blur each object with OpenCV, using the NumPy array returned by get_nvds_buf_surface.
A segfault occurs, as if memory were being accessed incorrectly, when I uncomment the line # frame_copy = cv2.cvtColor(image_array, cv2.COLOR_RGBA2BGR).
How should I write the code?
I’ve looked at various examples, but I can’t figure out how to approach it.

import cv2
import pyds

# is_aarch64 and DSL_PAD_PROBE_OK come from the surrounding application.

def blur_obj_pad_buffer_probe(buffer, user_data):
    # Retrieve batch metadata from the gst_buffer
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(buffer)
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            # Note that l_frame.data needs a cast to pyds.NvDsFrameMeta
            # The casting also keeps ownership of the underlying memory
            # in the C code, so the Python garbage collector will leave
            # it alone.
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        # image_array: RGBA
        image_array = pyds.get_nvds_buf_surface(buffer, frame_meta.batch_id)
        # frame_copy = cv2.cvtColor(image_array, cv2.COLOR_RGBA2BGR)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            try:
                # Casting l_obj.data to pyds.NvDsObjectMeta
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break
            # crop image
            x1 = int(obj_meta.rect_params.left)
            y1 = int(obj_meta.rect_params.top)
            x2 = int(obj_meta.rect_params.left + obj_meta.rect_params.width)
            y2 = int(obj_meta.rect_params.top + obj_meta.rect_params.height)
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        if is_aarch64():
            # On Jetson the buffer is mapped to the CPU for retrieval, so it
            # must also be unmapped. Make this call only after all operations
            # on the original array are complete; the array cannot be
            # accessed afterwards.
            pyds.unmap_nvds_buf_surface(buffer, frame_meta.batch_id)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return DSL_PAD_PROBE_OK

Thank you

There are several options for your requirement.

1. Use the dsexample plugin. You can also adapt this sample to Python. Just add dsexample to your pipeline.

2. Refer to the following Python apps. They process NvBufSurface with OpenCV/CUDA.


3. In addition, I have written a sample based on your idea. This approach should have the worst performance of the three.

deepstream_test_1_blur.py (9.7 KB)
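
The attachment itself is not reproduced in this thread. As a rough sketch of the per-object step only (not the attached sample), the blur can be written into slices of the RGBA array so that the change lands in the mapped buffer instead of in a copy. The helper names clamp_box, default_blur and blur_boxes and the 31x31 kernel are made up for illustration; the boxes would come from obj_meta.rect_params in the probe, and the blur function is injectable so the box handling can be tried without OpenCV installed.

```python
import numpy as np


def clamp_box(left, top, width, height, frame_w, frame_h):
    """Turn a float (left, top, width, height) box into integer pixel
    bounds (x1, y1, x2, y2) clipped to the frame size."""
    x1 = max(0, min(int(left), frame_w))
    y1 = max(0, min(int(top), frame_h))
    x2 = max(0, min(int(left + width), frame_w))
    y2 = max(0, min(int(top + height), frame_h))
    return x1, y1, x2, y2


def default_blur(roi):
    """Gaussian blur via OpenCV (imported lazily). The kernel size is an
    arbitrary choice for this sketch."""
    import cv2
    return cv2.GaussianBlur(roi, (31, 31), 0)


def blur_boxes(frame, boxes, blur_fn=default_blur):
    """Blur each (left, top, width, height) box of `frame` in place.

    `frame` is an H x W x C uint8 array, for example the RGBA array from
    pyds.get_nvds_buf_surface(). Assigning into a slice keeps the shape
    and colour format of the original array unchanged.
    """
    frame_h, frame_w = frame.shape[:2]
    for left, top, width, height in boxes:
        x1, y1, x2, y2 = clamp_box(left, top, width, height, frame_w, frame_h)
        if x2 <= x1 or y2 <= y1:
            continue  # box is empty or lies completely outside the frame
        # Copy the region so the blur works on contiguous memory,
        # then write the result back into the original array.
        roi = np.ascontiguousarray(frame[y1:y2, x1:x2])
        frame[y1:y2, x1:x2] = blur_fn(roi)
    return frame
```

Inside the probe this would be called once per frame with the boxes collected from the object list, before unmap_nvds_buf_surface on Jetson. Clamping matters because detector boxes can extend past the frame edge.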

Thank you
But I’m curious why you say option 3 has the worst performance.
Isn’t it just a Python binding? I would expect the performance to be close to C++.

It didn’t work because the Nvbuf-mem-type setting was missing on the capsfilter side.
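
For readers hitting the same segfault on x86: the setting referred to here is presumably the nvbuf-memory-type property, which the plain GStreamer Python samples set to CUDA unified memory on elements such as the stream muxer and nvvideoconvert before mapping frames with get_nvds_buf_surface. The sketch below assumes that setup; the helper name use_unified_memory is made up, and DSL may expose the same setting through its own API instead.

```python
def use_unified_memory(elements, mem_type, is_jetson):
    """Set nvbuf-memory-type on every element in `elements`.

    `mem_type` would normally be int(pyds.NVBUF_MEM_CUDA_UNIFIED).
    Nothing is changed on Jetson, where the buffer is mapped to CPU
    memory instead. Returns the number of elements configured.
    """
    if is_jetson:
        return 0
    count = 0
    for element in elements:
        # Any object with a GStreamer-style set_property() works here.
        element.set_property("nvbuf-memory-type", mem_type)
        count += 1
    return count


# Intended use in a GStreamer-based app (element names are illustrative):
#   use_unified_memory([streammux, nvvidconv],
#                      int(pyds.NVBUF_MEM_CUDA_UNIFIED), is_aarch64())
```

The frames also need to be RGBA at the probe point, since get_nvds_buf_surface only supports that format.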

Because NvBufSurface holds the buffer in GPU memory, processing it with OpenCV on the CPU means copying the data from the GPU to the CPU.

Oh, I have an additional question.

  1. I use the unified memory type on x86 and blur with OpenCV in a pad probe. Do you think a CPU copy will still occur in this case? Looking at the code, ds-example also appears to work with a similar unified memory type.

  2. It seems that a CPU copy is unavoidable on Jetson (which does not support unified memory), so there will definitely be a difference in processing speed compared to x86, right?

There are two modes in ds-example. If you use the CUDA mode, no copy is needed. Of course, the CPU mode behaves the same as the Python approach.

Because Jetson is an SoC, the CPU and GPU can access the same memory. The performance difference comes from Jetson’s weaker GPU.

Thank you for the quick response.
Let me check one more thing.
You mentioned the following:

There are two modes in ds-example. If you use the CUDA mode, no copy is needed. Of course, the CPU mode behaves the same as the Python approach.

However, the ds-example plugin does not have a separate CUDA mode; only the optimized and basic modes seem to be supported. Does the CUDA mode you mentioned refer to GpuMat in the link above?

Thank you

Yes, that’s true
