How to do out-of-place buffering using the dsexample/gstnvsegvisual demo code

I’m trying to modify the dsexample code to perform out-of-place buffering, but I’m getting segmentation faults when accessing the surface derived from the output buffer in my transform function.
Below is my full pipeline:
filesrc location=input.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 enable-padding=0 ! tee name=t ! queue ! nvinfer config-file-path=config_infer_primary_yoloV4.txt ! nvvideoconvert ! nvdsosd ! nvvideoconvert ! nvv4l2h264enc ! h264parse ! mp4mux ! filesink location=result_1.mp4
t. ! queue ! pertransform ! nvinfer config-file-path=config_infer_primary_deeplab.txt ! nvvideoconvert ! nvdsosd ! nvvideoconvert ! nvv4l2h264enc ! h264parse ! mp4mux ! filesink location=result_2.mp4

My intention is to perform some transformation operations in the pertransform element and write the result into a new buffer, since the transformation is only needed by my segmentation model, not by my object detection model.

I think I’m missing some steps, but as there is no dsexample variant for out-of-place buffering, I’m lost. Here is my transform function:

static GstFlowReturn
gst_pertransform_transform (GstBaseTransform * btrans, GstBuffer * inbuf,
    GstBuffer * outbuf)
{
  GstPerTransform *pertransform = GST_PERTRANSFORM (btrans);
  GstMapInfo in_map_info;
  GstMapInfo out_map_info;
  GstFlowReturn flow_ret = GST_FLOW_ERROR;
  PerTransformOutput *output;
  NvBufSurface *surface = NULL;
  NvBufSurface *dest_surface = NULL;
  NvDsBatchMeta *batch_meta = NULL;
  NvDsFrameMeta *frame_meta = NULL;
  NvDsMetaList *l_frame = NULL;

  pertransform->frame_num++;
  CHECK_CUDA_STATUS (cudaSetDevice (pertransform->gpu_id),
      "Unable to set cuda device");

  memset (&in_map_info, 0, sizeof (in_map_info));
  if (!gst_buffer_map (inbuf, &in_map_info, GST_MAP_READ)) {
    g_print ("Error: Failed to map input gst buffer\n");
    goto error;
  }

  memset (&out_map_info, 0, sizeof (out_map_info));
  if (!gst_buffer_map (outbuf, &out_map_info, GST_MAP_WRITE)) {
    g_print ("Error: Failed to map output gst buffer\n");
    goto error;
  }
  dest_surface = (NvBufSurface *) out_map_info.data;
  gst_buffer_unmap (outbuf, &out_map_info);

  if (!gst_buffer_copy_into (outbuf, inbuf, GST_BUFFER_COPY_META, 0, -1)) {
    g_print ("Error: Buffer metadata copy failed\n");
  }

  nvds_set_input_system_timestamp (inbuf, GST_ELEMENT_NAME (pertransform));
  surface = (NvBufSurface *) in_map_info.data;
  GST_DEBUG_OBJECT (pertransform,
      "Processing Frame %" G_GUINT64_FORMAT " Surface %p\n",
      pertransform->frame_num, surface);

  if (CHECK_NVDS_MEMORY_AND_GPUID (pertransform, surface))
    goto error;

  batch_meta = gst_buffer_get_nvds_batch_meta (inbuf);
  if (batch_meta == nullptr) {
    GST_ELEMENT_ERROR (pertransform, STREAM, FAILED,
        ("NvDsBatchMeta not found for input buffer."), (NULL));
    return GST_FLOW_ERROR;
  }

  for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
      l_frame = l_frame->next) {
    frame_meta = (NvDsFrameMeta *) (l_frame->data);
    cv::Mat in_mat;
    GST_DEBUG ("mvo tick 1");

    /* Map the input buffer so that it can be accessed by the CPU */
    if (surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0] == NULL) {
      if (NvBufSurfaceMap (surface, frame_meta->batch_id, 0,
              NVBUF_MAP_READ_WRITE) != 0) {
        GST_ELEMENT_ERROR (pertransform, STREAM, FAILED,
            ("%s:buffer map to be accessed by CPU failed", __func__), (NULL));
        return GST_FLOW_ERROR;
      }
    }

    /* Map the output buffer so that it can be accessed by the CPU */
    if (dest_surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0] == NULL) {
      if (NvBufSurfaceMap (dest_surface, frame_meta->batch_id, 0,
              NVBUF_MAP_READ_WRITE) != 0) {
        GST_ELEMENT_ERROR (pertransform, STREAM, FAILED,
            ("%s:dest_buffer map to be accessed by CPU failed", __func__),
            (NULL));
        return GST_FLOW_ERROR;
      }
    }

    /* Sync the mapped data for CPU access */
    NvBufSurfaceSyncForCpu (surface, frame_meta->batch_id, 0);
    NvBufSurfaceSyncForCpu (dest_surface, frame_meta->batch_id, 0);

    in_mat =
        cv::Mat (surface->surfaceList[frame_meta->batch_id].planeParams.height[0],
        surface->surfaceList[frame_meta->batch_id].planeParams.width[0], CV_8UC4,
        surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0],
        surface->surfaceList[frame_meta->batch_id].planeParams.pitch[0]);

    /* Gaussian-blur the frame using OpenCV */
    if (blur_frame (pertransform, frame_meta->batch_id, in_mat) != GST_FLOW_OK) {
      /* Error in blurring, skip processing on object. */
      GST_ELEMENT_ERROR (pertransform, STREAM, FAILED,
          ("blurring the object failed"), (NULL));
      if (NvBufSurfaceUnMap (surface, frame_meta->batch_id, 0)) {
        GST_ELEMENT_ERROR (pertransform, STREAM, FAILED,
            ("%s:buffer unmap to be accessed by CPU failed", __func__), (NULL));
      }
      return GST_FLOW_ERROR;
    }
  }

  NvBufSurfaceUnMap (surface, -1, -1);
  NvBufSurfaceUnMap (dest_surface, -1, -1);
  flow_ret = GST_FLOW_OK;

error:
  nvds_set_output_system_timestamp (inbuf, GST_ELEMENT_NAME (pertransform));
  gst_buffer_unmap (inbuf, &in_map_info);
  return flow_ret;
}

The segmentation fault occurs when trying to access dest_surface->surfaceList.
I’m not sure which other GstBaseTransform virtual functions need to be implemented (so far I have implemented prepare_output_buffer).
An extension of the dsexample code showing how to perform non-in-place buffering would be great, but until then, could you give me some clues on the steps needed to achieve this? I can send the full plugin code if required.

Environment

TensorRT Version : 7.1.3
GPU Type : AGX Xavier
CUDA Version : 10.2
CUDNN Version : 8.0
Operating System + Version : Ubuntu 18.04
Baremetal or Container (if container which image + tag) : baremetal

What kind of transformation do you need?

Thanks for the quick reply. I’m doing a basic Gaussian blur for now to test if it’s working, but I will replace it with a perspective transform after this step.

So there is no caps change, right?
If there is no caps change, an in-place transform is enough. https://gstreamer.freedesktop.org/documentation/base/gstbasetransform.html?gi-language=c

Are you sure about that?
I’m using a GStreamer tee element in my pipeline to run two nvinfer elements in parallel. From reading the documentation, elements downstream of the tee (e.g. nvinfer) get a reference to the same buffer, and while debugging I also saw that the object detection model sometimes received transformed data, which I want to avoid.
I want to perform object detection on the original input frame and segmentation on the transformed frame (perspective transform). I also need to publish both images, so I don’t see a way to do that without generating a new output buffer.

OK, I understand your requirement.
One possible solution is to run two pipelines, one for detection and the other for segmentation, so that the in-place transformation does not impact the other pipeline.
The other solution is to develop your own normal (non-in-place) transform plugin; the thread "Question of memory leak gst-plugin based on dsexample" (Intelligent Video Analytics / DeepStream SDK, NVIDIA Developer Forums) is a good sample. You need to design your prepare_output_buffer function carefully.
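A hedged, untested sketch of such a prepare_output_buffer, assuming the element keeps a pertransform->pool field that was created with gst_nvds_buffer_pool_new() (from the DeepStream SDK), configured with the negotiated caps and the element's gpu-id, and activated before the first buffer flows:

```cpp
/* Sketch only: acquire the output from an NVMM buffer pool so that
 * mapping outbuf later yields a valid NvBufSurface.  The "pool" field
 * is an assumption -- see the lead-in above for how it would be set up. */
static GstFlowReturn
gst_pertransform_prepare_output_buffer (GstBaseTransform * btrans,
    GstBuffer * inbuf, GstBuffer ** outbuf)
{
  GstPerTransform *pertransform = GST_PERTRANSFORM (btrans);

  *outbuf = NULL;
  if (gst_buffer_pool_acquire_buffer (pertransform->pool, outbuf,
          NULL) != GST_FLOW_OK) {
    GST_ERROR_OBJECT (pertransform, "failed to acquire NVMM output buffer");
    return GST_FLOW_ERROR;
  }

  /* Copy timestamps, flags and DeepStream metadata so the downstream
   * nvinfer still finds NvDsBatchMeta on the new buffer. */
  if (!gst_buffer_copy_into (*outbuf, inbuf,
          (GstBufferCopyFlags) (GST_BUFFER_COPY_FLAGS |
              GST_BUFFER_COPY_TIMESTAMPS | GST_BUFFER_COPY_META), 0, -1)) {
    GST_ERROR_OBJECT (pertransform, "failed to copy buffer metadata");
  }
  return GST_FLOW_OK;
}
```

With the output allocated this way, the transform vfunc can map outbuf and cast out_map_info.data to NvBufSurface * safely, as long as the mapping is held for the duration of the processing.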

Thanks for the advice, Fiona. I need to run it in a single pipeline, so I will take a look at the sample code you linked and try to implement it.