From NvBufSurface to cv::cuda::GpuMat or CUDA stream

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only)
• TensorRT Version: 10.6
• NVIDIA GPU Driver Version (valid for GPU only): 560.30.35
• Issue Type (questions, new requirements, bugs): Questions

After the DeepStream app decodes frames and infers bboxes, I need to run custom CNN models on those frames and bboxes.

The raw frames are in GPU memory, and the input patches of the custom CNN models also need to be in GPU memory. However, I don't want to do unnecessary GPU->CPU and CPU->GPU memory copies.

So I am trying to access the frame data directly through NvBufSurface.

I checked that

  • EglImage is supported by NVBUF_MEM_SURFACE_ARRAY only.
  • To access the data pointer, NvBufSurfaceMap is required.
  • Only NVBUF_MEM_CUDA_UNIFIED supports NvBufSurfaceMap on dGPU.
  • The color format of the raw frames is NVBUF_COLOR_FORMAT_NV12_709.

I want to crop patches, resize them, and convert the color format to BGR without a memory copy.

      // cv::Mat nv12_mat = cv::Mat(height * 3 / 2, width, CV_8UC1, surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0], surface->surfaceList[frame_meta->batch_id].pitch);
      // cv::Mat rgba_mat;
      // cv::cvtColor(nv12_mat, rgba_mat, cv::COLOR_YUV2BGRA_NV12);
      // mat_vector->push_back(nv12_mat);

Copying between CPU and GPU takes too much CPU (all cores were at 99% in my case).

I read a few posts on the forums suggesting a CUDA stream or cv::cuda::GpuMat or something similar.
I tried, but it didn't work well.

      cv::cuda::GpuMat nv12_mat(
          height * 3 / 2, width, CV_8UC1,
          surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0],
          surface->surfaceList[frame_meta->batch_id].pitch);
      cv::cuda::GpuMat rgb_mat;
      cv::cuda::cvtColor(nv12_mat, rgb_mat, cv::COLOR_YUV2RGB_NV12);

Is there an efficient way to access the frame data in GPU memory so that I can crop/resize the patches and feed them to a TensorRT engine?

Thank you.

The NV12 format in NvBufSurface is different from the standard NV12 format. The NvBufSurface NV12 layout is described by NvBufSurfaceParams (NVIDIA DeepStream SDK API Reference: NvBufSurfaceParams Struct Reference | NVIDIA Docs) in NvBufSurface.

If you need to use the NV12 format, please copy the data line by line to a new CUDA buffer according to that layout.
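For example, a rough sketch of such a pitched, line-by-line copy (an illustration only, not SDK sample code; surface and frame_meta are assumed to come from a probe, and dst_y / dst_uv are assumed to be tightly packed device buffers allocated with cudaMalloc):

/* Sketch: copy the padded NV12 planes into tightly packed device buffers.
 * cudaMemcpy2D copies row by row, honoring the source pitch. */
NvBufSurfaceParams *params = &surface->surfaceList[frame_meta->batch_id];
unsigned char *src = (unsigned char *) params->dataPtr;

/* Y plane: 'height' rows of 'width' bytes, rows spaced pitch[0] bytes apart. */
cudaMemcpy2D (dst_y, params->width,
              src + params->planeParams.offset[0], params->planeParams.pitch[0],
              params->width, params->height, cudaMemcpyDeviceToDevice);

/* Interleaved UV plane: 'height / 2' rows of 'width' bytes. */
cudaMemcpy2D (dst_uv, params->width,
              src + params->planeParams.offset[1], params->planeParams.pitch[1],
              params->width, params->height / 2, cudaMemcpyDeviceToDevice);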

If you just want to convert the NV12 data to RGB format, please use the nvvideoconvert plugin in your pipeline.

Thank you for quick reply!

I'm using deepstream_parallel_infer_app.cpp from NVIDIA-AI-IOT/deepstream_reference_apps.

Could you please give me an example of how to add the nvvideoconvert plugin in the body_pose_gie_src_pad_buffer_probe() function?

The bboxes are stored in the metadata after body_pose_gie_src_pad_buffer_probe(). You can get the metadata downstream together with the NvBufSurface.

You can add nvvideoconvert after nvdsosd and get the NvBufSurface after nvvideoconvert. The bboxes in the metadata are also available there.

I’m really sorry, but I couldn’t fully understand what you were explaining.

What should I do to add nvvideoconvert to the original code deepstream-parallel-infer/deepstream_parallel_infer_app.cpp?

There is a pipeline graph at deepstream_reference_apps/deepstream_parallel_inference_app/common.png at master · NVIDIA-AI-IOT/deepstream_reference_apps.
You can modify the code to add nvvideoconvert after nvdsosd to convert the video to RGBA format. Then you can add a probe function after nvvideoconvert and read the batched NvBufSurface out; the metadata with the bboxes is also available there. The layout of the RGBA data is close to the standard RGBA layout.
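For reference, a minimal probe sketch (the function name and details are placeholders, not from the sample app), assuming the caps after nvvideoconvert are "video/x-raw(memory:NVMM), format=RGBA":

static GstPadProbeReturn
rgba_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  GstMapInfo map_info;

  if (!gst_buffer_map (buf, &map_info, GST_MAP_READ))
    return GST_PAD_PROBE_OK;

  /* The mapped data of an NVMM buffer is the batched NvBufSurface. */
  NvBufSurface *surface = (NvBufSurface *) map_info.data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    NvBufSurfaceParams *params = &surface->surfaceList[frame_meta->batch_id];
    /* params->dataPtr is the RGBA frame in GPU memory on dGPU;
     * the bboxes are in frame_meta->obj_meta_list. */
  }

  gst_buffer_unmap (buf, &map_info);
  return GST_PAD_PROBE_OK;
}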

From what I understand from your post, you want to:

  1. Copy the bitmap from Gst-Buffer to an OpenCV Cuda Matrix
  2. Run some of your own custom code using this Matrix

This is fairly easy to achieve on a dGPU:

// Zero-copy wrap of the RGBA frame: pitch / 4 is the padded width in pixels
// (4 bytes per RGBA pixel), and pitch is the row stride in bytes.
cv::cuda::GpuMat mat(
  surf->surfaceList->height,
  surf->surfaceList->pitch / 4,
  CV_8UC4,
  surf->surfaceList->dataPtr,
  surf->surfaceList->pitch
);

This will give you the matrix in RGBA format. This is a zero-copy operation, so it uses the same memory as your DeepStream application. If you want to process it asynchronously with the GStreamer pipeline, it may still be beneficial to copy it to a separate buffer; otherwise, memory hazards are sure to follow.

This expects the data flowing through the pad to be in RGBA format, which is easily achievable by adding a convert element before your data extraction code.

gst-launch-1.0 ... ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! ...
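From there, cropping and resizing a detection entirely on the GPU could look roughly like this (a sketch, not tested; obj_meta is assumed to come from the frame's metadata, mat is the RGBA GpuMat wrapped above, and 224x224 is just an assumed model input size):

// Needs opencv2/cudawarping.hpp; both the ROI view and the resize stay in GPU memory.
cv::Rect roi(obj_meta->rect_params.left, obj_meta->rect_params.top,
             obj_meta->rect_params.width, obj_meta->rect_params.height);
cv::cuda::GpuMat patch;
cv::cuda::resize(mat(roi), patch, cv::Size(224, 224)); // 'patch' owns its own memory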

Hi! Thanks for your advice!

Yes, you're right. I want to run inference with a custom TRT model that is not compatible with the DeepStream app.
I need to crop patches, resize them, and permute them to BCHW.
Thus, I need a Mat type in RGB color format to do what I want.
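What I have in mind is roughly the following (just a sketch; patch_rgb, d_input, and the 224x224 input size are placeholders for my own code, with patch_rgb being a CV_8UC3 GpuMat already cropped and resized):

// Pack an interleaved RGB patch (HWC) into a planar CHW TensorRT input buffer.
// d_input: float* device buffer of size 3*224*224 allocated with cudaMalloc.
const int h = 224, w = 224;
std::vector<cv::cuda::GpuMat> planes = {
    cv::cuda::GpuMat(h, w, CV_32FC1, d_input + 0 * h * w),
    cv::cuda::GpuMat(h, w, CV_32FC1, d_input + 1 * h * w),
    cv::cuda::GpuMat(h, w, CV_32FC1, d_input + 2 * h * w)
};
cv::cuda::GpuMat patch_f;
patch_rgb.convertTo(patch_f, CV_32F, 1.0 / 255.0);  // uint8 -> float32 in [0, 1]
cv::cuda::split(patch_f, planes);                    // HWC -> CHW, one plane per channel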

Last time I checked, the GpuMat seemed to be in YUV format.

cv::cuda::GpuMat mat;
// ...
cv::Mat cpumat;
mat.download(cpumat);              // copy GPU -> CPU
cv::imwrite("dump.png", cpumat);   // save for inspection ("dump.png" is just an example path)

Copying the GpuMat to the CPU and saving it to a file, I got a 1920x1620 grayscale image: the 1920x1080 frame above and two black-and-white striped planes below. It looks like the YUV color format.
I am not sure what the color format of my GpuMat actually is.

Since I'm not familiar with DeepStream and GStreamer, I am struggling with adding nvvideoconvert in the source code of deepstream_parallel_infer_app.cpp.

Thank you !

Can you tell us more details about your model? Why is it not compatible with DeepStream?

Let me correct what I said.
It's not that DeepStream is incompatible with the TRT models, but rather that I don't know how to use it properly.
So I thought the fastest and easiest way would be to extract the raw frame data and feed it to my existing custom inference class (including preprocessing and inference).

The models are a detector that outputs quad-form boxes, a ViT-based model that outputs embedding vectors, and a CSPNet-based body pose model that outputs heatmaps.

Is it correct to add these calls in main() in the cpp file?

// existing code ----------------------------------------------------------

  gst_bin_add (GST_BIN (pipeline->pipeline), instance_bin->sink_tee);
  NVGSTDS_LINK_ELEMENT (last_elem, instance_bin->sink_tee);
  last_elem = instance_bin->sink_tee;

  if (!create_sink_bin (config->num_sink_sub_bins,
        config->sink_bin_sub_bin_config, &instance_bin->sink_bin, 0)) {
    g_print ("creating sink bin failed\n");
    goto done;
  }
  //x264enc will output one buffer after input 66 buffers at default, enable zerolatency property.
  for(int i = 0; i < config->num_sink_sub_bins; i++){
      if(config->sink_bin_sub_bin_config[i].encoder_config.enc_type == NV_DS_ENCODER_TYPE_SW)
        g_object_set (G_OBJECT (instance_bin->sink_bin.sub_bins[i].encoder), "tune", 0x4, NULL);
  }
  gst_bin_add (GST_BIN (pipeline->pipeline), instance_bin->sink_bin.bin);
  NVGSTDS_LINK_ELEMENT (last_elem, instance_bin->sink_bin.bin);


// added ----------------------------------------------------------

GstElement *nvvidconv_post_osd, *capsfilter_post_conv;

// Add the elements to the pipeline
nvvidconv_post_osd = gst_element_factory_make("nvvideoconvert", "nvvidconv_post_osd");
capsfilter_post_conv = gst_element_factory_make("capsfilter", "capsfilter_post_conv");

if (!nvvidconv_post_osd || !capsfilter_post_conv) {
    g_printerr("Failed to create nvvideoconvert or capsfilter elements\n");
    return -1;
}

// Set caps for the capsfilter
GstCaps *caps = gst_caps_new_simple("video/x-raw",
                                     "format", G_TYPE_STRING, "RGBA",
                                     NULL);
g_object_set(G_OBJECT(capsfilter_post_conv), "caps", caps, NULL);
gst_caps_unref(caps);

// Add elements to the pipeline
gst_bin_add_many(GST_BIN(pipeline), nvvidconv_post_osd, capsfilter_post_conv, NULL);

// Link the elements
if (!gst_element_link_many(osd, nvvidconv_post_osd, capsfilter_post_conv, NULL)) {
    g_printerr("Failed to link osd to nvvideoconvert and capsfilter\n");
    return -1;
}

// Continue linking capsfilter_post_conv to the rest of the pipeline
if (!gst_element_link(capsfilter_post_conv, sink)) {
    g_printerr("Failed to link capsfilter_post_conv to sink\n");
    return -1;
}

Something like this? Adding nvvideoconvert after the tee?

Thank you!

We already have lots of samples of integrating such models with DeepStream. Please refer to NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream

Extracting the buffer data and doing the inferencing outside the pipeline will not be easier. It may also break the pipeline's flow if you do not handle the buffer in time.

Such caps will copy the data from GPU to CPU, is this what you want?

Thank you for your quick reply!

Sorry, my mistake. I meant:

GstCaps *caps = gst_caps_from_string("video/x-raw(memory:NVMM), format=RGBA");

I will take a careful look at the TAO examples.

I have a few questions about the TAO apps:

  1. My pose estimation model requires an extra margin around the bbox. Is it possible to modify the cropping area passed from the pgie to the sgie?

  2. In the TAO apps, is it possible to run multiple nvinfer instances in parallel like in deepstream-parallel-apps? I checked a few apps, but all the examples seem to have a single nvinfer pipeline.

  3. I have an MLP model with 5 CNN layers. I don't think this model fits any of the models in the TAO apps. Is it possible to run such a model in the TAO apps?

Thank you!

Besides my questions about the TAO apps,
I was able to add nvvideoconvert and extract RGBA frames from the pipeline!


  nvvidconv = gst_element_factory_make("nvvideoconvert", "nvvideo_convert");
  if (!nvvidconv) {
    g_print("Failed to create 'nvvideoconvert'\n");
    goto done;
  }

  capsfilter = gst_element_factory_make("capsfilter", "nvvidconv_caps");
  if (!capsfilter) {
    g_print("Failed to create 'capsfilter'\n");
    goto done;
  }

  caps = gst_caps_from_string("video/x-raw(memory:NVMM), format=RGBA");
  // nvbuf-memory-type = 3 selects NVBUF_MEM_CUDA_UNIFIED on dGPU
  g_object_set(G_OBJECT(nvvidconv), "nvbuf-memory-type", 3, NULL);
  g_object_set(G_OBJECT(capsfilter), "caps", caps, NULL);
  gst_caps_unref(caps);

  gst_bin_add_many(GST_BIN(pipeline->pipeline), nvvidconv, capsfilter, NULL);

  // Probe on the converter's src pad gives access to the batched RGBA NvBufSurface
  nvvidconv_src_pad = gst_element_get_static_pad(nvvidconv, "src");
  gst_pad_add_probe(nvvidconv_src_pad, GST_PAD_PROBE_TYPE_BUFFER, frame_probe, NULL, NULL);
  gst_object_unref(nvvidconv_src_pad);

  NVGSTDS_LINK_ELEMENT(last_elem, nvvidconv);
  NVGSTDS_LINK_ELEMENT(nvvidconv, capsfilter);
  last_elem = capsfilter;
  //--------------------------------------------------------------------

nvvideoconvert is added between nvinfer and the tiler.

Thank you for your advice! I really appreciate it.

It is possible. It depends on your margin algorithm.
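For example, one possible approach (a sketch of my own, not from the samples) is to enlarge each detected bbox in a probe placed between the pgie and the sgie, since the sgie crops according to obj_meta->rect_params:

static void
expand_bboxes (NvDsFrameMeta *frame_meta, float margin_ratio,
               float frame_width, float frame_height)
{
  /* Enlarge every detected bbox by margin_ratio on each side, clamped to the frame. */
  for (NvDsMetaList *l = frame_meta->obj_meta_list; l != NULL; l = l->next) {
    NvDsObjectMeta *obj = (NvDsObjectMeta *) l->data;
    float mx = obj->rect_params.width * margin_ratio;
    float my = obj->rect_params.height * margin_ratio;

    obj->rect_params.left   = MAX (obj->rect_params.left - mx, 0.0f);
    obj->rect_params.top    = MAX (obj->rect_params.top  - my, 0.0f);
    obj->rect_params.width  = MIN (obj->rect_params.width  + 2 * mx,
                                   frame_width  - obj->rect_params.left);
    obj->rect_params.height = MIN (obj->rect_params.height + 2 * my,
                                   frame_height - obj->rect_params.top);
  }
}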

The TAO apps and deepstream-parallel-app are all samples; the purpose of the samples is to show how to use the DeepStream APIs. You can refer to the TAO apps for how to integrate different models with DeepStream nvinfer, nvinferserver, nvdspreprocess, etc. The models can also be deployed with deepstream-parallel-app using the same method.

The DeepStream SDK only supports ONNX models. You may need to convert your models to ONNX.

Thank you for your advice!

I will study TAO apps and apply them to my code.

Thank you!!
