From NvBufSurface to cv::cuda::GpuMat or CUDA stream

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only)
• TensorRT Version: 10.6
• NVIDIA GPU Driver Version (valid for GPU only): 560.30.35
• Issue Type (questions, new requirements, bugs): Questions

After the DeepStream app decodes frames and infers bboxes, I need to run custom CNN models on those frames and bboxes.

The raw frames are in GPU memory, and the input patches of the custom CNN models also need to be in GPU memory. However, I don't want to do unnecessary GPU->CPU and CPU->GPU memory copies.

So I am trying to access the frame data directly through NvBufSurface.

I checked that

  • EglImage is supported by NVBUF_MEM_SURFACE_ARRAY only.
  • To access the data pointer, NvBufSurfaceMap is required.
  • Only NVBUF_MEM_CUDA_UNIFIED supports NvBufSurfaceMap on dGPU.
  • The color format of the raw frames is NVBUF_COLOR_FORMAT_NV12_709.

I want to crop patches, resize them, and convert the color format to BGR without a memory copy.

      // cv::Mat nv12_mat = cv::Mat(height * 3 / 2, width, CV_8UC1, surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0], surface->surfaceList[frame_meta->batch_id].pitch);
      // cv::Mat rgba_mat;
      // cv::cvtColor(nv12_mat, rgba_mat, cv::COLOR_YUV2BGRA_NV12);
      // mat_vector->push_back(nv12_mat);

Copying between CPU and GPU takes too much CPU (all cores were at 99% in my case).

I read a few posts on the forums suggesting a CUDA stream or cv::cuda::GpuMat or something similar.
I tried, but it didn't work well.

      cv::cuda::GpuMat nv12_mat(
          height * 3 / 2, width, CV_8UC1,
          surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0],
          surface->surfaceList[frame_meta->batch_id].pitch);
      cv::cuda::GpuMat rgb_mat;
      cv::cuda::cvtColor(nv12_mat, rgb_mat, cv::COLOR_YUV2RGB_NV12);

Is there an efficient way to access the frame data in GPU memory so that I can crop/resize the patches and feed them to a TensorRT engine?

Thank you.

The NV12 format in NvBufSurface is different from the standard NV12 format. The NvBufSurface NV12 layout is described by NvBufSurfaceParams (NVIDIA DeepStream SDK API Reference: NvBufSurfaceParams Struct Reference | NVIDIA Docs) in NvBufSurface.

If you need to use the NV12 format, please copy the data line by line to a new CUDA buffer according to that layout.
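For example, a rough sketch of such a pitched, line-by-line copy (an illustration only, not SDK sample code; surface and frame_meta are assumed to come from a probe, and dst_y / dst_uv are assumed to be tightly packed device buffers allocated with cudaMalloc):

/* Sketch: copy the padded NV12 planes into tightly packed device buffers.
 * cudaMemcpy2D copies row by row, honoring the source pitch. */
NvBufSurfaceParams *params = &surface->surfaceList[frame_meta->batch_id];
unsigned char *src = (unsigned char *) params->dataPtr;

/* Y plane: 'height' rows of 'width' bytes, rows spaced pitch[0] bytes apart. */
cudaMemcpy2D (dst_y, params->width,
              src + params->planeParams.offset[0], params->planeParams.pitch[0],
              params->width, params->height, cudaMemcpyDeviceToDevice);

/* Interleaved UV plane: 'height / 2' rows of 'width' bytes. */
cudaMemcpy2D (dst_uv, params->width,
              src + params->planeParams.offset[1], params->planeParams.pitch[1],
              params->width, params->height / 2, cudaMemcpyDeviceToDevice);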

If you just want to convert the NV12 data to RGB format, please use the nvvideoconvert plugin in your pipeline.

Thank you for quick reply!

I'm using deepstream_parallel_infer_app.cpp from NVIDIA-AI-IOT/deepstream_reference_apps.

Could you please give me an example of how to add the nvvideoconvert plugin in the body_pose_gie_src_pad_buffer_probe() function?

The bboxes are stored in the metadata after body_pose_gie_src_pad_buffer_probe(). You can get the metadata downstream together with the NvBufSurface.

You can add nvvideoconvert after nvdsosd and get the NvBufSurface after nvvideoconvert. The bboxes in the metadata are also available there.

I’m really sorry, but I couldn’t fully understand what you were explaining.

What should I do to add nvvideoconvert to the original code deepstream-parallel-infer/deepstream_parallel_infer_app.cpp?

There is a pipeline graph at deepstream_reference_apps/deepstream_parallel_inference_app/common.png at master · NVIDIA-AI-IOT/deepstream_reference_apps.
You can modify the code to add nvvideoconvert after nvdsosd to convert the video to RGBA format. Then you can add a probe function after nvvideoconvert and read the batched NvBufSurface out; the metadata with the bboxes is also available there. The layout of the RGBA data is close to the standard RGBA layout.
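For reference, a minimal probe sketch (the function name and details are placeholders, not from the sample app), assuming the caps after nvvideoconvert are "video/x-raw(memory:NVMM), format=RGBA":

static GstPadProbeReturn
rgba_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  GstMapInfo map_info;

  if (!gst_buffer_map (buf, &map_info, GST_MAP_READ))
    return GST_PAD_PROBE_OK;

  /* The mapped data of an NVMM buffer is the batched NvBufSurface. */
  NvBufSurface *surface = (NvBufSurface *) map_info.data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    NvBufSurfaceParams *params = &surface->surfaceList[frame_meta->batch_id];
    /* params->dataPtr is the RGBA frame in GPU memory on dGPU;
     * the bboxes are in frame_meta->obj_meta_list. */
  }

  gst_buffer_unmap (buf, &map_info);
  return GST_PAD_PROBE_OK;
}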

From what I understand from your post, you want to:

  1. Copy the bitmap from Gst-Buffer to an OpenCV Cuda Matrix
  2. Run some of your own custom code using this Matrix

This is fairly easy to achieve on a dGPU:

// Zero-copy wrap of the RGBA frame: pitch / 4 is the padded width in pixels
// (4 bytes per RGBA pixel), and pitch is the row stride in bytes.
cv::cuda::GpuMat mat(
  surf->surfaceList->height,
  surf->surfaceList->pitch / 4,
  CV_8UC4,
  surf->surfaceList->dataPtr,
  surf->surfaceList->pitch
);

This will give you the matrix in RGBA format. This is a zero-copy operation, so it uses the same memory as your DeepStream application. If you want to process it asynchronously with the GStreamer pipeline, it may still be beneficial to copy it to a separate buffer; otherwise, memory hazards are sure to follow.

This expects the data flowing through the pad to be in RGBA format, which is easily achievable by adding a convert element before your data extraction code.

gst-launch-1.0 ... ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! ...
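From there, cropping and resizing a detection entirely on the GPU could look roughly like this (a sketch, not tested; obj_meta is assumed to come from the frame's metadata, mat is the RGBA GpuMat wrapped above, and 224x224 is just an assumed model input size):

// Needs opencv2/cudawarping.hpp; both the ROI view and the resize stay in GPU memory.
cv::Rect roi(obj_meta->rect_params.left, obj_meta->rect_params.top,
             obj_meta->rect_params.width, obj_meta->rect_params.height);
cv::cuda::GpuMat patch;
cv::cuda::resize(mat(roi), patch, cv::Size(224, 224)); // 'patch' owns its own memory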

Hi! Thanks for your advice!

Yes, you're right. I want to run inference with a custom TRT model that is not compatible with the DeepStream app.
I need to crop patches, resize them, and permute them to BCHW.
Thus, I need a Mat type in RGB color format to do what I want.
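What I have in mind is roughly the following (just a sketch; patch_rgb, d_input, and the 224x224 input size are placeholders for my own code, with patch_rgb being a CV_8UC3 GpuMat already cropped and resized):

// Pack an interleaved RGB patch (HWC) into a planar CHW TensorRT input buffer.
// d_input: float* device buffer of size 3*224*224 allocated with cudaMalloc.
const int h = 224, w = 224;
std::vector<cv::cuda::GpuMat> planes = {
    cv::cuda::GpuMat(h, w, CV_32FC1, d_input + 0 * h * w),
    cv::cuda::GpuMat(h, w, CV_32FC1, d_input + 1 * h * w),
    cv::cuda::GpuMat(h, w, CV_32FC1, d_input + 2 * h * w)
};
cv::cuda::GpuMat patch_f;
patch_rgb.convertTo(patch_f, CV_32F, 1.0 / 255.0);  // uint8 -> float32 in [0, 1]
cv::cuda::split(patch_f, planes);                    // HWC -> CHW, one plane per channel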

Last time I checked, the GpuMat seemed to be in YUV format.

cv::cuda::GpuMat mat;
// ...
cv::Mat cpumat;
mat.download(cpumat);              // copy GPU -> CPU
cv::imwrite("dump.png", cpumat);   // save for inspection ("dump.png" is just an example path)

Copying the GpuMat to the CPU and saving it to a file, I got a 1920x1620 grayscale image: the 1920x1080 frame above and two black-and-white striped planes below. It looks like the YUV color format.
I am not sure what the color format of my GpuMat actually is.

Since I'm not familiar with DeepStream and GStreamer, I am struggling with adding nvvideoconvert in the source code of deepstream_parallel_infer_app.cpp.

Thank you !

Can you tell us more details about your model? Why is it not compatible with DeepStream?

Let me correct what I said.
It's not that DeepStream is incompatible with the TRT models, but rather that I don't know how to use it properly.
So I thought the fastest and easiest way would be to extract the raw frame data and feed it to my existing custom inference class (including preprocessing and inference).

The models are a detector that outputs quad-form boxes, a ViT-based model that outputs embedding vectors, and a CSPNet-based body pose model that outputs heatmaps.

Is it correct to add these calls in main() in the cpp file?

// existing code ----------------------------------------------------------

  gst_bin_add (GST_BIN (pipeline->pipeline), instance_bin->sink_tee);
  NVGSTDS_LINK_ELEMENT (last_elem, instance_bin->sink_tee);
  last_elem = instance_bin->sink_tee;

  if (!create_sink_bin (config->num_sink_sub_bins,
        config->sink_bin_sub_bin_config, &instance_bin->sink_bin, 0)) {
    g_print ("creating sink bin failed\n");
    goto done;
  }
  //x264enc will output one buffer after input 66 buffers at default, enable zerolatency property.
  for(int i = 0; i < config->num_sink_sub_bins; i++){
      if(config->sink_bin_sub_bin_config[i].encoder_config.enc_type == NV_DS_ENCODER_TYPE_SW)
        g_object_set (G_OBJECT (instance_bin->sink_bin.sub_bins[i].encoder), "tune", 0x4, NULL);
  }
  gst_bin_add (GST_BIN (pipeline->pipeline), instance_bin->sink_bin.bin);
  NVGSTDS_LINK_ELEMENT (last_elem, instance_bin->sink_bin.bin);


// added ----------------------------------------------------------

GstElement *nvvidconv_post_osd, *capsfilter_post_conv;

// Add the elements to the pipeline
nvvidconv_post_osd = gst_element_factory_make("nvvideoconvert", "nvvidconv_post_osd");
capsfilter_post_conv = gst_element_factory_make("capsfilter", "capsfilter_post_conv");

if (!nvvidconv_post_osd || !capsfilter_post_conv) {
    g_printerr("Failed to create nvvideoconvert or capsfilter elements\n");
    return -1;
}

// Set caps for the capsfilter
GstCaps *caps = gst_caps_new_simple("video/x-raw",
                                     "format", G_TYPE_STRING, "RGBA",
                                     NULL);
g_object_set(G_OBJECT(capsfilter_post_conv), "caps", caps, NULL);
gst_caps_unref(caps);

// Add elements to the pipeline
gst_bin_add_many(GST_BIN(pipeline), nvvidconv_post_osd, capsfilter_post_conv, NULL);

// Link the elements
if (!gst_element_link_many(osd, nvvidconv_post_osd, capsfilter_post_conv, NULL)) {
    g_printerr("Failed to link osd to nvvideoconvert and capsfilter\n");
    return -1;
}

// Continue linking capsfilter_post_conv to the rest of the pipeline
if (!gst_element_link(capsfilter_post_conv, sink)) {
    g_printerr("Failed to link capsfilter_post_conv to sink\n");
    return -1;
}

Something like this? Adding nvvideoconvert after the tee?

Thank you!

We already have lots of samples of integrating such models with DeepStream. Please refer to NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream

Extracting the buffer data and doing the inferencing outside the pipeline will not be easier. It may also break the pipeline's flow if you do not handle the buffer in time.

Such caps will copy the data from GPU to CPU, is this what you want?

Thank you for your quick reply!

Sorry, my mistake. I meant:

GstCaps *caps = gst_caps_from_string("video/x-raw(memory:NVMM), format=RGBA");

I will take a careful look at the TAO examples.

I have a few questions about the TAO apps:

  1. My pose estimation model requires an extra margin around the bbox. Is it possible to modify the cropping area passed from the pgie to the sgie?

  2. In the TAO apps, is it possible to run multiple nvinfer instances in parallel like in deepstream-parallel-apps? I checked a few apps, but all the examples seem to have a single nvinfer pipeline.

  3. I have an MLP model with 5 CNN layers. I don't think this model fits any of the models in the TAO apps. Is it possible to run such a model in the TAO apps?

Thank you!

Besides my questions about the TAO apps,
I was able to add nvvideoconvert and extract RGBA frames from the pipeline!


  nvvidconv = gst_element_factory_make("nvvideoconvert", "nvvideo_convert");
  if (!nvvidconv) {
    g_print("Failed to create 'nvvideoconvert'\n");
    goto done;
  }

  capsfilter = gst_element_factory_make("capsfilter", "nvvidconv_caps");
  if (!capsfilter) {
    g_print("Failed to create 'capsfilter'\n");
    goto done;
  }

  caps = gst_caps_from_string("video/x-raw(memory:NVMM), format=RGBA");
  // nvbuf-memory-type = 3 selects NVBUF_MEM_CUDA_UNIFIED on dGPU
  g_object_set(G_OBJECT(nvvidconv), "nvbuf-memory-type", 3, NULL);
  g_object_set(G_OBJECT(capsfilter), "caps", caps, NULL);
  gst_caps_unref(caps);

  gst_bin_add_many(GST_BIN(pipeline->pipeline), nvvidconv, capsfilter, NULL);

  // Probe on the converter's src pad gives access to the batched RGBA NvBufSurface
  nvvidconv_src_pad = gst_element_get_static_pad(nvvidconv, "src");
  gst_pad_add_probe(nvvidconv_src_pad, GST_PAD_PROBE_TYPE_BUFFER, frame_probe, NULL, NULL);
  gst_object_unref(nvvidconv_src_pad);

  NVGSTDS_LINK_ELEMENT(last_elem, nvvidconv);
  NVGSTDS_LINK_ELEMENT(nvvidconv, capsfilter);
  last_elem = capsfilter;
  //--------------------------------------------------------------------

nvvideoconvert is added between nvinfer and the tiler.

Thank you for your advice! I really appreciate it.

It is possible. It depends on your margin algorithm.
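For example, one possible approach (a sketch of my own, not from the samples) is to enlarge each detected bbox in a probe placed between the pgie and the sgie, since the sgie crops according to obj_meta->rect_params:

static void
expand_bboxes (NvDsFrameMeta *frame_meta, float margin_ratio,
               float frame_width, float frame_height)
{
  /* Enlarge every detected bbox by margin_ratio on each side, clamped to the frame. */
  for (NvDsMetaList *l = frame_meta->obj_meta_list; l != NULL; l = l->next) {
    NvDsObjectMeta *obj = (NvDsObjectMeta *) l->data;
    float mx = obj->rect_params.width * margin_ratio;
    float my = obj->rect_params.height * margin_ratio;

    obj->rect_params.left   = MAX (obj->rect_params.left - mx, 0.0f);
    obj->rect_params.top    = MAX (obj->rect_params.top  - my, 0.0f);
    obj->rect_params.width  = MIN (obj->rect_params.width  + 2 * mx,
                                   frame_width  - obj->rect_params.left);
    obj->rect_params.height = MIN (obj->rect_params.height + 2 * my,
                                   frame_height - obj->rect_params.top);
  }
}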

The TAO apps and deepstream-parallel-app are all samples; the purpose of the samples is to show how to use the DeepStream APIs. You can refer to the TAO apps for how to integrate different models with DeepStream nvinfer, nvinferserver, nvdspreprocess, etc. The models can also be deployed with deepstream-parallel-app using the same method.

The DeepStream SDK only supports ONNX models. You may need to convert your models to ONNX.

Thank you for your advice!

I will study TAO apps and apply them to my code.

Thank you!!
