Using CUDA filters on cv::cuda::GpuMat obtained from NvBufSurface

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 5.1
• Issue Type( questions, new requirements, bugs) Question

Hi. I am currently developing a custom DeepStream plugin in C++. I would like to obtain frames as cv::cuda::GpuMat and run some CUDA operations on them.
My initial code was based on sources/gst-plugins/gstdsexample in the SDK. However, that code has no instructions for using cv::cuda::GpuMat, which is why I used this code here. But here is the problem:

The code on the forum works with an NvBufSurface named “inter_buf”. The ‘inter_buf’ is an additional surface produced by transforming the original NvBufSurface with “NvBufSurfTransform”. The SDK example uses this transformation for cropping and resizing, which I don’t need. I want to use the original NvBufSurface, without an additional one, and obtain the GpuMat directly from it, so that no extra NvBufSurfTransform is needed.

However, I cannot apply my CUDA filter to the GpuMat obtained this way: I get an illegal memory access error.

Here is the key section of the code:

    if (NvBufSurfaceMapEglImage (input_buf, 0) != 0) {
        return GST_FLOW_ERROR;
    }
    CUresult status;
    CUeglFrame eglFrame;
    CUgraphicsResource pResource = NULL;
    cudaFree (0);  // force creation of a CUDA context on this thread
    status = cuGraphicsEGLRegisterImage (&pResource,
        input_buf->surfaceList[0].mappedAddr.eglImage,
        CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);

    status = cuGraphicsResourceGetMappedEglFrame (&eglFrame, pResource, 0, 0);
    status = cuCtxSynchronize ();

    cv::cuda::GpuMat d_mat (gpublur->processing_height, gpublur->processing_width,
        CV_8UC4, eglFrame.frame.pPitch[0]);

    // This line gives an error at runtime. gpublur->filter is just a normal
    // GaussianBlur from cudafilters.
    gpublur->filter->apply (d_mat, d_mat);

    status = cuCtxSynchronize ();
    status = cuGraphicsUnregisterResource (pResource);

    // Destroy the EGLImage
    NvBufSurfaceUnMapEglImage (input_buf, 0);

This is very similar to the code at the link, except that I have removed the transformations before and after this section (since I am working directly with the main surface).

And the error I get is:

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.5.1) /opt/nvidia/deepstream/deepstream-5.1/opencvcuda/opencv_contrib-4.5.1/modules/cudafilters/src/cuda/row_filter.hpp:172: error: (-217:Gpu API call) an illegal memory access was encountered in function 'caller'

Aborted (core dumped)

Once again, I am using the NvBufSurface directly from the original buffer.
This is how the in_buf was created:

  memset (&in_map_info, 0, sizeof (in_map_info));
  if (!gst_buffer_map (inbuf, &in_map_info, GST_MAP_READ)) {
    g_print ("Error: Failed to map gst buffer\n");
    goto error;
  }

  surface = (NvBufSurface *) in_map_info.data;

The code is almost identical to the one posted here. Is working with the original NvBufSurface causing the problem? Should I use another NvBufSurface with two transformations (one before and one after applying the CUDA filter)?

Ok. After spending about a day, I figured out the problem. Since others may run into the same issue, I will try to explain it here. Please correct me if you see any misinformation.

When you decide to use cv::cuda::GpuMat, you assume the underlying data is in one of the formats OpenCV can work with. However, the initial NvBufSurface (which is obtained from the input buffer) uses the NV12 format, according to this post.
Therefore, it seems that we have no option but to go through the transform procedure.
So the code in the transform section should look like this:

  1. Get the NvBufSurface from the original buffer.
  2. Keep an additional NvBufSurface among your element's properties. Make sure to specify NVBUF_COLOR_FORMAT_RGBA
    in the NvBufSurfaceCreateParams.
  3. Do a transformation from surface (1) to surface (2) with NvBufSurfTransform. You can add some crop/scale/resize as well,
    or just do the conversion for the sake of the color format and keep the rectangles the same.
  4. Get the GpuMat as instructed in the code above or the code here.
  5. When you are done with the mat, do a reverse conversion (from RGBA, RGB, etc. back to NV12), as is done at the end of this code. You can use the same transformation config; just swap the input and output surfaces and their rectangles (so the sizes match).
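The steps above can be sketched roughly as follows. This is an untested outline against the DeepStream 5.1 headers (nvbufsurface.h, nvbufsurftransform.h), with error handling and the batch loop reduced to batch-size 1; check the field names against your SDK version:

```cpp
#include "nvbufsurface.h"
#include "nvbufsurftransform.h"

// Step 2: create the RGBA surface once (e.g. in start()), reuse per frame.
static NvBufSurface *create_rgba_surface (gint gpu_id, guint w, guint h) {
    NvBufSurfaceCreateParams params = {0};
    params.gpuId = gpu_id;
    params.width = w;
    params.height = h;
    params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
    params.layout = NVBUF_LAYOUT_PITCH;
    params.memType = NVBUF_MEM_DEFAULT;  // surface memory on Jetson

    NvBufSurface *surf = nullptr;
    if (NvBufSurfaceCreate (&surf, 1, &params) != 0)
        return nullptr;
    return surf;
}

// Step 3: convert the incoming NV12 surface into the RGBA one.
// For step 5, call this again with src and dst swapped.
static gboolean convert (NvBufSurface *src, NvBufSurface *dst) {
    NvBufSurfTransformRect rect = {0, 0,
        src->surfaceList[0].width, src->surfaceList[0].height};

    NvBufSurfTransformParams params = {0};
    params.src_rect = &rect;
    params.dst_rect = &rect;  // same rectangle: color conversion only
    params.transform_flag = NVBUFSURF_TRANSFORM_CROP_SRC |
        NVBUFSURF_TRANSFORM_CROP_DST;
    params.transform_filter = NvBufSurfTransformInter_Default;

    return NvBufSurfTransform (src, dst, &params) ==
        NvBufSurfTransformError_Success;
}
```

As in gstdsexample, you may also need NvBufSurfTransformSetSessionParams to pin the transform to a GPU/VIC before calling NvBufSurfTransform.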

Both transformations are done on the GPU for dGPU, or on the VIC for Jetson. I guess they do not add too much overhead.

Best.

EDIT: Thanks to Blard.Theophile’s post, you can also use nvvideoconvert for the conversion. This way, the plugin does not have to use NvBufSurfTransform itself.

Hi Mohammad!
If you want to directly process RGBA data in dsexample you can use nvvideoconvert before dsexample to convert the input buffers from NV12 to RGBA.
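For example, a sketch of the relevant pipeline segment (the surrounding elements and the `...` are placeholders for your actual pipeline; the caps filter forces RGBA in NVMM memory):

```
... ! nvvideoconvert ! 'video/x-raw(memory:NVMM), format=RGBA' ! dsexample ! ...
```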

1 Like

Hey there,
I was just about to try this one for today :D

Thank you for your suggestion.

1 Like

The only drawback is that the buffers will flow as RGBA for the rest of the pipeline, but most DeepStream elements support both NV12 and RGBA anyway (except the nvv4l2 encoders).

1 Like

Oh, I see.
But I think it gives a performance boost by making the conversion run in parallel with the plugin’s logic (since it happens in a separate element).

Just one last thing: does NV12 have any advantage over RGBA? If it does, I can convert back to NV12 with another nvvideoconvert after the plugin.

Thank you for your detailed answers🙏🏻.

I’m not aware of any significant advantage of NV12 over RGBA. I’d say it depends on your pipeline and the capabilities of your elements. Maybe someone at Nvidia can provide more information.

I’m almost always using RGBA, as it is simpler to use with OpenCV, and the underlying neural nets of nvinfer almost always expect RGB input.

1 Like

Hi again,

Thanks to your suggestion, I have implemented a simpler plugin that directly processes RGBA.

But the problem is that the cv::cuda::GaussianFilter runs MUCH slower here! In a standalone script (outside the DeepStream framework), the same filter applied with the same method to a single image is about 5 times faster!

I think it is related to the buffer layout: the pitch-linear layout may be causing this. I would appreciate it if you have any input on this. I have created a topic here.