How can I create new NvBufSurface from cv::cuda::GpuMat with NVBUF_MEM_SURFACE_ARRAY

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 6.3
• JetPack Version (valid for Jetson only) 5.1.2

Hello, I’m building a custom element using nvdsvideotemplate. In this element, I receive a buffer from upstream, do some processing with cv::cuda, and then need to make a new GstBuffer from the resulting cv::cuda::GpuMat.
My output cv::cuda::GpuMat is output_nv12_mat:

NvBufSurfaceCreateParams create_params = {0};
create_params.gpuId = 0;
create_params.width = m_video_width;
create_params.height = m_video_height;
create_params.colorFormat = NVBUF_COLOR_FORMAT_NV12;
create_params.layout = NVBUF_LAYOUT_PITCH;
create_params.memType = NVBUF_MEM_CUDA_UNIFIED;

NvBufSurface *new_surf = NULL;
if (NvBufSurfaceCreate(&new_surf, 1, &create_params) != 0) {
    std::cerr << "NvBufSurface creation error" << std::endl;
}
new_surf->numFilled = 1;
// Copy Y plane
cudaMemcpy2DAsync(new_surf->surfaceList[0].dataPtr, new_surf->surfaceList[0].pitch,
                  output_nv12_mat.data, output_nv12_mat.step,
                  m_video_width, m_video_height,
                  cudaMemcpyDeviceToDevice, m_config_params.cuda_stream);

// Copy UV plane
cudaMemcpy2DAsync(static_cast<uint8_t *>(new_surf->surfaceList[0].dataPtr) +
                      new_surf->surfaceList[0].pitch * m_video_height,
                  new_surf->surfaceList[0].pitch,
                  output_nv12_mat.data + output_nv12_mat.step * m_video_height,
                  output_nv12_mat.step, m_video_width, m_video_height / 2,
                  cudaMemcpyDeviceToDevice, m_config_params.cuda_stream);
cudaStreamSynchronize(m_config_params.cuda_stream);

NvBufSurfTransformSetSessionParams(&m_config_params);
GstBuffer *newGstOutBuf = NULL;
gst_buffer_pool_acquire_buffer(m_dsBufferPool, &newGstOutBuf, NULL);
NvBufSurface *out_surf = getNvBufSurface(newGstOutBuf);
out_surf->numFilled = new_surf->numFilled;
out_surf->batchSize = new_surf->batchSize;
NvBufSurfTransformParams transform_params;
transform_params.transform_flag = NVBUFSURF_TRANSFORM_FILTER;
transform_params.transform_flip = NvBufSurfTransform_None;
transform_params.transform_filter = NvBufSurfTransformInter_Default;
NvBufSurfTransform(new_surf, out_surf, &transform_params);
NvBufSurfaceDestroy(new_surf);
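As a small robustness note, both NvBufSurfTransformSetSessionParams and NvBufSurfTransform return an NvBufSurfTransform_Error, so the calls above can be checked instead of assumed to succeed; a sketch:

```cpp
// Check the transform result instead of assuming success.
NvBufSurfTransform_Error err = NvBufSurfTransform(new_surf, out_surf, &transform_params);
if (err != NvBufSurfTransformError_Success) {
    std::cerr << "NvBufSurfTransform failed: " << err << std::endl;
}
```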

This code works: after passing the buffer to the encoder, I was able to save a correct image.
But it seems to involve an unnecessary memory copy, because I create the NvBufSurface with NVBUF_MEM_CUDA_UNIFIED memory and then convert it to NVBUF_MEM_SURFACE_ARRAY. I set the transform session params as below, so the transform appears to run on the GPU:

m_config_params.compute_mode = NvBufSurfTransformCompute_GPU;
m_config_params.gpu_id = params->m_gpuId;
m_config_params.cuda_stream = params->m_cudaStream;

So I tried setting create_params.memType to NVBUF_MEM_SURFACE_ARRAY and writing the output matrix into the NvBufSurface using the code below.

m_egl_image = new_surf->surfaceList[0].mappedAddr.eglImage;

if (cuGraphicsEGLRegisterImage(&m_cuda_resource, m_egl_image,
                                       CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) != CUDA_SUCCESS) {
        std::cerr << "Failed to register CUDA resource" << std::endl;
        return false;
}

if (cuGraphicsResourceGetMappedEglFrame(&m_egl_frame, m_cuda_resource, 0, 0) !=
            CUDA_SUCCESS) {
        std::cerr << "Failed to get mapped EGL frame" << std::endl;
        return false;
}

cv::cuda::GpuMat d_y(new_surf->surfaceList[0].height, new_surf->surfaceList[0].width, CV_8UC1, m_egl_frame.frame.pPitch[0]);
cv::cuda::GpuMat d_uv(new_surf->surfaceList[0].height / 2, new_surf->surfaceList[0].width, CV_8UC1, m_egl_frame.frame.pPitch[1]);

output_nv12_mat.rowRange(0, m_video_height).copyTo(d_y);
output_nv12_mat.rowRange(m_video_height, m_video_height * 3 / 2).copyTo(d_uv);

And when I wrap this NvBufSurface in a cv::cuda::GpuMat again and check the image, it looks okay. But after NvBufSurfTransform, attaching it to a GstBuffer, and checking the video downstream, the image was corrupted.

So overall,

  • I’d like to avoid memory copies as much as possible. Can I directly make a GstBuffer from a cv::cuda::GpuMat?

If you want to use the NVBUF_MEM_SURFACE_ARRAY type together with cv::cuda::GpuMat, this inevitably involves a memory copy on Jetson.
If you want to avoid the memory copy, you should use the NVBUF_MEM_CUDA_UNIFIED type.
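For illustration, here is a minimal sketch of the copy-free path: if the surface is allocated with NVBUF_MEM_CUDA_UNIFIED and NVBUF_LAYOUT_PITCH, its planes can be wrapped in cv::cuda::GpuMat headers directly (surf is a hypothetical single-batch NV12 surface from NvBufSurfaceCreate):

```cpp
// No copy: build GpuMat headers on top of the unified-memory surface.
NvBufSurfaceParams &p = surf->surfaceList[0];
uint8_t *base = static_cast<uint8_t *>(p.dataPtr);

// NV12: Y plane (height rows), then the interleaved UV plane (height/2 rows),
// both using the surface pitch as the row stride.
cv::cuda::GpuMat y_plane(p.height, p.width, CV_8UC1, base, p.pitch);
cv::cuda::GpuMat uv_plane(p.height / 2, p.width, CV_8UC1,
                          base + static_cast<size_t>(p.pitch) * p.height, p.pitch);

// cv::cuda operations writing into y_plane / uv_plane now modify the
// NvBufSurface memory itself, so no cudaMemcpy2DAsync is needed afterwards.
```

With NVBUF_MEM_SURFACE_ARRAY, by contrast, dataPtr is not directly CUDA-accessible, which is why the EGL mapping path is needed there.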

Thanks for the answer!

Actually, my pipeline is briefly as below:
nvarguscamerasrc → nvdsvideotemplate → tee → nvv4l2h264enc → filesink
→ tee → nvdsvideotemplate → nvv4l2h264enc → filesink.

And in each nvdsvideotemplate I need to handle video buffer.
But after nvarguscamerasrc the buffer is already in NVBUF_MEM_SURFACE_ARRAY, and from my understanding nvv4l2h264enc should also receive an NVBUF_MEM_SURFACE_ARRAY buffer, so that’s why I tried to keep my memory in NVBUF_MEM_SURFACE_ARRAY.

So to sum up my questions,

  1. Is it possible to keep all my buffers in NVBUF_MEM_CUDA_UNIFIED for my pipeline setup?
  2. If I have to use NVBUF_MEM_SURFACE_ARRAY to keep my pipeline working, is there any way I can access the pixel data of NVBUF_MEM_SURFACE_ARRAY without using cv::cuda::GpuMat?
    I’m using a CUDA kernel to modify buffer pixels.
    My CUDA kernel function is like this:
__global__ void screenToVideoKernel(int videoWidth, int videoHeight, int length,
                                    cv::cuda::PtrStepSz<uint8_t> frame,
                                    cv::cuda::PtrStepSz<uint8_t> output)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    int maxVideoHeight = videoHeight / length;

    if (x >= videoWidth || y >= maxVideoHeight)
    {
        return;
    }

    // ... per-pixel processing of frame(y, x) into output(y, x) ...
}

  3. From my understanding, on Jetson, memory is physically shared between the CPU and GPU, and Jetson’s default memory type is NVBUF_MEM_SURFACE_ARRAY. What does it mean, then, when NVBUF_MEM_SURFACE_ARRAY or NVBUF_MEM_CUDA_UNIFIED is explicitly specified?

Hi,
For information, do you use Xavier or Orin? We would like to confirm which platform you are using.

Hello,
We are using Xavier

Hi,
You can access the NvBufSurface through function calls like the following:
How to create opencv gpumat from nvstream? - #18 by DaneLLL

NvBufSurfaceMapEglImage();
cuGraphicsEGLRegisterImage();
cuGraphicsResourceGetMappedEglFrame();

_do_image_process_;

cuGraphicsUnregisterResource();
NvBufSurfaceUnMapEglImage();
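Spelled out with error handling, the call sequence above looks roughly like this (a sketch inside a helper returning bool; surf is a hypothetical single-batch, pitch-linear NVBUF_MEM_SURFACE_ARRAY surface):

```cpp
// 1. Map the surface to an EGLImage.
if (NvBufSurfaceMapEglImage(surf, 0) != 0)
    return false;

// 2. Register the EGLImage with the CUDA driver API and get the frame.
CUgraphicsResource cuda_resource;
CUeglFrame egl_frame;
if (cuGraphicsEGLRegisterImage(&cuda_resource,
        surf->surfaceList[0].mappedAddr.eglImage,
        CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) != CUDA_SUCCESS)
    return false;
if (cuGraphicsResourceGetMappedEglFrame(&egl_frame, cuda_resource, 0, 0) != CUDA_SUCCESS)
    return false;

// 3. Process: for a pitch-linear NV12 frame, egl_frame.frame.pPitch[0] and
// pPitch[1] point at the Y and UV planes; egl_frame.pitch is the row stride.
/* _do_image_process_ */

// 4. Synchronize, then undo the mapping in reverse order.
cuCtxSynchronize();
cuGraphicsUnregisterResource(cuda_resource);
NvBufSurfaceUnMapEglImage(surf, 0);
return true;
```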

By default, nvarguscamerasrc sends NvBufSurface in NV12 block-linear to the next plugin. You can customize the source code to send NV12 pitch-linear instead. The source code is in

Jetson Linux 35.4.1 | NVIDIA Developer
Driver Package (BSP) Sources

Hi, thanks for the answer.
I’m doing exactly that:

NvBufSurfaceMapEglImage();
cuGraphicsEGLRegisterImage();
cuGraphicsResourceGetMappedEglFrame();

_do_image_process_;
cv::cuda::GpuMat output_mat // This is my result output from image process.

cuGraphicsUnregisterResource();
NvBufSurfaceUnMapEglImage();

And with this output cv::cuda::GpuMat, I need to create a GstBuffer with an NvBufSurface allocated as NVBUF_MEM_SURFACE_ARRAY to pass the data to the downstream element.
Creating the NvBufSurface with NVBUF_MEM_CUDA_UNIFIED via NvBufSurfaceCreate, attaching the cv::cuda::GpuMat data to it with cudaMemcpy2DAsync, and then using NvBufSurfTransform to convert the memory to NVBUF_MEM_SURFACE_ARRAY worked well. (This is the logic I attached at the very beginning.)

But I’d like to directly create the new GstBuffer with an NVBUF_MEM_SURFACE_ARRAY NvBufSurface from the cv::cuda::GpuMat, without converting from NVBUF_MEM_CUDA_UNIFIED.

Hi,
It is not supported to map a cv::cuda::GpuMat to an NvBufSurface. You would need to allocate the NvBufSurface first, map it to output_mat, and then process the frame data so the result is written to output_mat.

Hi,
Now I’m trying to allocate the NvBufSurface first, map it to a cv::cuda::GpuMat, and process the frame data into it.

NvBufSurfaceCreateParams create_params = {0};
create_params.gpuId = 0;
create_params.width = m_video_width;
create_params.height = m_video_height;
create_params.colorFormat = NVBUF_COLOR_FORMAT_NV12;
create_params.layout = NVBUF_LAYOUT_PITCH;
create_params.memType = NVBUF_MEM_SURFACE_ARRAY;

if (NvBufSurfaceCreate(&base_surf, 1, &create_params) != 0) {
    std::cerr << "NvBufSurface creation error" << std::endl;
}
base_surf->numFilled = 1;

if (NvBufSurfaceMapEglImage(base_surf, 0) != 0) {
    std::cerr << "Failed to map EGL image" << std::endl;
    return false;
}

m_egl_image = base_surf->surfaceList[0].mappedAddr.eglImage;

if (cuGraphicsEGLRegisterImage(&m_cuda_resource, m_egl_image,
                                     CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) != CUDA_SUCCESS) {
      std::cerr << "Failed to register CUDA resource" << std::endl;
      return false;
}

if (cuGraphicsResourceGetMappedEglFrame(&m_egl_frame, m_cuda_resource, 0, 0) !=
          CUDA_SUCCESS) {
      std::cerr << "Failed to get mapped EGL frame" << std::endl;
      return false;
}

// my input is NV12 and output should be NV12 as well.
cv::cuda::GpuMat d_y(params.height, params.width, CV_8UC1, m_egl_frame.frame.pPitch[0]);
cv::cuda::GpuMat d_uv(params.height / 2, params.width, CV_8UC1, m_egl_frame.frame.pPitch[1]);

// process image and make output to be saved in d_y, d_uv
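One detail worth double-checking in a mapping like this: the four-argument cv::cuda::GpuMat constructor assumes the buffer is continuous (step equals width), while a mapped surface-array plane normally has a row pitch larger than the width. A sketch that passes the pitch reported in CUeglFrame explicitly (assuming both NV12 planes share the same pitch, as is typical on Jetson):

```cpp
// Pass the row stride explicitly; without it, GpuMat assumes step == width
// and every row after the first is read from the wrong offset.
size_t pitch = m_egl_frame.pitch;  // bytes per row, reported by the CUDA driver
cv::cuda::GpuMat d_y(params.height, params.width, CV_8UC1,
                     m_egl_frame.frame.pPitch[0], pitch);
cv::cuda::GpuMat d_uv(params.height / 2, params.width, CV_8UC1,
                      m_egl_frame.frame.pPitch[1], pitch);
```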

But after I attach this to a GstBuffer and pass it to the next elements (encoder → rtmpsink), the video frames look corrupted when I check the video.

Hi,

Does it mean the buffer is updated partially, as if not being synchronized? Or is it totally wrong and unexpected?

Hi, this is my final code:

GstBuffer *newGstOutBuf = NULL;
gst_buffer_pool_acquire_buffer(m_dsBufferPool, &newGstOutBuf, NULL);
NvBufSurface *out_surf = getNvBufSurface(newGstOutBuf);
out_surf->numFilled = 1;
out_surf->batchSize = 1;
// out_surf's color format is NVBUF_COLOR_FORMAT_NV12 and layout is NVBUF_LAYOUT_PITCH and memType is NVBUF_MEM_SURFACE_ARRAY
        
NvBufSurfaceMapEglImage(out_surf, 0);
m_egl_image = out_surf->surfaceList[0].mappedAddr.eglImage;
cuGraphicsEGLRegisterImage(&m_cuda_resource, m_egl_image,
                                               CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
cuGraphicsResourceGetMappedEglFrame(&m_egl_frame, m_cuda_resource, 0, 0);
cv::cuda::GpuMat d_y(m_video_height, m_video_width, CV_8UC1, m_egl_frame.frame.pPitch[0]);
cv::cuda::GpuMat d_uv(m_video_height / 2, m_video_width, CV_8UC1, m_egl_frame.frame.pPitch[1]);

// after processing the video, I was able to get the output cv::cuda::GpuMat output_target_y, output_target_uv

output_target_y.copyTo(d_y);
output_target_uv.copyTo(d_uv);

// pass newGstOutBuf to downstream element

And this is the image from rtmpsink

Hi,
We are not able to comment further from the partial code. Please share a full test sample running a pipeline like:

nvarguscamerasrc ! nvdsvideotemplate ! tee ! nvv4l2h264enc ! h264parse ! matroskamux ! filesink

So that we can reproduce it on developer kit and check.

Hi,
Okay, I will share the pipeline. For nvdsvideotemplate, we are implementing custom code. Shall I share the full code of customlib_impl.cpp?

Hi,
Please prepare a simplified version that only replicates the issue.