How can I create new NvBufSurface from cv::cuda::GpuMat with NVBUF_MEM_SURFACE_ARRAY

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 6.3
• JetPack Version (valid for Jetson only) 5.1.2

Hello, I’m building a custom element using nvdsvideotemplate. In this element, I receive a buffer from upstream, do some processing with cv::cuda, and then need to make a new GstBuffer from the resulting cv::cuda::GpuMat.
My output cv::cuda::GpuMat is output_nv12_mat:

NvBufSurfaceCreateParams create_params = {0};
create_params.gpuId = 0;
create_params.width = m_video_width;
create_params.height = m_video_height;
create_params.colorFormat = NVBUF_COLOR_FORMAT_NV12;
create_params.layout = NVBUF_LAYOUT_PITCH;
create_params.memType = NVBUF_MEM_CUDA_UNIFIED;

NvBufSurface *new_surf = NULL;
if (NvBufSurfaceCreate(&new_surf, 1, &create_params) != 0) {
    std::cerr << "NvBufSurface creation error" << std::endl;
}
new_surf->numFilled = 1;
// Copy Y plane
cudaMemcpy2DAsync(new_surf->surfaceList[0].dataPtr, new_surf->surfaceList[0].pitch,
                  output_nv12_mat.data, output_nv12_mat.step,
                  m_video_width, m_video_height,
                  cudaMemcpyDeviceToDevice, m_config_params.cuda_stream);

// Copy UV plane
cudaMemcpy2DAsync(static_cast<uint8_t *>(new_surf->surfaceList[0].dataPtr) +
                      new_surf->surfaceList[0].pitch * m_video_height,
                  new_surf->surfaceList[0].pitch,
                  output_nv12_mat.data + output_nv12_mat.step * m_video_height,
                  output_nv12_mat.step, m_video_width, m_video_height / 2,
                  cudaMemcpyDeviceToDevice, m_config_params.cuda_stream);
cudaStreamSynchronize(m_config_params.cuda_stream);

NvBufSurfTransformSetSessionParams(&m_config_params);
GstBuffer *newGstOutBuf = NULL;
gst_buffer_pool_acquire_buffer(m_dsBufferPool, &newGstOutBuf, NULL);
NvBufSurface *out_surf = getNvBufSurface(newGstOutBuf);
out_surf->numFilled = new_surf->numFilled;
out_surf->batchSize = new_surf->batchSize;
NvBufSurfTransformParams transform_params;
transform_params.transform_flag = NVBUFSURF_TRANSFORM_FILTER;
transform_params.transform_flip = NvBufSurfTransform_None;
transform_params.transform_filter = NvBufSurfTransformInter_Default;
NvBufSurfTransform(new_surf, out_surf, &transform_params);
NvBufSurfaceDestroy(new_surf);
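As a small robustness note, both NvBufSurfTransformSetSessionParams and NvBufSurfTransform return an NvBufSurfTransform_Error, so the calls above can be checked instead of assumed to succeed; a sketch:

```cpp
// Check the transform result instead of assuming success.
NvBufSurfTransform_Error err = NvBufSurfTransform(new_surf, out_surf, &transform_params);
if (err != NvBufSurfTransformError_Success) {
    std::cerr << "NvBufSurfTransform failed: " << err << std::endl;
}
```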

This code works: after passing the buffer to the encoder, I was able to save a correct image.
But it seems to involve an unnecessary memory copy, because I create the NvBufSurface with NVBUF_MEM_CUDA_UNIFIED memory and then convert it to NVBUF_MEM_SURFACE_ARRAY. I set the transform session params as below, so the transform appears to run on the GPU:

m_config_params.compute_mode = NvBufSurfTransformCompute_GPU;
m_config_params.gpu_id = params->m_gpuId;
m_config_params.cuda_stream = params->m_cudaStream;

So I tried setting create_params.memType to NVBUF_MEM_SURFACE_ARRAY and writing the output matrix into the NvBufSurface using the code below.

m_egl_image = new_surf->surfaceList[0].mappedAddr.eglImage;

if (cuGraphicsEGLRegisterImage(&m_cuda_resource, m_egl_image,
                                       CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) != CUDA_SUCCESS) {
        std::cerr << "Failed to register CUDA resource" << std::endl;
        return false;
}

if (cuGraphicsResourceGetMappedEglFrame(&m_egl_frame, m_cuda_resource, 0, 0) !=
            CUDA_SUCCESS) {
        std::cerr << "Failed to get mapped EGL frame" << std::endl;
        return false;
}

cv::cuda::GpuMat d_y(new_surf->surfaceList[0].height, new_surf->surfaceList[0].width, CV_8UC1, m_egl_frame.frame.pPitch[0]);
cv::cuda::GpuMat d_uv(new_surf->surfaceList[0].height / 2, new_surf->surfaceList[0].width, CV_8UC1, m_egl_frame.frame.pPitch[1]);

output_nv12_mat.rowRange(0, m_video_height).copyTo(d_y);
output_nv12_mat.rowRange(m_video_height, m_video_height * 3 / 2).copyTo(d_uv);

And when I wrap this NvBufSurface in a cv::cuda::GpuMat again and check the image, it looks okay. But after NvBufSurfTransform, attaching it to a GstBuffer, and checking the video downstream, the image was corrupted.

So overall,

  • I’d like to avoid memory copies as much as possible. Can I directly make a GstBuffer from a cv::cuda::GpuMat?

If you want to use the NVBUF_MEM_SURFACE_ARRAY type together with cv::cuda::GpuMat, this inevitably involves a memory copy on Jetson.
If you want to avoid the memory copy, you should use the NVBUF_MEM_CUDA_UNIFIED type.
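For illustration, here is a minimal sketch of the copy-free path: if the surface is allocated with NVBUF_MEM_CUDA_UNIFIED and NVBUF_LAYOUT_PITCH, its planes can be wrapped in cv::cuda::GpuMat headers directly (surf is a hypothetical single-batch NV12 surface from NvBufSurfaceCreate):

```cpp
// No copy: build GpuMat headers on top of the unified-memory surface.
NvBufSurfaceParams &p = surf->surfaceList[0];
uint8_t *base = static_cast<uint8_t *>(p.dataPtr);

// NV12: Y plane (height rows), then the interleaved UV plane (height/2 rows),
// both using the surface pitch as the row stride.
cv::cuda::GpuMat y_plane(p.height, p.width, CV_8UC1, base, p.pitch);
cv::cuda::GpuMat uv_plane(p.height / 2, p.width, CV_8UC1,
                          base + static_cast<size_t>(p.pitch) * p.height, p.pitch);

// cv::cuda operations writing into y_plane / uv_plane now modify the
// NvBufSurface memory itself, so no cudaMemcpy2DAsync is needed afterwards.
```

With NVBUF_MEM_SURFACE_ARRAY, by contrast, dataPtr is not directly CUDA-accessible, which is why the EGL mapping path is needed there.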

Thanks for the answer!

Actually, my pipeline is briefly as below:
nvarguscamerasrc → nvdsvideotemplate → tee → nvv4l2h264enc → filesink
→ tee → nvdsvideotemplate → nvv4l2h264enc → filesink.

And in each nvdsvideotemplate I need to handle video buffer.
But after nvarguscamerasrc the buffer is already in NVBUF_MEM_SURFACE_ARRAY, and from my understanding nvv4l2h264enc should also receive an NVBUF_MEM_SURFACE_ARRAY buffer, so that’s why I tried to keep my memory in NVBUF_MEM_SURFACE_ARRAY.

So to sum up my questions,

  1. Is it possible to keep all my buffers in NVBUF_MEM_CUDA_UNIFIED for my pipeline setup?
  2. If I have to use NVBUF_MEM_SURFACE_ARRAY to keep my pipeline working, is there any way I can access the pixel data of NVBUF_MEM_SURFACE_ARRAY without using cv::cuda::GpuMat?
    I’m using a CUDA kernel to modify buffer pixels.
    My CUDA kernel function is like this:
__global__ void screenToVideoKernel(int videoWidth, int videoHeight, int length,
                                    cv::cuda::PtrStepSz<uint8_t> frame,
                                    cv::cuda::PtrStepSz<uint8_t> output)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    int maxVideoHeight = videoHeight / length;

    if (x >= videoWidth || y >= maxVideoHeight)
    {
        return;
    }

    // ... per-pixel processing of frame(y, x) into output(y, x) ...
}

  3. From my understanding, on Jetson, memory is physically shared between the CPU and GPU, and Jetson’s default memory type is NVBUF_MEM_SURFACE_ARRAY. What does it mean, then, when NVBUF_MEM_SURFACE_ARRAY or NVBUF_MEM_CUDA_UNIFIED is explicitly specified?

Hi,
For information, do you use Xavier or Orin? We would like to confirm which platform you are using.

Hello,
We are using Xavier

Hi,
You can access the NvBufSurface through function calls like the following:
How to create opencv gpumat from nvstream? - #18 by DaneLLL

NvBufSurfaceMapEglImage();
cuGraphicsEGLRegisterImage();
cuGraphicsResourceGetMappedEglFrame();

_do_image_process_;

cuGraphicsUnregisterResource();
NvBufSurfaceUnMapEglImage();
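Spelled out with error handling, the call sequence above looks roughly like this (a sketch inside a helper returning bool; surf is a hypothetical single-batch, pitch-linear NVBUF_MEM_SURFACE_ARRAY surface):

```cpp
// 1. Map the surface to an EGLImage.
if (NvBufSurfaceMapEglImage(surf, 0) != 0)
    return false;

// 2. Register the EGLImage with the CUDA driver API and get the frame.
CUgraphicsResource cuda_resource;
CUeglFrame egl_frame;
if (cuGraphicsEGLRegisterImage(&cuda_resource,
        surf->surfaceList[0].mappedAddr.eglImage,
        CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) != CUDA_SUCCESS)
    return false;
if (cuGraphicsResourceGetMappedEglFrame(&egl_frame, cuda_resource, 0, 0) != CUDA_SUCCESS)
    return false;

// 3. Process: for a pitch-linear NV12 frame, egl_frame.frame.pPitch[0] and
// pPitch[1] point at the Y and UV planes; egl_frame.pitch is the row stride.
/* _do_image_process_ */

// 4. Synchronize, then undo the mapping in reverse order.
cuCtxSynchronize();
cuGraphicsUnregisterResource(cuda_resource);
NvBufSurfaceUnMapEglImage(surf, 0);
return true;
```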

By default, nvarguscamerasrc sends NvBufSurface in NV12 block-linear to the next plugin. You can customize the source code to send NV12 pitch-linear instead. The source code is in

Jetson Linux 35.4.1 | NVIDIA Developer
Driver Package (BSP) Sources

Hi, thanks for the answer.
I’m doing exactly that:

NvBufSurfaceMapEglImage();
cuGraphicsEGLRegisterImage();
cuGraphicsResourceGetMappedEglFrame();

_do_image_process_;
cv::cuda::GpuMat output_mat // This is my result output from image process.

cuGraphicsUnregisterResource();
NvBufSurfaceUnMapEglImage();

And with this output cv::cuda::GpuMat, I need to create a GstBuffer with an NvBufSurface allocated as NVBUF_MEM_SURFACE_ARRAY to pass the data to the downstream element.
Creating the NvBufSurface with NVBUF_MEM_CUDA_UNIFIED via NvBufSurfaceCreate, attaching the cv::cuda::GpuMat data to it with cudaMemcpy2DAsync, and then using NvBufSurfTransform to convert the memory to NVBUF_MEM_SURFACE_ARRAY worked well. (This is the logic I attached at the very beginning.)

But I’d like to directly create the new GstBuffer with an NVBUF_MEM_SURFACE_ARRAY NvBufSurface from the cv::cuda::GpuMat, without converting from NVBUF_MEM_CUDA_UNIFIED.

Hi,
It is not supported to map a cv::cuda::GpuMat to an NvBufSurface. You would need to allocate the NvBufSurface first, map it to output_mat, and then process the frame data so the result is written to output_mat.

Hi,
Now I’m trying to allocate the NvBufSurface first, map it to a cv::cuda::GpuMat, and process the frame data into it.

NvBufSurfaceCreateParams create_params = {0};
create_params.gpuId = 0;
create_params.width = m_video_width;
create_params.height = m_video_height;
create_params.colorFormat = NVBUF_COLOR_FORMAT_NV12;
create_params.layout = NVBUF_LAYOUT_PITCH;
create_params.memType = NVBUF_MEM_SURFACE_ARRAY;

if (NvBufSurfaceCreate(&base_surf, 1, &create_params) != 0) {
    std::cerr << "NvBufSurface creation error" << std::endl;
}
base_surf->numFilled = 1;

if (NvBufSurfaceMapEglImage(base_surf, 0) != 0) {
    std::cerr << "Failed to map EGL image" << std::endl;
    return false;
}

m_egl_image = base_surf->surfaceList[0].mappedAddr.eglImage;

if (cuGraphicsEGLRegisterImage(&m_cuda_resource, m_egl_image,
                                     CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) != CUDA_SUCCESS) {
      std::cerr << "Failed to register CUDA resource" << std::endl;
      return false;
}

if (cuGraphicsResourceGetMappedEglFrame(&m_egl_frame, m_cuda_resource, 0, 0) !=
          CUDA_SUCCESS) {
      std::cerr << "Failed to get mapped EGL frame" << std::endl;
      return false;
}

// my input is NV12 and output should be NV12 as well.
cv::cuda::GpuMat d_y(params.height, params.width, CV_8UC1, m_egl_frame.frame.pPitch[0]);
cv::cuda::GpuMat d_uv(params.height / 2, params.width, CV_8UC1, m_egl_frame.frame.pPitch[1]);

// process image and make output to be saved in d_y, d_uv
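One detail worth double-checking in a mapping like this: the four-argument cv::cuda::GpuMat constructor assumes the buffer is continuous (step equals width), while a mapped surface-array plane normally has a row pitch larger than the width. A sketch that passes the pitch reported in CUeglFrame explicitly (assuming both NV12 planes share the same pitch, as is typical on Jetson):

```cpp
// Pass the row stride explicitly; without it, GpuMat assumes step == width
// and every row after the first is read from the wrong offset.
size_t pitch = m_egl_frame.pitch;  // bytes per row, reported by the CUDA driver
cv::cuda::GpuMat d_y(params.height, params.width, CV_8UC1,
                     m_egl_frame.frame.pPitch[0], pitch);
cv::cuda::GpuMat d_uv(params.height / 2, params.width, CV_8UC1,
                      m_egl_frame.frame.pPitch[1], pitch);
```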

But after I attach this to a GstBuffer and pass it to the next elements (encoder → rtmpsink), the video frames look corrupted when I check the video.

Hi,

Does it mean the buffer is updated partially, as if not being synchronized? Or is it totally wrong and unexpected?

Hi, this is my final code:

GstBuffer *newGstOutBuf = NULL;
gst_buffer_pool_acquire_buffer(m_dsBufferPool, &newGstOutBuf, NULL);
NvBufSurface *out_surf = getNvBufSurface(newGstOutBuf);
out_surf->numFilled = 1;
out_surf->batchSize = 1;
// out_surf's color format is NVBUF_COLOR_FORMAT_NV12 and layout is NVBUF_LAYOUT_PITCH and memType is NVBUF_MEM_SURFACE_ARRAY
        
NvBufSurfaceMapEglImage(out_surf, 0);
m_egl_image = out_surf->surfaceList[0].mappedAddr.eglImage;
cuGraphicsEGLRegisterImage(&m_cuda_resource, m_egl_image,
                                               CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
cuGraphicsResourceGetMappedEglFrame(&m_egl_frame, m_cuda_resource, 0, 0);
cv::cuda::GpuMat d_y(m_video_height, m_video_width, CV_8UC1, m_egl_frame.frame.pPitch[0]);
cv::cuda::GpuMat d_uv(m_video_height / 2, m_video_width, CV_8UC1, m_egl_frame.frame.pPitch[1]);

// after processing the video, I was able to get the output cv::cuda::GpuMat output_target_y, output_target_uv

output_target_y.copyTo(d_y);
output_target_uv.copyTo(d_uv);

// pass newGstOutBuf to downstream element

And this is the image from rtmpsink

Hi,
We are not able to comment further from the partial code. Please share a full test sample running a pipeline like:

nvarguscamerasrc ! nvdsvideotemplate ! tee ! nvv4l2h264enc ! h264parse ! matroskamux ! filesink

So that we can reproduce it on developer kit and check.

Hi,
Okay, I will share the pipeline. For nvdsvideotemplate, we are implementing custom code. Shall I share the full code of customlib_impl.cpp?

Hi,
Please prepare a simplified version that only replicates the issue.