Convert OpenCV Mat to NvBufSurface

Hello everyone,

I’m creating a preprocessing element that performs some OpenCV operations on the image; this element will sit just before nvinfer. The logic of the element is as follows:

  1. Obtain a cv::Mat from NvBufSurface
  2. Perform OpenCV operations
  3. Create a new NvBufSurface
  4. Populate the new surface with the data from the cv::Mat
  5. Unref the input buffer

The reason I’m discarding the original buffer is that I need to send the original frames downstream along with a few copies of the frame that have filters applied.

There are examples of how to convert NvBufSurface to cv::Mat (in dsexample.cpp), but no examples of the reverse, i.e., cv::Mat to NvBufSurface. I have looked at a few forum topics that discuss similar issues, like this one and this one, and also at the nvbufsurface.h header, and managed to put together this code:

// creating NvBufSurface
NvBufSurfaceCreateParams create_params;
NvBufSurface* surface = NULL;
create_params.gpuId = gpu_id;
create_params.width = width;
create_params.height = height;
create_params.size = 0;
create_params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
create_params.layout = NVBUF_LAYOUT_PITCH;
#ifdef __aarch64__
  create_params.memType = NVBUF_MEM_DEFAULT;
#else
  create_params.memType = NVBUF_MEM_CUDA_UNIFIED;
#endif
NvBufSurfaceCreate(&surface, 1, &create_params);

output = gst_buffer_new_wrapped_full((GstMemoryFlags) 0, (gpointer) surface, sizeof(NvBufSurface), 0, sizeof(NvBufSurface), NULL, NULL);

// adding NvDsBatchMeta to new buffer
NvDsBatchMeta* batch_meta = nvds_create_batch_meta(1);
NvDsMeta* meta = gst_buffer_add_nvds_meta(output, batch_meta, NULL, nvds_batch_meta_copy_func, nvds_batch_meta_release_func);
meta->meta_type = NVDS_BATCH_GST_META;
batch_meta->base_meta.batch_meta = batch_meta;
batch_meta->base_meta.copy_func = nvds_batch_meta_copy_func;
batch_meta->base_meta.release_func = nvds_batch_meta_release_func;
batch_meta->max_frames_in_batch = 1;

// copy logic:
GstMapInfo in_map_info;
GstMapInfo out_map_info;
NvBufSurface* in_surface;
NvBufSurface* out_surface;
NvDsBatchMeta* out_batch_meta;
NvDsBatchMeta* in_batch_meta;
NvDsFrameMeta* in_frame_meta;

gst_buffer_map(inbuf, &in_map_info, GST_MAP_READ);
gst_buffer_map(outbuf, &out_map_info, GST_MAP_WRITE);

in_surface = (NvBufSurface*) in_map_info.data;
out_surface = (NvBufSurface*) out_map_info.data;
out_batch_meta = gst_buffer_get_nvds_batch_meta(outbuf);
in_batch_meta = gst_buffer_get_nvds_batch_meta(inbuf);
in_frame_meta = nvds_get_nth_frame_meta(in_batch_meta->frame_meta_list, 0);
std::unique_ptr<cv::Mat> cv_img = self->image_converter->get_mat_from_surface(in_surface, 0);

// ...perform opencv operations

// memset memory
if (NvBufSurfaceMemSet(out_surface, 0, 0, 0) != 0) {
  GST_ELEMENT_ERROR(self, STREAM, FAILED, ("Failed memset NvBufSurface"), (NULL));
  return GST_FLOW_ERROR;
}

// map buffer, since we are using one of NVBUF_MEM_CUDA_UNIFIED, NVBUF_MEM_SURFACE_ARRAY or NVBUF_MEM_HANDLE 
if (NvBufSurfaceMap(out_surface, 0, 0, NVBUF_MAP_WRITE) != 0) {
  GST_ELEMENT_ERROR(self, STREAM, FAILED, ("Failed to map output NvBufSurface"), (NULL));
  return GST_FLOW_ERROR;
}

// sync for CPU if on jetson
if (out_surface->memType == NVBUF_MEM_SURFACE_ARRAY || out_surface->memType == NVBUF_MEM_HANDLE) {
  NvBufSurfaceSyncForCpu(out_surface, 0, 0);
}

// convert cv::Mat to RGBA
cv::cvtColor(*cv_img, *cv_img, cv::COLOR_BGR2RGBA);

// copy data
memcpy(out_surface->surfaceList[0].mappedAddr.addr[0], cv_img->ptr(), cv_img->total() * cv_img->elemSize());
out_surface->numFilled = 1;

// sync for device if on jetson
if (out_surface->memType == NVBUF_MEM_SURFACE_ARRAY || out_surface->memType == NVBUF_MEM_HANDLE) {
  NvBufSurfaceSyncForDevice(out_surface, 0, 0);
}

// unmap surface
if (NvBufSurfaceUnMap(out_surface, 0, 0) != 0) {
  GST_ELEMENT_ERROR(self, STREAM, FAILED, ("Failed to unmap output NvBufSurface"), (NULL));
  return GST_FLOW_ERROR;
}

// adding NvDsFrameMeta to the batch meta
NvDsFrameMeta* frame_meta = nvds_acquire_frame_meta_from_pool(out_batch_meta);
nvds_add_frame_meta_to_batch(out_batch_meta, frame_meta);

frame_meta->pad_index = 0;
frame_meta->source_id = 0;
frame_meta->buf_pts = 0;
frame_meta->ntp_timestamp = 0;
frame_meta->frame_num = 0;
frame_meta->batch_id = 0;
frame_meta->source_frame_width = 640;
frame_meta->source_frame_height = 480;
frame_meta->num_surfaces_per_frame = 1;
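One detail worth double-checking in the copy step above: the single memcpy of `cv_img->total() * cv_img->elemSize()` bytes assumes the destination rows are tightly packed, but NvBufSurface rows are padded to `surfaceList[0].pitch` bytes, which for RGBA can be larger than width × 4. A stdlib-only sketch of a row-wise, pitch-aware copy (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstring>

// Copy a tightly packed image (width * bytes_per_pixel bytes per row)
// into a pitched destination whose rows start dst_pitch bytes apart.
// When dst_pitch > width * bytes_per_pixel, a single whole-buffer
// memcpy would shear the image; copying row by row avoids that.
void copy_packed_to_pitched(const unsigned char* src, unsigned char* dst,
                            int width, int height, int bytes_per_pixel,
                            int dst_pitch) {
  const int row_bytes = width * bytes_per_pixel;
  for (int y = 0; y < height; ++y)
    std::memcpy(dst + y * dst_pitch, src + y * row_bytes, row_bytes);
}
```

Against an NvBufSurface this would be called roughly as `copy_packed_to_pitched(cv_img->data, (unsigned char*) out_surface->surfaceList[0].mappedAddr.addr[0], width, height, 4, out_surface->surfaceList[0].pitch);` (field names per nvbufsurface.h).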

But this does not work. It crashes immediately, with this stack trace:

Error: signal 11:

It looks like something is crashing. Also, I have an element that saves the input image, which comes just after nvinfer. It gets triggered, but the image it saves looks corrupted, so I have no idea what’s going on.

I would appreciate some guidance on this issue.

Information regarding my setup:

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 5.1
• TensorRT Version: 7.2.2-1
• NVIDIA GPU Driver Version (valid for GPU only): 495.46
• Issue Type( questions, new requirements, bugs): Question

OK, I managed to get it to work; there were a couple of things I had to do:

  1. Make sure the correct colorFormat is used (it turned out I was accidentally setting the color format of the new NvBufSurface to NV12 instead of RGBA).
  2. Pass a GDestroyNotify function to gst_buffer_new_wrapped_full that destroys the surface, to avoid a memory leak.
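For fix 2, the idea is to pass a callback (e.g. one that calls NvBufSurfaceDestroy) as the GDestroyNotify argument of gst_buffer_new_wrapped_full, so the surface is freed when the buffer’s last reference drops. A stdlib-only sketch of that ownership pattern (all names here are hypothetical, not the GStreamer API):

```cpp
#include <cassert>

// Mimics GDestroyNotify: a callback that fires exactly once when the
// wrapper releases the user data it was handed.
typedef void (*DestroyNotify)(void* user_data);

struct WrappedBuffer {
  void* data;
  DestroyNotify notify;
};

// Analogue of gst_buffer_new_wrapped_full(..., data, ..., notify):
// the wrapper takes ownership of data and remembers the callback.
WrappedBuffer wrap_full(void* data, DestroyNotify notify) {
  return WrappedBuffer{data, notify};
}

// Analogue of dropping the last reference with gst_buffer_unref: the
// callback runs once, giving the owner a chance to free the wrapped
// memory (e.g. NvBufSurfaceDestroy for a wrapped NvBufSurface).
void release(WrappedBuffer* buf) {
  if (buf->notify) buf->notify(buf->data);
  buf->data = nullptr;
  buf->notify = nullptr;
}
```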

When I was debugging this, I came across something called “plane” in nvbufsurface.h. I assumed it was similar to channels, but it turns out it is not, since there was only one plane containing the RGBA frame in the input surface. Can anyone elaborate more on what a “plane” is?

There is reference code in


You can map an RGBA NvBufSurface to cv::Mat with:

  /* Use openCV to remove padding and convert RGBA to BGR. Can be skipped if
   * algorithm can handle padded RGBA data. */
  in_mat =
      cv::Mat (dsexample->processing_height, dsexample->processing_width,
      CV_8UC4, dsexample->inter_buf->surfaceList[0].mappedAddr.addr[0],
      dsexample->inter_buf->surfaceList[0].pitch);

Some formats such as NV12 or YUV420 have multiple planes. Take YUV420 as an example: it has three planes, Y, U, and V. RGBA has only a single plane.
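The plane count maps directly to buffer geometry: in YUV420 the U and V planes are subsampled to half resolution in both dimensions. A stdlib-only sketch of per-plane sizes for RGBA vs. YUV420 (padding/pitch ignored; the struct and function names are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PlaneDims { int width, height, bytes_per_pixel; };

// Plane layout for a W x H RGBA frame: one interleaved plane.
std::vector<PlaneDims> planes_rgba(int w, int h) {
  return { {w, h, 4} };
}

// Plane layout for a W x H YUV420 frame: full-resolution Y plane,
// plus U and V planes at quarter resolution each.
std::vector<PlaneDims> planes_yuv420(int w, int h) {
  return { {w, h, 1},
           {w / 2, h / 2, 1},
           {w / 2, h / 2, 1} };
}

std::size_t plane_bytes(const PlaneDims& p) {
  return std::size_t(p.width) * p.height * p.bytes_per_pixel;
}
```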

Hello DaneLLL,
Thanks for the references. Regarding my other query: shouldn’t an RGBA image have 4 planes? I’ve read in multiple sources that RGB images have three planes: a red plane, a green plane, and a blue plane. Quoting from the Handbook of Machine Vision by Alexander Hornberg:

…The number of planes in an image corresponds to the number of arrays of pixels that compose the image. A grayscale or pseudo-color image is composed of one plane, while a true-color image is composed of three planes - one each for the red, blue and green components…A color image is the combination of three arrays of pixels corresponding to the red, green and blue components in an RGB image.

In this way, shouldn’t RGBA images have 4 planes instead: one for red, one for green, one for blue, and one for alpha?

An RGBA NvBufSurface is a single plane, with bytes in this order:

R 8-bit, G 8-bit, B 8-bit, A 8-bit, R 8-bit, G 8-bit, B 8-bit, A 8-bit, ...

In the nvinfer plugin, the buffer is re-sampled into B, G, and R planes by CUDA code, and those planes are fed into the TensorRT engine for inference. The B, G, R planes exist only inside the nvinfer plugin. After inference, the RGBA NvBufSurface is passed on to the next element along with the metadata.
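What nvinfer does on the GPU can be pictured on the CPU as a de-interleave: each pixel’s B, G, and R bytes are scattered into three separate planar buffers, dropping alpha. A stdlib-only sketch (the real plugin does this in CUDA; the function name is made up):

```cpp
#include <cassert>

// Split interleaved RGBA bytes (R,G,B,A,R,G,B,A,...) into three
// planar buffers in B, G, R order, dropping the alpha channel --
// the planar layout described above for TensorRT input.
void rgba_to_bgr_planes(const unsigned char* rgba, int num_pixels,
                        unsigned char* b, unsigned char* g,
                        unsigned char* r) {
  for (int i = 0; i < num_pixels; ++i) {
    r[i] = rgba[4 * i + 0];
    g[i] = rgba[4 * i + 1];
    b[i] = rgba[4 * i + 2];
    // rgba[4 * i + 3] (alpha) is discarded
  }
}
```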

The source of the nvinfer plugin is public; please check


Hey DaneLLL,
Thanks for your response, things make more sense now.
I had another look at nvbufsurface.h and saw that NVBUF_MAX_PLANES is 4. Isn’t that the perfect number of planes to store RGBA data? So why is an RGBA NvBufSurface stored in a single plane?

NvBufSurface is a hardware DMA buffer, so the data layout has to fit the requirements of the hardware engines. For RGBA it is a single plane in RGBA order. The maximum is defined for other formats such as YUV420, which has 3 planes (Y, U, and V).
