DeepStream

• Hardware Platform (Jetson / GPU) : Jetson
• DeepStream Version : DS6.2
• JetPack Version (valid for Jetson only) : JP5.1.2
• Issue Type( questions, new requirements, bugs) : Question

Hi,
I want to apply some CUDA-kernel-based image processing that transforms the input buffer and pushes it downstream.
I’m using gst-nvdsvideotemplate as a base source.

Q1. Does this approach make sense from a performance perspective?
I want to use the NV12 color format throughout so that there is no color conversion anywhere in the pipeline.
ex) nvarguscamerasrc (input, NV12) → nvdsvideotemplate (custom implementation, NV12) → nvinfer (if needed, NV12) → nvv4l2h264enc (NV12)

Q2. The main problem is that the NV12 format does not seem to work properly.
How can I wrap an NV12 NvBufSurface in an OpenCV type?
I used CV_8UC1 with a size of width and height * 3/2.

// gst-nvdsvideotemplate/customlib_impl
// in the SampleAlgorithm::ProcessBuffer(GstBuffer *inbuf)
    if (m_inVideoFmt == GST_VIDEO_FORMAT_RGBA)
    {
      // This is working fine, I can also use any custom cuda kernel
      cv::cuda::GpuMat d_input_rgba = cv::cuda::GpuMat(in_surf->surfaceList[i].height, in_surf->surfaceList[i].width, CV_8UC4, (unsigned char *)in_surf->surfaceList[i].dataPtr);
      d_input_rgba.setTo(cv::Scalar::all(0));
    }
    else if (m_inVideoFmt == GST_VIDEO_FORMAT_NV12)
    {
      // This does not wrap the data properly; the result is only a dark image.
      cv::cuda::GpuMat d_input_nv12 = cv::cuda::GpuMat(in_surf->surfaceList[i].height * 3 / 2, in_surf->surfaceList[i].width, CV_8UC1, (unsigned char *)in_surf->surfaceList[i].dataPtr);

      // This throws an error; I cannot use any other custom CUDA kernel either.
      d_input_nv12.setTo(cv::Scalar::all(0));
    }

The error message with NV12 is as below:

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.5.4) /opt/nvidia/deepstream/deepstream-6.2/opencv-4.5.4/modules/core/src/cuda/gpu_mat.cu:389: error: (-217:Gpu API call) invalid argument in function 'setTo'

Or, in the case of a custom kernel:

Caught SIGSEGV

Here is the GStreamer pipeline for your reference (the only difference is format=RGBA vs. NV12):

RGBA → working fine

gst-launch-1.0 filesrc location=h264.mp4 ! qtdemux ! h264parse ! queue ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=3840 height=2160 batch-size=1 ! nvvideoconvert nvbuf-memory-type=1 compute-hw=1 ! 'video/x-raw(memory:NVMM), width=3840, height=2160' ! queue ! nvdsvideotemplate customlib-name="./customlib_impl/libcustom_videoimpl.so" customlib-props="scale-factor:2.0" ! nvvideoconvert ! 'video/x-raw(memory:NVMM), format=RGBA, width=3840, height=2160' ! nvv4l2h264enc ! ...

NV12 → error, or there seems to be no data at the pointer

gst-launch-1.0 filesrc location=h264.mp4 ! qtdemux ! h264parse ! queue ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=3840 height=2160 batch-size=1 ! nvvideoconvert nvbuf-memory-type=1 compute-hw=1 ! 'video/x-raw(memory:NVMM), width=3840, height=2160' ! queue ! nvdsvideotemplate customlib-name="./customlib_impl/libcustom_videoimpl.so" customlib-props="scale-factor:2.0" ! nvvideoconvert ! 'video/x-raw(memory:NVMM), format=NV12, width=3840, height=2160' ! nvv4l2h264enc ! ...

Please refer to this sample for how to access NV12 with OpenCV.

Hi,
It seems a bit odd, so I explain it below.

1. CPU sync and then mapping to cv::Mat is done without a memory error; however, the result seems to be in a different format, not NV12.

input frame (W x H)

expected NV12 (raw dump, W x H+H/2), converted via an RGBA → NV12 conversion

the result from the code

      //TODO for cuda device memory we need to use cudamemcpy
      NvBufSurfaceMap (surface, -1, -1, NVBUF_MAP_READ);
      /* Cache the mapped data for CPU access */
      NvBufSurfaceSyncForCpu (surface, 0, 0); //will do nothing for unified memory type on dGPU
      guint height = surface->surfaceList[frame_meta->batch_id].height;
      guint width = surface->surfaceList[frame_meta->batch_id].width;

      //Create Mat from NvMM memory, refer opencv API for how to create a Mat
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0],
                         surface->surfaceList[frame_meta->batch_id].pitch);

To double-check, I tried converting it to BGR:

cv::cvtColor(yuv, bgr, cv::COLOR_YUV2BGR_NV12);

  2. In the comment there is “//TODO for cuda device memory we need to use cudamemcpy”. How can I directly use cv::cuda::GpuMat with cudaMemcpy?

Kind Regards,

(Update!)
I found that RGBA uses NVBUF_LAYOUT_PITCH, while NV12 uses NVBUF_LAYOUT_BLOCK_LINEAR.
So I applied a layout transformation as below:

NvBufSurfaceCreateParams create_params;
      memset(&create_params, 0, sizeof(create_params)); // zero-init so unset fields are not garbage
      create_params.gpuId = 0; // Use GPU ID 0
      create_params.width = in_surf->surfaceList[i].width;
      create_params.height = in_surf->surfaceList[i].height;
      create_params.size = in_surf->surfaceList[i].width * in_surf->surfaceList[i].height * 3 / 2; // 0;
      create_params.colorFormat = in_surf->surfaceList[i].colorFormat;
      create_params.layout = NVBUF_LAYOUT_PITCH;
      create_params.memType = NVBUF_MEM_DEFAULT;

      NvBufSurface *dst_surface = nullptr;
      if (NvBufSurfaceCreate(&dst_surface, 1, &create_params) != 0)
      {
        std::cerr << "Failed to create destination NvBufSurface." << std::endl;
        // return cv::Mat();
      }

      // Set up transform configuration
      NvBufSurfTransformConfigParams transform_config_params;
      transform_config_params.compute_mode = NvBufSurfTransformCompute_Default;
      transform_config_params.gpu_id = 0; // Use GPU ID 0

      // Set the transformation configuration
      if (NvBufSurfTransformSetSessionParams(&transform_config_params) != NvBufSurfTransformError_Success)
      {
        std::cerr << "Error setting transform session parameters." << std::endl;
        NvBufSurfaceDestroy(dst_surface);
      }
      // Set up transform parameters
      NvBufSurfTransformParams transform_params;
      memset(&transform_params, 0, sizeof(transform_params));
      transform_params.transform_flag = NVBUFSURF_TRANSFORM_FILTER;
      transform_params.transform_filter = NvBufSurfTransformInter_Nearest;
      // Perform the transformation
      if (NvBufSurfTransform(in_surf, dst_surface, &transform_params) != NvBufSurfTransformError_Success)
      {
        std::cerr << "Error performing NvBufSurfTransform." << std::endl;
        NvBufSurfaceDestroy(dst_surface);
      }
      // Wrap the destination NvBufSurface in a cv::Mat
      NvBufSurfaceMap(dst_surface, 0, 0, NVBUF_MAP_READ);
      NvBufSurfaceSyncForCpu(dst_surface, 0, 0);
      cv::Mat nv12_mat(dst_surface->surfaceList[0].height * 3 / 2,
                       dst_surface->surfaceList[0].width,
                       CV_8UC1,
                       (unsigned char *)dst_surface->surfaceList[0].mappedAddr.addr[0], dst_surface->surfaceList[0].pitch);
      nv12_mat.step = dst_surface->surfaceList[0].width * sizeof(uchar);
      cv::imwrite("nv_step.jpeg", nv12_mat);

      // Unmap the destination surface
      NvBufSurfaceUnMap(dst_surface, 0, 0);

But the UV region (bottom) still looks bad.
Is my NvBufSurfTransform usage wrong?

Can you use the sample in my first comment to reproduce this issue? In that sample, it did NV12 → RGBA and RGBA → NV12 conversion. You can use OpenCV to dump the software buffer, or this method to dump the hardware buffer, to check.

Hi,
Yes, for example using /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test1

Isn't the UV also corrupted when you see the results below?

...
//Create Mat from NvMM memory, refer opencv API for how to create a Mat
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0],
      surface->surfaceList[frame_meta->batch_id].pitch);  

      //Convert nv12 to RGBA to apply algo based on RGBA
      Mat rgba_mat;
      cv::cvtColor(nv12_mat, rgba_mat, cv::COLOR_YUV2BGRA_NV12);
      Mat rgb_mat;
      cv::cvtColor(nv12_mat, rgb_mat, cv::COLOR_YUV2BGR_NV12);

      //only rotate the first 10 frames
      if(frame_number < 10){
        //dump the original NvbufSurface
        sprintf(file_name_nv12, "nvinfer_probe_rotate_stream%2d_%03d_nv12.jpg", frame_meta->source_id, frame_number);
        imwrite(file_name_nv12, nv12_mat);

        //dump the RGB conversion
        sprintf(file_name_rgb, "nvinfer_probe_rotate_stream%2d_%03d_rgb.jpg", frame_meta->source_id, frame_number);
        imwrite(file_name_rgb, rgb_mat);
...


        // access the surface modified by opencv
        cv::cvtColor(nv12_mat, rgba_mat, cv::COLOR_YUV2BGRA_NV12);
        //dump the modified surface converted to RGBA
        sprintf(file_name, "nvinfer_probe_rotate_stream%2d_%03d.jpg", frame_meta->source_id, frame_number);
        imwrite(file_name, rgba_mat);
        NvBufSurfaceUnMap(inter_buf, 0, 0);
      }


nv12 raw dump

rgb dump

rotated rgba dump

Thanks for sharing! I will check.

From the “nv12 raw dump” you shared, the UV part is empty. Here is the test output on dGPU: DGPU1.zip (406.2 KB). So there are some issues when using OpenCV cv::Mat mapping for an NV12 NvBufSurface on Jetson.
Here is a workaround:

  1. Use the method from Jun 12 to dump the NV12 NvBufSurface to software memory. The size becomes w x h x 3/2 because the padding is removed.
  2. Use “Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, nv12_data);” to map it again.

Hi,

Sorry, but could you elaborate?
Maybe you mean I need to do something in the commented region below, but I don't understand it well.

Kind Regards,

...
NvBufSurface *surface = (NvBufSurface *)in_map_info.data;
  for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
    l_frame = l_frame->next) {
      NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);
      //TODO for cuda device memory we need to use cudamemcpy
      NvBufSurfaceMap (surface, -1, -1, NVBUF_MAP_READ);
      /* Cache the mapped data for CPU access */
      NvBufSurfaceSyncForCpu (surface, 0, 0); //will do nothing for unified memory type on dGPU
      guint height = surface->surfaceList[frame_meta->batch_id].height;
      guint width = surface->surfaceList[frame_meta->batch_id].width;

      //Create Mat from NvMM memory, refer opencv API for how to create a Mat
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0],
      surface->surfaceList[frame_meta->batch_id].pitch);  

//////////////////////////////////////////////////////////
///////////WHAT TO UPDATE HERE??//////////////////////////
//////////////////////////////////////////////////////////

      //only rotate the first 10 frames
      if(frame_number < 10){
        //dump the original NvbufSurface
        sprintf(file_name_nv12, "nvinfer_probe_rotate_stream%2d_%03d_nv12.jpg", frame_meta->source_id, frame_number);
        imwrite(file_name_nv12, nv12_mat);

You can dump the NV12 hardware buffer to a software buffer nv12_data first. Please refer to the following code:

  char* nv12_data = NULL; 
......
      guint width = surface->surfaceList[frame_meta->batch_id].width;

      if(nv12_data == NULL){
        nv12_data = (char*) malloc(width * height * 3/2);
      }
      char* pnv12 = nv12_data;
      NvBufSurfaceMappedAddr *mAddr = &(surface->surfaceList[0].mappedAddr);
      NvBufSurfacePlaneParams *planeParams = &(surface->surfaceList[frame_meta->batch_id].planeParams);
      int bs = 0;
      for (int p = 0; p < planeParams->num_planes; p++)
      {
        size_t bytes_to_write = planeParams->bytesPerPix[p] * planeParams->width[p];
        char *data = (char *) mAddr->addr[p];
        for (int j = 0; j < planeParams->height[p]; j++)
        {
          outputfile.write(data, bytes_to_write); // optional: also dump the raw bytes to a file
          memcpy(pnv12, data, bytes_to_write);
          data += planeParams->pitch[p];
          pnv12 += bytes_to_write;
        }
      }
      //Create Mat from NvMM memory, refer opencv API for how to create a Mat
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, nv12_data);
      //only rotate the first 10 frames
......

Hi,

Now I understand that it's in multiple planes.

Alternatively, I could do this:

      Mat nv12_mat_y = Mat(height, width, CV_8UC1, (char *)surface->surfaceList[0].mappedAddr.addr[0],
                           surface->surfaceList[frame_meta->batch_id].pitch);
      Mat nv12_mat_uv = Mat(height/2, width, CV_8UC1, (char *)surface->surfaceList[0].mappedAddr.addr[1],
                            surface->surfaceList[frame_meta->batch_id].pitch);
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1);
      nv12_mat_y.copyTo(nv12_mat(cv::Rect(0, 0, width, height)));
      nv12_mat_uv.copyTo(nv12_mat(cv::Rect(0, height, width, height/2)));

Shouldn't it also work directly on the CUDA memory with cv::cuda::GpuMat?
Let me also try it.

Regards,

Thanks for sharing! On Jetson, the Y and UV planes of NV12 are not contiguous in memory; it seems OpenCV can't cover this case directly.

Thank you again,

Lastly,
based on the comment in the example code,
I tried to wrap it into a CUDA device pointer.

But it gives only a black frame;
could you give me some advice here?

//TODO for cuda device memory we need to use cudamemcpy
      size_t pitch = surface->surfaceList[frame_meta->batch_id].pitch;
      size_t nv12_size = pitch * (height + height / 2);
      
      char* d_y_data = (char*)surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0];
      char* d_uv_data = (char*)surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[1];
      uchar* d_nv12_data;
      cudaMalloc(&d_nv12_data, nv12_size);
      cudaMemcpy2D(d_nv12_data, pitch, d_y_data, pitch, width, height, cudaMemcpyDeviceToDevice);
      cudaMemcpy2D(d_nv12_data + pitch * height, pitch, d_uv_data, pitch, width, height / 2, cudaMemcpyDeviceToDevice);
      cuda::GpuMat concatenated_nv12(height + height / 2, width, CV_8UC1, d_nv12_data, pitch);
      Mat nv12_test_host;
      concatenated_nv12.download(nv12_test_host);

      sprintf(file_name_nv12, "nvinfer_probe_rotate_stream%2d_%03d_nv12_device.jpg", frame_meta->source_id, frame_number);
      imwrite(file_name_nv12, nv12_test_host);    

Kind Regards,

After cudaMemcpy2D, you can dump the hardware buffer to check whether the data is valid.
Please refer to this topic for NV12 format to GpuMat.