DeepStream

• Hardware Platform (Jetson / GPU) : Jetson
• DeepStream Version : DS6.2
• JetPack Version (valid for Jetson only) : JP5.1.2
• Issue Type( questions, new requirements, bugs) : Question

Hi,
I want to apply some CUDA-kernel-based image processing that transforms the input buffer and pushes it downstream.
I’m using gst-nvdsvideotemplate as a base source.

Q1. Does this approach make sense from a performance perspective?
I want to use the NV12 color format throughout so that there is no color conversion anywhere in the pipeline.
ex) nvarguscamerasrc (input, NV12) → nvdsvideotemplate (custom implementation, NV12) → nvinfer (if needed, NV12) → nvv4l2h264enc (NV12)

Q2. The main problem is that the NV12 format does not seem to work properly.
How can I wrap an NV12 NvBufSurface in an OpenCV type?
I used CV_8UC1 with a size of width and height * 3/2.

// gst-nvdsvideotemplate/customlib_impl
// in the SampleAlgorithm::ProcessBuffer(GstBuffer *inbuf)
    if (m_inVideoFmt == GST_VIDEO_FORMAT_RGBA)
    {
      // This is working fine, I can also use any custom cuda kernel
      cv::cuda::GpuMat d_input_rgba = cv::cuda::GpuMat(in_surf->surfaceList[i].height, in_surf->surfaceList[i].width, CV_8UC4, (unsigned char *)in_surf->surfaceList[i].dataPtr);
      d_input_rgba.setTo(cv::Scalar::all(0));
    }
    else if (m_inVideoFmt == GST_VIDEO_FORMAT_NV12)
    {
      // This does not wrap the data properly; the result is only a dark image.
      cv::cuda::GpuMat d_input_nv12 = cv::cuda::GpuMat(in_surf->surfaceList[i].height * 3 / 2, in_surf->surfaceList[i].width, CV_8UC1, (unsigned char *)in_surf->surfaceList[i].dataPtr);

      // This throws an error; I cannot use any other custom CUDA kernel either.
      d_input_nv12.setTo(cv::Scalar::all(0));
    }

The error message with NV12 is as below:

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.5.4) /opt/nvidia/deepstream/deepstream-6.2/opencv-4.5.4/modules/core/src/cuda/gpu_mat.cu:389: error: (-217:Gpu API call) invalid argument in function 'setTo'

Or, in the case of a custom kernel:

Caught SIGSEGV

Here is the GStreamer pipeline for your reference (the only difference is format=RGBA vs. NV12):

RGBA → working fine

gst-launch-1.0 filesrc location=h264.mp4 ! qtdemux ! h264parse ! queue ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=3840 height=2160 batch-size=1 ! nvvideoconvert nvbuf-memory-type=1 compute-hw=1 ! 'video/x-raw(memory:NVMM), width=3840, height=2160' ! queue ! nvdsvideotemplate customlib-name="./customlib_impl/libcustom_videoimpl.so" customlib-props="scale-factor:2.0" ! nvvideoconvert ! 'video/x-raw(memory:NVMM), format=RGBA, width=3840, height=2160' ! nvv4l2h264enc ! ...

NV12 → error, or there seems to be no data at the pointer

gst-launch-1.0 filesrc location=h264.mp4 ! qtdemux ! h264parse ! queue ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=3840 height=2160 batch-size=1 ! nvvideoconvert nvbuf-memory-type=1 compute-hw=1 ! 'video/x-raw(memory:NVMM), width=3840, height=2160' ! queue ! nvdsvideotemplate customlib-name="./customlib_impl/libcustom_videoimpl.so" customlib-props="scale-factor:2.0" ! nvvideoconvert ! 'video/x-raw(memory:NVMM), format=NV12, width=3840, height=2160' ! nvv4l2h264enc ! ...

Please refer to this sample for how to access NV12 with OpenCV.

Hi,
It seems a bit odd, so I explain it below.

1. CPU sync and then mapping to cv::Mat is done without a memory error; however, the result seems to be in a different format, not NV12.

input frame (W x H)

expected NV12 (raw dump, W x H+H/2), converted via an RGBA → NV12 conversion

the result from the code

      //TODO for cuda device memory we need to use cudamemcpy
      NvBufSurfaceMap (surface, -1, -1, NVBUF_MAP_READ);
      /* Cache the mapped data for CPU access */
      NvBufSurfaceSyncForCpu (surface, 0, 0); //will do nothing for unified memory type on dGPU
      guint height = surface->surfaceList[frame_meta->batch_id].height;
      guint width = surface->surfaceList[frame_meta->batch_id].width;

      //Create Mat from NvMM memory, refer opencv API for how to create a Mat
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0],
                         surface->surfaceList[frame_meta->batch_id].pitch);

To double-check, I tried converting it to BGR:

cv::cvtColor(yuv, bgr, cv::COLOR_YUV2BGR_NV12);

  2. In the comment there is “//TODO for cuda device memory we need to use cudamemcpy”. How can I directly use cv::cuda::GpuMat with cudaMemcpy?

Kind Regards,

(Update!)
I found that RGBA uses NVBUF_LAYOUT_PITCH, while NV12 uses NVBUF_LAYOUT_BLOCK_LINEAR.
So I applied a layout transformation as below:

NvBufSurfaceCreateParams create_params;
      memset(&create_params, 0, sizeof(create_params)); // zero-init so unset fields are not garbage
      create_params.gpuId = 0; // Use GPU ID 0
      create_params.width = in_surf->surfaceList[i].width;
      create_params.height = in_surf->surfaceList[i].height;
      create_params.size = in_surf->surfaceList[i].width * in_surf->surfaceList[i].height * 3 / 2; // 0;
      create_params.colorFormat = in_surf->surfaceList[i].colorFormat;
      create_params.layout = NVBUF_LAYOUT_PITCH;
      create_params.memType = NVBUF_MEM_DEFAULT;

      NvBufSurface *dst_surface = nullptr;
      if (NvBufSurfaceCreate(&dst_surface, 1, &create_params) != 0)
      {
        std::cerr << "Failed to create destination NvBufSurface." << std::endl;
        // return cv::Mat();
      }

      // Set up transform configuration
      NvBufSurfTransformConfigParams transform_config_params;
      transform_config_params.compute_mode = NvBufSurfTransformCompute_Default;
      transform_config_params.gpu_id = 0; // Use GPU ID 0

      // Set the transformation configuration
      if (NvBufSurfTransformSetSessionParams(&transform_config_params) != NvBufSurfTransformError_Success)
      {
        std::cerr << "Error setting transform session parameters." << std::endl;
        NvBufSurfaceDestroy(dst_surface);
      }
      // Set up transform parameters
      NvBufSurfTransformParams transform_params;
      memset(&transform_params, 0, sizeof(transform_params));
      transform_params.transform_flag = NVBUFSURF_TRANSFORM_FILTER;
      transform_params.transform_filter = NvBufSurfTransformInter_Nearest;
      // Perform the transformation
      if (NvBufSurfTransform(in_surf, dst_surface, &transform_params) != NvBufSurfTransformError_Success)
      {
        std::cerr << "Error performing NvBufSurfTransform." << std::endl;
        NvBufSurfaceDestroy(dst_surface);
      }
      // Wrap the destination NvBufSurface in a cv::Mat
      NvBufSurfaceMap(dst_surface, 0, 0, NVBUF_MAP_READ);
      NvBufSurfaceSyncForCpu(dst_surface, 0, 0);
      cv::Mat nv12_mat(dst_surface->surfaceList[0].height * 3 / 2,
                       dst_surface->surfaceList[0].width,
                       CV_8UC1,
                       (unsigned char *)dst_surface->surfaceList[0].mappedAddr.addr[0], dst_surface->surfaceList[0].pitch);
      nv12_mat.step = dst_surface->surfaceList[0].width * sizeof(uchar);
      cv::imwrite("nv_step.jpeg", nv12_mat);

      // Unmap the destination surface
      NvBufSurfaceUnMap(dst_surface, 0, 0);

But the UV region (bottom) still looks bad.
Is my NvBufSurfTransform usage wrong?

Can you use the sample in my first comment to reproduce this issue? In that sample, it did NV12 → RGBA and RGBA → NV12 conversion. You can use OpenCV to dump the software buffer, or this method to dump the hardware buffer, to check.

Hi,
Yes, for example using /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test1

Isn't the UV also corrupted when you see the results below?

...
//Create Mat from NvMM memory, refer opencv API for how to create a Mat
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0],
      surface->surfaceList[frame_meta->batch_id].pitch);  

      //Convert nv12 to RGBA to apply algo based on RGBA
      Mat rgba_mat;
      cv::cvtColor(nv12_mat, rgba_mat, cv::COLOR_YUV2BGRA_NV12);
      Mat rgb_mat;
      cv::cvtColor(nv12_mat, rgb_mat, cv::COLOR_YUV2BGR_NV12);

      //only rotate the first 10 frames
      if(frame_number < 10){
        //dump the original NvbufSurface
        sprintf(file_name_nv12, "nvinfer_probe_rotate_stream%2d_%03d_nv12.jpg", frame_meta->source_id, frame_number);
        imwrite(file_name_nv12, nv12_mat);

        //dump the RGB conversion
        sprintf(file_name_rgb, "nvinfer_probe_rotate_stream%2d_%03d_rgb.jpg", frame_meta->source_id, frame_number);
        imwrite(file_name_rgb, rgb_mat);
...


        // access the surface modified by opencv
        cv::cvtColor(nv12_mat, rgba_mat, cv::COLOR_YUV2BGRA_NV12);
        //dump the modified surface converted to RGBA
        sprintf(file_name, "nvinfer_probe_rotate_stream%2d_%03d.jpg", frame_meta->source_id, frame_number);
        imwrite(file_name, rgba_mat);
        NvBufSurfaceUnMap(inter_buf, 0, 0);
      }


nv12 raw dump

rgb dump

rotated rgba dump

Thanks for sharing! I will check.

From the “nv12 raw dump” you shared, the UV part is empty. Here is the test output on dGPU: DGPU1.zip (406.2 KB). So there are some issues when using OpenCV cv::Mat mapping for an NV12 NvBufSurface on Jetson.
Here is a workaround:

  1. Use the method from Jun 12 to dump the NV12 NvBufSurface to software memory. The size becomes w x h x 3/2 because the padding is removed.
  2. Use “Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, nv12_data);” to map it again.

Hi,

Sorry, but could you elaborate?
Maybe you mean I need to do something in the commented region below, but I don't understand it well.

Kind Regards,

...
NvBufSurface *surface = (NvBufSurface *)in_map_info.data;
  for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
    l_frame = l_frame->next) {
      NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);
      //TODO for cuda device memory we need to use cudamemcpy
      NvBufSurfaceMap (surface, -1, -1, NVBUF_MAP_READ);
      /* Cache the mapped data for CPU access */
      NvBufSurfaceSyncForCpu (surface, 0, 0); //will do nothing for unified memory type on dGPU
      guint height = surface->surfaceList[frame_meta->batch_id].height;
      guint width = surface->surfaceList[frame_meta->batch_id].width;

      //Create Mat from NvMM memory, refer opencv API for how to create a Mat
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0],
      surface->surfaceList[frame_meta->batch_id].pitch);  

//////////////////////////////////////////////////////////
///////////WHAT TO UPDATE HERE??//////////////////////////
//////////////////////////////////////////////////////////

      //only rotate the first 10 frames
      if(frame_number < 10){
        //dump the original NvbufSurface
        sprintf(file_name_nv12, "nvinfer_probe_rotate_stream%2d_%03d_nv12.jpg", frame_meta->source_id, frame_number);
        imwrite(file_name_nv12, nv12_mat);

You can dump the NV12 hardware buffer to a software buffer nv12_data first. Please refer to the following code:

  char* nv12_data = NULL; 
......
      guint width = surface->surfaceList[frame_meta->batch_id].width;

      if(nv12_data == NULL){
        nv12_data = (char*) malloc(width * height * 3/2);
      }
      char* pnv12 = nv12_data;
      NvBufSurfaceMappedAddr *mAddr = &(surface->surfaceList[0].mappedAddr);
      NvBufSurfacePlaneParams *planeParams = &(surface->surfaceList[frame_meta->batch_id].planeParams);
      int bs = 0;
      for (int p = 0; p < planeParams->num_planes; p++)
      {
        size_t bytes_to_write = planeParams->bytesPerPix[p] * planeParams->width[p];
        char *data = (char *) mAddr->addr[p];
        for (int j = 0; j < planeParams->height[p]; j++)
        {
          outputfile.write(data, bytes_to_write); // optional: also dump the raw bytes to a file
          memcpy(pnv12, data, bytes_to_write);
          data += planeParams->pitch[p];
          pnv12 += bytes_to_write;
        }
      }
      //Create Mat from NvMM memory, refer opencv API for how to create a Mat
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1, nv12_data);
      //only rotate the first 10 frames
......

Hi,

Now I understand that it's in multiple planes.

Alternatively, I could do this:

      Mat nv12_mat_y = Mat(height, width, CV_8UC1, (char *)surface->surfaceList[0].mappedAddr.addr[0],
                           surface->surfaceList[frame_meta->batch_id].pitch);
      Mat nv12_mat_uv = Mat(height/2, width, CV_8UC1, (char *)surface->surfaceList[0].mappedAddr.addr[1],
                            surface->surfaceList[frame_meta->batch_id].pitch);
      Mat nv12_mat = Mat(height*3/2, width, CV_8UC1);
      nv12_mat_y.copyTo(nv12_mat(cv::Rect(0, 0, width, height)));
      nv12_mat_uv.copyTo(nv12_mat(cv::Rect(0, height, width, height/2)));

Shouldn't it also work directly on the CUDA memory with cv::cuda::GpuMat?
Let me also try it.

Regards,

Thanks for sharing! On Jetson, the Y and UV planes of NV12 are not contiguous in memory; it seems OpenCV can't cover this case directly.

Thank you again,

Lastly,
based on the comment in the example code,
I tried to wrap it into a CUDA device pointer.

But it gives only a black frame;
could you give me some advice here?

//TODO for cuda device memory we need to use cudamemcpy
      size_t pitch = surface->surfaceList[frame_meta->batch_id].pitch;
      size_t nv12_size = pitch * (height + height / 2);
      
      char* d_y_data = (char*)surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0];
      char* d_uv_data = (char*)surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[1];
      uchar* d_nv12_data;
      cudaMalloc(&d_nv12_data, nv12_size);
      cudaMemcpy2D(d_nv12_data, pitch, d_y_data, pitch, width, height, cudaMemcpyDeviceToDevice);
      cudaMemcpy2D(d_nv12_data + pitch * height, pitch, d_uv_data, pitch, width, height / 2, cudaMemcpyDeviceToDevice);
      cuda::GpuMat concatenated_nv12(height + height / 2, width, CV_8UC1, d_nv12_data, pitch);
      Mat nv12_test_host;
      concatenated_nv12.download(nv12_test_host);

      sprintf(file_name_nv12, "nvinfer_probe_rotate_stream%2d_%03d_nv12_device.jpg", frame_meta->source_id, frame_number);
      imwrite(file_name_nv12, nv12_test_host);    

Kind Regards,

After cudaMemcpy2D, you can dump the hardware buffer to check whether the data is valid.
Please refer to this topic for NV12 format to GpuMat.