EGLStream(CUDA) -> cv::cuda::GpuMat using Argus & nppi

I’m trying to retrieve a cv::cuda::GpuMat from CUDA EGLStream.

My code is based on jetson_multimedia_api/argus/samples/syncSensor

The current setting of my stream is

    iEGLStreamSettings->setPixelFormat(PIXEL_FMT_YCbCr_420_888);
    iEGLStreamSettings->setResolution(STREAM_SIZE);
    iEGLStreamSettings->setEGLDisplay(g_display.get());

I’ve made some modifications on ScopedCudaEGLStreamFrameAcquire::generateHistogram

bool ScopedCudaEGLStreamFrameAcquire::generateHistogram(unsigned int histogramData[HISTOGRAM_BINS],
                                                        float *time)
{
    if (!hasValidFrame() || !histogramData || !time)
        ORIGINATE_ERROR("Invalid state or output parameters");

    unsigned int height = m_frame.height;
    unsigned int width = m_frame.width;

    NppiSize in_size{
        .width = static_cast<int>(width),
        .height = static_cast<int>(height),
    };

    cv::cuda::GpuMat gpuMat;
    gpuMat.create(cv::Size(width, height), CV_8UC4);

    NppStatus status = nppiNV21ToBGR_8u_P2C4R((const Npp8u * const*)m_frame.frame.pPitch[0],
                                              m_frame.pitch,
                                              (Npp8u *)gpuMat.cudaPtr(),
                                              gpuMat.step,
                                              in_size);

    return true;
}

The function terminates successfully, but throws an error on the next call at gpuMat.create()

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.6.0) /home/nano1/opencv/modules/core/src/cuda/gpu_mat.cu:116: error: (-217:Gpu API call) unspecified launch failure in function 'allocate'

I haven’t called any OpenCV function outside generateHistogram()

Any help will be appreciated!

I’ve just noticed there’re some OpenCV codes left.

After removing them, the code fails on the second execution with NPP_CUDA_KERNEL_EXECUTION_ERROR

    ...

    r = cuEGLStreamConsumerAcquireFrame(&m_connection, &m_resource, &m_stream, -1);
    if (r == CUDA_SUCCESS)
    {
      printf("Frame acquired succesfully!\n");
      r = cuGraphicsResourceGetMappedEglFrame(&m_frame, m_resource, 0, 0);

      if (r != CUDA_SUCCESS)
      {
        const char* errmsg;

        cuGetErrorString(r, &errmsg);

        printf("cuGraphicsResourceGetMappedEglFrame failed\n");
        printf("%s\n", errmsg);
      }
    }

A small update on my progress so far.

unspecified launch failure is returned by cuGraphicsResourceGetMappedEglFrame().

The ret val of cuGraphicsResourceGetMappedEglFrame() is 719, which is CUDA_ERROR_LAUNCH_FAILED

TL;DR

When nppiNV21ToBGR_8u_P2C4R_Ctx() is called to convert a CUeglFrame into cv::cuda::GpuMat (BGR),
cuEGLStreamConsumerReleaseFrame() fails.

I then tried to gpuMat.download(cpuMat) and cv::imwrite(cpuMat), it throws unspecified launch failure as well.

========LOOP STARTED========
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] cudaStreamCreate
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] cuEGLStreamConsumerAcquireFrame!
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] cuGraphicsResourceGetMappedEglFrame 
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] m_connection: 0x7f60463b90
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] m_resource: 0x7f60479420
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] m_stream: 0x7f60463080
ScopedCudaEGLStreamFrameAcquire::generateHistogram | [GOOD] nppSetStream
ScopedCudaEGLStreamFrameAcquire::generateHistogram | [GOOD] nppGetStreamContext
ScopedCudaEGLStreamFrameAcquire::generateHistogram | [GOOD] cudaStreamSynchronize
========LOOP FINISHED========
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] cuEGLStreamConsumerReleaseFrame! <- Crashes
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] m_connection: 0x7f60463b90
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] m_resource: 0x7f60479420
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] m_stream: 0x7f60463080
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] cuEGLStreamConsumerReleaseFrame unspecified launch failure!
========LOOP STARTED========
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [BAD] cudaStreamCreate
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [BAD] cuGraphicsResourceGetMappedEglFrame!
CONSUMER: No more frames. Cleaning up.
CONSUMER: Done.
PRODUCER: Captures complete, disconnecting producer.
PRODUCER: Done -- exiting.

I’ve attached my code for anyone interested in.

Always, appreciate your help!

syncSensor.zip (11.4 KB)

Hi, I’m also try to do the same job (Argus → gpumat).
I’m really curious what happened since the last comment.
did you have some good result after all?

1 Like

Which part do you need nelp?

1 Like

Hi,
Thank you for the reply,

I also trying to check the image is wrapped well in opencv container by: cuEglFrame → cv::cuda::gpuMat → cv::Mat -->imwrite

it seems the data is fine because cuEglFrame → cuda kernal (in jetson multimedia api → histogram) is working,
but any opencv function gives error. ex)imwrite, or sometimes download from gpuMat to cpu Mat…

I saw your comment as below, so I was wondering if you successfully done with opencv related stuff in your test.

I then tried to gpuMat.download(cpuMat) and cv::imwrite(cpuMat) , it throws unspecified launch failure as well.

For example I used these posts to wrap the data to opencv Mats.

cudaMemcpy2D:

please don’t use opencv, y’d better code using nvsurface

---- Replied Message ----

From | jahwan.oh via NVIDIA Developer Forumsnotifications@nvidia.discoursemail.com |

  • | - |
    Date | 08/10/2023 17:07 |
    To | zf1116@126.com |
    Cc | |
    Subject | [NVIDIA Developer Forums] [AI & Data Science/Computer Vision & Image Processing] EGLStream(CUDA) → cv::cuda::GpuMat using Argus & nppi |

| jahwan.oh
August 10 |

  • | - |

Hi,
Thank you for the reply,

I also trying to check the image is wrapped well in opencv container by: cuEglFrame → cv::cuda::gpuMat → cv::Mat -->imwrite

it seems the data is fine because cuEglFrame → cuda kernal (in jetson multimedia api → histogram) is working,
but any opencv function gives error. ex)imwrite, or sometimes download from gpuMat to cpu Mat…

I saw your comment as below, so I was wondering if you successfully done with opencv related stuff in your test.

I then tried to gpuMat.download(cpuMat) and cv::imwrite(cpuMat) , it throws unspecified launch failure as well.

For example I used these posts to wrap the data to opencv Mats.

NvBufSurface and OpenCV DeepStream SDK

• Hardware Platform Jetson • DeepStream Version 6 • JetPack Version 4.6 • TensorRT Version 8 • Issue Type questions I am following the instructions from the documentation to access the frames from the NvBufSurface’s surfaceList. The documentation suggests using dataPtr, but in nvbfsurface.h we can find this comment for the field: /** Holds a pointer to allocated memory. Not valid for \ref NVBUF_MEM_SURFACE_ARRAY or \ref NVBUF_MEM_HANDLE. */ void * dataPtr; SInce NVBUF_MEM_SURFACE_A…

I get image from NvBufSurface but i cvtColor NV12 to RGB Error. why? General Topics and Other SDKs

v4l2camerasrc->nvvideoconvert1->nvstreammux->nvvideoconvert2->nvtransform->nveglglessink static GstPadProbeReturn nvvidconvert2_sink_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info, gpointer u_data) { GstBuffer *buf = (GstBuffer *)info->data; NvDsMetaList * l_frame = NULL; NvDsUserMeta *user_meta = NULL; NvDsMetaList * l_user_meta = NULL; NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf); for (l_frame = batch_meta->frame_meta_list; l_frame !=NULL; l_frame = l_frame…

cudaMemcpy2D:

1 Like

Thank you for the reply.
Right, I also don’t need to use opencv itself. but I need to use this data to be processed in the cuda kernal.

It seems there are two options to access to the data;
NvBufSurface.surfaceList[index].dataPtr vs cuEGLFrame.frame.pPitch[plane]

Can I use any options above?

and I also was reading your post ;I get image from NvBufSurface but i cvtColor NV12 to RGB Error. why?

could you post your working version in the post if that is OK for you?

Best Regards,

@jahwan.oh

How cv::Mat is structured?

OpenCV cannot take CuArray.

It uses a linear memory for cv::Mat and cv::cuda::Mat under the hood.

For example, if the dimension of a cv::Mat is 1920x1080x3, a normal color image, the memory is a linear memory, malloc(sizeof(uchar) * 1920x1080x3).
(I’ll get to stepSize later)

NOW THE TASK IS CLEAR: YOU NEED A LINEAR MEMORY

CUarray

Let’s start from jetson_multimedia_api/argus/samples/syncSensor.

cudaResourceDesc.res.array.hArray = m_frame.frame.pArray[0];

The type of pArray is CUarray.

CUarray is special that you cannot access its element via indexing, such as pArray[idx].
It requires a special datatype to manipulate its data, such as surface.
(Now, it’s clear why histogram is calculated after converting to surface)


Table 16. Objects Available in the CUDA Driver API

NV12

You can checkout the color scheme of frame by m_frame.eglColorFormat

    CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR_ER       = 0x26,  /**< Extended Range Y, UV in two surfaces (UV as one surface) with VU byte ordering, U/V width = 1/2 Y width, U/V height = 1/2 Y height. */

YUV420_SEMIPLANAR_ER is commonly called NV12.

It consists of two planes: Y and UV. (Yes, UV is on a single plane)
The size of UY is (width x height/2)
(U/V width = 1/2 Y width, U/V height = 1/2 throws an error when I try. Maybe an erratum?)

CUarray to Linear

  • To convert CUarray to a linear memory, cudaMemcpy2DFromArray is used.
  • Optional) async is for optimization.
  • cv::cuda doesn’t provide NV12BGR. Found out that NPP provides!
            int width, height;
            uchar *d_bgr;        
            uchar *d_Y;
            uchar *d_CrCb;

            cudaMalloc(&d_Y, sizeof(uchar) * width * height);
            cudaMalloc(&d_CrCb, sizeof(uchar) * width * (height / 2));
            cudaMalloc(&d_bgr, sizeof(uchar) * width * height * 3);

            CUarray cuY = m_frame.frame.pArray[0];
            CUarray cuCrCb = m_frame.frame.pArray[1];
            
            const size_t HEIGHT = m_frame.height;
            const size_t WIDTH = m_frame.width;
            const size_t HEIGHT_HALF = HEIGHT / 2;
            const size_t WIDTH_HALF = WIDTH / 2;
            const size_t HEIGHT_HALF_HALF = HEIGHT / 4;
            const size_t WIDTH_HALF_HALF = WIDTH / 4;
            const size_t CHANNEL = 3;

            cudaError_t err;
            NppStatus nppErr;

            // Retrieve Y and CbCr palnes
            err = cudaMemcpy2DFromArrayAsync(d_Y,
                                        WIDTH * sizeof(uchar),
                                        (cudaArray_t)cuY,
                                        0,
                                        0,
                                        WIDTH * sizeof(uchar),
                                        HEIGHT,
                                        cudaMemcpyDeviceToDevice,
                                        stream.hStream);

            // checkError(err, "cudaMemcpy2DFromArray - Y");

            err = cudaMemcpy2DFromArrayAsync(d_CrCb,
                                        WIDTH * sizeof(uchar),
                                        (cudaArray_t)cuCrCb,
                                        0,
                                        0,
                                        WIDTH * sizeof(uchar),
                                        HEIGHT_HALF,
                                        cudaMemcpyDeviceToDevice,
                                        stream.hStream);

            // checkError(err, "cudaMemcpy2DFromArray - CrCb");

            Npp8u *const pSrc[2] = {d_Y, d_CrCb};
            int rSrcStep = WIDTH * sizeof(uchar);
            int nDstStep = WIDTH * 3 * sizeof(uchar);
            NppiSize oSizeROI = {WIDTH, HEIGHT};

            nppErr = nppiNV12ToBGR_8u_P2C3R_Ctx(pSrc, rSrcStep, d_bgr, nDstStep, oSizeROI, stream);            

            // if(nppErr)
            // {
            //     printf("%d\n", nppErr);
            // }

            cv::cuda::GpuMat gpuMatBGR;
            gpuMatBGR.create(HEIGHT, WIDTH, CV_8UC3);
            gpuMatBGR.data = d_bgr;
            gpuMatBGR.step = WIDTH * 3 * sizeof(uchar);

            // cudaFreeAsync is avilable from CUDA 11.2
            cudaFree(d_Y);
            cudaFree(d_CrCb);

Also, don’t forget to cudaFree(gpuMatBGR.data) when the mat is no longer used to prevent memory leak!

Tip

  • When you create cv::cuda::GpuMat and fail, check out its stepSize.
    For cv::cuda::GpuMat, I had to set stepSize explicitly.

  • Sometimes, CUDA uses an extra space for stepSize for efficiency.

2 Likes

I digested your comment.

You are amazing :) I didn’t know the concept of CUArray
(cudaEGLFrame.frameType shows Array, but I kept trying using pPitch)

Also, I just need to use NV12 format in my video pipeline so it is already perfect.

Thank you!!!

Glad it helps!

1 Like

checking the YUV format please.

---- Replied Message ----

From | jahwan.oh via NVIDIA Developer Forumsnotifications@nvidia.discoursemail.com |

  • | - |
    Date | 08/10/2023 17:24 |
    To | zf1116@126.com |
    Cc | |
    Subject | [NVIDIA Developer Forums] [AI & Data Science/Computer Vision & Image Processing] EGLStream(CUDA) → cv::cuda::GpuMat using Argus & nppi |

| jahwan.oh
August 10 |

  • | - |

Thank you for the reply.
Right, I also don’t need to use opencv itself. but I need to use this data to be processed in the cuda kernal.

It seems there are two options to access to the data;
NvBufSurface.surfaceList[index].dataPtr vs cuEGLFrame.frame.pPitch[plane]

Can I use any options above?

and I also was reading your post ;I get image from NvBufSurface but i cvtColor NV12 to RGB Error. why?

could you post your working version in the post if that is OK for you?

Best Regards,

what do you mean by checking YUV format?

yes, check yuv format first

---- Replied Message ----

From | jahwan.oh via NVIDIA Developer Forumsnotifications@nvidia.discoursemail.com |

  • | - |
    Date | 08/16/2023 22:49 |
    To | zf1116@126.com |
    Cc | |
    Subject | [NVIDIA Developer Forums] [AI & Data Science/Computer Vision & Image Processing] EGLStream(CUDA) → cv::cuda::GpuMat using Argus & nppi |

| jahwan.oh
August 16 |

  • | - |

what do you mean by checking YUV format?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.