EGLStream(CUDA) -> cv::cuda::GpuMat using Argus & nppi

WhaSukGO · January 26, 2023, 9:56am

I’m trying to retrieve a cv::cuda::GpuMat from CUDA EGLStream.

My code is based on jetson_multimedia_api/argus/samples/syncSensor

The current setting of my stream is

    iEGLStreamSettings->setPixelFormat(PIXEL_FMT_YCbCr_420_888);
    iEGLStreamSettings->setResolution(STREAM_SIZE);
    iEGLStreamSettings->setEGLDisplay(g_display.get());

I’ve made some modifications on ScopedCudaEGLStreamFrameAcquire::generateHistogram

bool ScopedCudaEGLStreamFrameAcquire::generateHistogram(unsigned int histogramData[HISTOGRAM_BINS],
                                                        float *time)
{
    if (!hasValidFrame() || !histogramData || !time)
        ORIGINATE_ERROR("Invalid state or output parameters");

    unsigned int height = m_frame.height;
    unsigned int width = m_frame.width;

    NppiSize in_size{
        .width = static_cast<int>(width),
        .height = static_cast<int>(height),
    };

    cv::cuda::GpuMat gpuMat;
    gpuMat.create(cv::Size(width, height), CV_8UC4);

    NppStatus status = nppiNV21ToBGR_8u_P2C4R((const Npp8u * const*)m_frame.frame.pPitch[0],
                                              m_frame.pitch,
                                              (Npp8u *)gpuMat.cudaPtr(),
                                              gpuMat.step,
                                              in_size);

    return true;
}

The function terminates successfully, but throws an error on the next call at gpuMat.create()

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.6.0) /home/nano1/opencv/modules/core/src/cuda/gpu_mat.cu:116: error: (-217:Gpu API call) unspecified launch failure in function 'allocate'

I haven’t called any OpenCV function outside generateHistogram()

Any help will be appreciated!

WhaSukGO · January 27, 2023, 9:12am

I’ve just noticed there’re some OpenCV codes left.

After removing them, the code fails on the second execution with NPP_CUDA_KERNEL_EXECUTION_ERROR

WhaSukGO · January 27, 2023, 3:54pm

    ...

    r = cuEGLStreamConsumerAcquireFrame(&m_connection, &m_resource, &m_stream, -1);
    if (r == CUDA_SUCCESS)
    {
      printf("Frame acquired succesfully!\n");
      r = cuGraphicsResourceGetMappedEglFrame(&m_frame, m_resource, 0, 0);

      if (r != CUDA_SUCCESS)
      {
        const char* errmsg;

        cuGetErrorString(r, &errmsg);

        printf("cuGraphicsResourceGetMappedEglFrame failed\n");
        printf("%s\n", errmsg);
      }
    }

A small update on my progress so far.

unspecified launch failure is returned by cuGraphicsResourceGetMappedEglFrame().

WhaSukGO · January 28, 2023, 1:03am

The ret val of cuGraphicsResourceGetMappedEglFrame() is 719, which is CUDA_ERROR_LAUNCH_FAILED

WhaSukGO · January 29, 2023, 2:50am

TL;DR

When nppiNV21ToBGR_8u_P2C4R_Ctx() is called to convert a CUeglFrame into cv::cuda::GpuMat (BGR),
cuEGLStreamConsumerReleaseFrame() fails.

I then tried to gpuMat.download(cpuMat) and cv::imwrite(cpuMat), it throws unspecified launch failure as well.

========LOOP STARTED========
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] cudaStreamCreate
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] cuEGLStreamConsumerAcquireFrame!
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] cuGraphicsResourceGetMappedEglFrame 
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] m_connection: 0x7f60463b90
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] m_resource: 0x7f60479420
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [GOOD] m_stream: 0x7f60463080
ScopedCudaEGLStreamFrameAcquire::generateHistogram | [GOOD] nppSetStream
ScopedCudaEGLStreamFrameAcquire::generateHistogram | [GOOD] nppGetStreamContext
ScopedCudaEGLStreamFrameAcquire::generateHistogram | [GOOD] cudaStreamSynchronize
========LOOP FINISHED========
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] cuEGLStreamConsumerReleaseFrame! <- Crashes
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] m_connection: 0x7f60463b90
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] m_resource: 0x7f60479420
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] m_stream: 0x7f60463080
ScopedCudaEGLStreamFrameAcquire::~ScopedCudaEGLStreamFrameAcquire | [BAD] cuEGLStreamConsumerReleaseFrame unspecified launch failure!
========LOOP STARTED========
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [BAD] cudaStreamCreate
ScopedCudaEGLStreamFrameAcquire::ScopedCudaEGLStreamFrameAcquire | [BAD] cuGraphicsResourceGetMappedEglFrame!
CONSUMER: No more frames. Cleaning up.
CONSUMER: Done.
PRODUCER: Captures complete, disconnecting producer.
PRODUCER: Done -- exiting.

I’ve attached my code for anyone interested in.

Always, appreciate your help!

syncSensor.zip (11.4 KB)

jahwan.oh · August 4, 2023, 3:38pm

Hi, I’m also try to do the same job (Argus → gpumat).
I’m really curious what happened since the last comment.
did you have some good result after all?

WhaSukGO · August 10, 2023, 2:07am

Which part do you need nelp?

jahwan.oh · August 10, 2023, 9:01am

Hi,
Thank you for the reply,

I also trying to check the image is wrapped well in opencv container by: cuEglFrame → cv::cuda::gpuMat → cv::Mat -->imwrite

it seems the data is fine because cuEglFrame → cuda kernal (in jetson multimedia api → histogram) is working,
but any opencv function gives error. ex)imwrite, or sometimes download from gpuMat to cpu Mat…

I saw your comment as below, so I was wondering if you successfully done with opencv related stuff in your test.

I then tried to gpuMat.download(cpuMat) and cv::imwrite(cpuMat) , it throws unspecified launch failure as well.

For example I used these posts to wrap the data to opencv Mats.

cudaMemcpy2D:

zf1116 · August 10, 2023, 9:10am

please don’t use opencv, y’d better code using nvsurface

---- Replied Message ----

From | jahwan.oh via NVIDIA Developer Forums notifications@nvidia.discoursemail.com |

| - |
Date | 08/10/2023 17:07 |
To | zf1116@126.com |
Cc | |
Subject | [NVIDIA Developer Forums] [AI & Data Science/Computer Vision & Image Processing] EGLStream(CUDA) → cv::cuda::GpuMat using Argus & nppi |

| jahwan.oh
August 10 |

| - |

Hi,
Thank you for the reply,

I also trying to check the image is wrapped well in opencv container by: cuEglFrame → cv::cuda::gpuMat → cv::Mat -->imwrite

it seems the data is fine because cuEglFrame → cuda kernal (in jetson multimedia api → histogram) is working,
but any opencv function gives error. ex)imwrite, or sometimes download from gpuMat to cpu Mat…

I saw your comment as below, so I was wondering if you successfully done with opencv related stuff in your test.

I then tried to gpuMat.download(cpuMat) and cv::imwrite(cpuMat) , it throws unspecified launch failure as well.

For example I used these posts to wrap the data to opencv Mats.

NvBufSurface and OpenCV DeepStream SDK

• Hardware Platform Jetson • DeepStream Version 6 • JetPack Version 4.6 • TensorRT Version 8 • Issue Type questions I am following the instructions from the documentation to access the frames from the NvBufSurface’s surfaceList. The documentation suggests using dataPtr, but in nvbfsurface.h we can find this comment for the field: /** Holds a pointer to allocated memory. Not valid for \ref NVBUF_MEM_SURFACE_ARRAY or \ref NVBUF_MEM_HANDLE. */ void * dataPtr; SInce NVBUF_MEM_SURFACE_A…

I get image from NvBufSurface but i cvtColor NV12 to RGB Error. why? General Topics and Other SDKs

v4l2camerasrc->nvvideoconvert1->nvstreammux->nvvideoconvert2->nvtransform->nveglglessink static GstPadProbeReturn nvvidconvert2_sink_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info, gpointer u_data) { GstBuffer *buf = (GstBuffer *)info->data; NvDsMetaList * l_frame = NULL; NvDsUserMeta *user_meta = NULL; NvDsMetaList * l_user_meta = NULL; NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf); for (l_frame = batch_meta->frame_meta_list; l_frame !=NULL; l_frame = l_frame…

cudaMemcpy2D:

jahwan.oh · August 10, 2023, 9:23am

Thank you for the reply.
Right, I also don’t need to use opencv itself. but I need to use this data to be processed in the cuda kernal.

It seems there are two options to access to the data;
NvBufSurface.surfaceList[index].dataPtr vs cuEGLFrame.frame.pPitch[plane]

Can I use any options above?

and I also was reading your post ;I get image from NvBufSurface but i cvtColor NV12 to RGB Error. why?

could you post your working version in the post if that is OK for you?

Best Regards,

WhaSukGO · August 11, 2023, 12:40am

@jahwan.oh

How `cv::Mat is` structured?

OpenCV cannot take CuArray.

It uses a linear memory for cv::Mat and cv::cuda::Mat under the hood.

For example, if the dimension of a cv::Mat is 1920x1080x3, a normal color image, the memory is a linear memory, malloc(sizeof(uchar) * 1920x1080x3).
(I’ll get to stepSize later)

NOW THE TASK IS CLEAR: YOU NEED A LINEAR MEMORY

CUarray

Let’s start from jetson_multimedia_api/argus/samples/syncSensor.

cudaResourceDesc.res.array.hArray = m_frame.frame.pArray[0];

The type of pArray is CUarray.

CUarray is special that you cannot access its element via indexing, such as pArray[idx].
It requires a special datatype to manipulate its data, such as surface.
(Now, it’s clear why histogram is calculated after converting to surface)

Table 16. Objects Available in the CUDA Driver API

NV12

You can checkout the color scheme of frame by m_frame.eglColorFormat

    CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR_ER       = 0x26,  /**< Extended Range Y, UV in two surfaces (UV as one surface) with VU byte ordering, U/V width = 1/2 Y width, U/V height = 1/2 Y height. */

YUV420_SEMIPLANAR_ER is commonly called NV12.

It consists of two planes: Y and UV. (Yes, UV is on a single plane)
The size of UY is (width x height/2)
(U/V width = 1/2 Y width, U/V height = 1/2 throws an error when I try. Maybe an erratum?)

CUarray to Linear

To convert CUarray to a linear memory, cudaMemcpy2DFromArray is used.
Optional) async is for optimization.
cv::cuda doesn’t provide NV12 → BGR. Found out that NPP provides!

            int width, height;
            uchar *d_bgr;        
            uchar *d_Y;
            uchar *d_CrCb;

            cudaMalloc(&d_Y, sizeof(uchar) * width * height);
            cudaMalloc(&d_CrCb, sizeof(uchar) * width * (height / 2));
            cudaMalloc(&d_bgr, sizeof(uchar) * width * height * 3);


            CUarray cuY = m_frame.frame.pArray[0];
            CUarray cuCrCb = m_frame.frame.pArray[1];
            
            const size_t HEIGHT = m_frame.height;
            const size_t WIDTH = m_frame.width;
            const size_t HEIGHT_HALF = HEIGHT / 2;
            const size_t WIDTH_HALF = WIDTH / 2;
            const size_t HEIGHT_HALF_HALF = HEIGHT / 4;
            const size_t WIDTH_HALF_HALF = WIDTH / 4;
            const size_t CHANNEL = 3;

            cudaError_t err;
            NppStatus nppErr;

            // Retrieve Y and CbCr palnes
            err = cudaMemcpy2DFromArrayAsync(d_Y,
                                        WIDTH * sizeof(uchar),
                                        (cudaArray_t)cuY,
                                        0,
                                        0,
                                        WIDTH * sizeof(uchar),
                                        HEIGHT,
                                        cudaMemcpyDeviceToDevice,
                                        stream.hStream);

            // checkError(err, "cudaMemcpy2DFromArray - Y");

            err = cudaMemcpy2DFromArrayAsync(d_CrCb,
                                        WIDTH * sizeof(uchar),
                                        (cudaArray_t)cuCrCb,
                                        0,
                                        0,
                                        WIDTH * sizeof(uchar),
                                        HEIGHT_HALF,
                                        cudaMemcpyDeviceToDevice,
                                        stream.hStream);

            // checkError(err, "cudaMemcpy2DFromArray - CrCb");

            Npp8u *const pSrc[2] = {d_Y, d_CrCb};
            int rSrcStep = WIDTH * sizeof(uchar);
            int nDstStep = WIDTH * 3 * sizeof(uchar);
            NppiSize oSizeROI = {WIDTH, HEIGHT};

            nppErr = nppiNV12ToBGR_8u_P2C3R_Ctx(pSrc, rSrcStep, d_bgr, nDstStep, oSizeROI, stream);            

            // if(nppErr)
            // {
            //     printf("%d\n", nppErr);
            // }

            cv::cuda::GpuMat gpuMatBGR;
            gpuMatBGR.create(HEIGHT, WIDTH, CV_8UC3);
            gpuMatBGR.data = d_bgr;
            gpuMatBGR.step = WIDTH * 3 * sizeof(uchar);

            // cudaFreeAsync is avilable from CUDA 11.2
            cudaFree(d_Y);
            cudaFree(d_CrCb);

Also, don’t forget to cudaFree(gpuMatBGR.data) when the mat is no longer used to prevent memory leak!

Tip

When you create cv::cuda::GpuMat and fail, check out its stepSize.
For cv::cuda::GpuMat, I had to set stepSize explicitly.
Sometimes, CUDA uses an extra space for stepSize for efficiency.

jahwan.oh · August 11, 2023, 1:52pm

I digested your comment.

You are amazing :) I didn’t know the concept of CUArray
(cudaEGLFrame.frameType shows Array, but I kept trying using pPitch)

Also, I just need to use NV12 format in my video pipeline so it is already perfect.

Thank you!!!

WhaSukGO · August 13, 2023, 12:23am

Glad it helps!

zf1116 · August 15, 2023, 12:22am

checking the YUV format please.

---- Replied Message ----

From | jahwan.oh via NVIDIA Developer Forums notifications@nvidia.discoursemail.com |

| - |
Date | 08/10/2023 17:24 |
To | zf1116@126.com |
Cc | |
Subject | [NVIDIA Developer Forums] [AI & Data Science/Computer Vision & Image Processing] EGLStream(CUDA) → cv::cuda::GpuMat using Argus & nppi |

| jahwan.oh
August 10 |

| - |

Thank you for the reply.
Right, I also don’t need to use opencv itself. but I need to use this data to be processed in the cuda kernal.

It seems there are two options to access to the data;
NvBufSurface.surfaceList[index].dataPtr vs cuEGLFrame.frame.pPitch[plane]

Can I use any options above?

and I also was reading your post ;I get image from NvBufSurface but i cvtColor NV12 to RGB Error. why?

could you post your working version in the post if that is OK for you?

Best Regards,

jahwan.oh · August 16, 2023, 2:47pm

what do you mean by checking YUV format?

zf1116 · August 17, 2023, 6:08am

yes, check yuv format first

---- Replied Message ----

From | jahwan.oh via NVIDIA Developer Forums notifications@nvidia.discoursemail.com |

| - |
Date | 08/16/2023 22:49 |
To | zf1116@126.com |
Cc | |
Subject | [NVIDIA Developer Forums] [AI & Data Science/Computer Vision & Image Processing] EGLStream(CUDA) → cv::cuda::GpuMat using Argus & nppi |

| jahwan.oh
August 16 |

| - |

what do you mean by checking YUV format?

system · August 31, 2023, 6:09am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to create opencv gpumat from nvstream? DeepStream SDK	36	14402	July 27, 2021
gstreamer NVMM <-> opencv gpuMat Jetson TX2 opencv	58	21600	October 18, 2021
Error generated while running the code after connecting the camera Jetson Xavier NX gstreamer , nvbugs	45	1240	January 2, 2024
Opencv gpu mat into GStreamer without downloading to cpu Jetson Nano opencv , gstreamer	19	8364	October 13, 2021
Deepstream DeepStream SDK gstreamer , jetson , deepstream	14	434	July 9, 2024
LibArgus EGLStream to nvivafilter Jetson TX2	12	6220	October 18, 2021
OpenCV CUDA processing from gstreamer pipeline [JP4, JP5] Jetson AGX Orin opencv , gstreamer	3	991	November 30, 2023
dps4 on xavier: failed to register EGLImage in cuda DeepStream SDK	16	2067	October 12, 2021
How to save image to file ? Jetson TX2	22	5874	October 18, 2021
nvivafilter customer-lib implementation (code available) Jetson AGX Xavier	7	1773	October 18, 2021

EGLStream(CUDA) -> cv::cuda::GpuMat using Argus & nppi

TL;DR

How cv::Mat is structured?

CUarray

NV12

CUarray to Linear

Tip

Related topics

How `cv::Mat is` structured?