NVIDIA Performance Primitives (NPP): NV12 format

Hello All!

I’m trying to call nppiNV12ToBGR_8u_P2C4R to convert my CUeglFrame to BGR format.

The current frame is in CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR_ER, which I believe to be NV12.

CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR_ER       = 0x26,  

Extended Range Y, UV in two surfaces (UV as one surface) 
with VU byte ordering, U/V width = 1/2 Y width, U/V height = 1/2 Y height. 
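Read literally, that description says the chroma surface holds WIDTH/2 interleaved pairs per row (one Cr and one Cb byte each), so each chroma row is WIDTH bytes wide and there are HEIGHT/2 rows. The helper below is a minimal, hypothetical sketch (`semiPlanarLayout` is not an NPP or CUDA function) that computes those byte sizes. It assumes tightly packed rows, which real CUDA/EGL surfaces do not guarantee because their pitch may be larger.

```cpp
#include <cassert>
#include <cstddef>

// Byte layout of an 8-bit 4:2:0 semi-planar frame (NV12-style).
// Assumption: rows are tightly packed (pitch == row width in bytes).
struct SemiPlanarLayout {
    std::size_t yRowBytes;   // one byte per luma sample
    std::size_t yRows;
    std::size_t uvRowBytes;  // WIDTH/2 chroma pairs, 2 bytes per pair
    std::size_t uvRows;      // chroma is subsampled vertically by 2
    std::size_t yBytes;
    std::size_t uvBytes;
};

inline SemiPlanarLayout semiPlanarLayout(std::size_t width, std::size_t height) {
    // 4:2:0 subsampling needs even dimensions.
    assert(width % 2 == 0 && height % 2 == 0);
    SemiPlanarLayout l{};
    l.yRowBytes = width;
    l.yRows = height;
    l.uvRowBytes = (width / 2) * 2;  // same byte width as a luma row
    l.uvRows = height / 2;
    l.yBytes = l.yRowBytes * l.yRows;
    l.uvBytes = l.uvRowBytes * l.uvRows;
    return l;
}
```

For a hypothetical 640×480 frame this gives a 640-byte by 240-row chroma plane (153,600 bytes), which is WIDTH * HEIGHT / 2 in general, twice the WIDTH * HEIGHT / 4 bytes allocated for `d_CrCb` in the code below.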

So far, I’ve successfully extracted the Y and CrCb planes separately.

        ....

        // Device memory
        uchar *d_Y;
        uchar *d_CrCb;
        uchar *d_BGRA;

        // Width, Height
        cudaMalloc(&d_Y, sizeof(uchar) * WIDTH * HEIGHT);
        // (Width / 2), (Height / 2)
        cudaMalloc(&d_CrCb, sizeof(uchar) * WIDTH * HEIGHT / 4);
        // Width, Height, 4 channels
        cudaMalloc(&d_BGRA, sizeof(uchar) * WIDTH * HEIGHT * 4);

        err = cudaMemcpy2DFromArray(d_Y,
                                    WIDTH * sizeof(uchar),
                                    (cudaArray_t)cuY,
                                    0,
                                    0,
                                    WIDTH * sizeof(uchar),
                                    HEIGHT,
                                    cudaMemcpyDeviceToDevice);

        checkError(err, "cudaMemcpy2DFromArray");

        err = cudaMemcpy2DFromArray(d_CrCb,
                                    (WIDTH / 2) * sizeof(uchar),
                                    (cudaArray_t)cuCrCb,
                                    0,
                                    0,
                                    (WIDTH / 2) * sizeof(uchar),
                                    (HEIGHT / 2),
                                    cudaMemcpyDeviceToDevice);

        checkError(err, "cudaMemcpy2DFromArray");

        // EVERYTHING IS GOOD UNTIL HERE

        uchar *pSrc[] = {d_Y, d_CrCb};
        NppiSize roi{
            .width = static_cast<int>(WIDTH),
            .height = static_cast<int>(HEIGHT),
        };

        NppStatus npperr = nppiNV12ToBGR_8u_P2C4R(pSrc, WIDTH, d_BGRA, WIDTH, roi);

        if (npperr)
        {
            printf("%x\n", npperr);    
            printf("%d\n", npperr);    
        }

The memory copies complete without any error, but the NPP call fails.

npperr is -14, which is NPP_STEP_ERROR: “Step is less or equal zero.”

I’m passing WIDTH, which is 640, as both the source and the destination step, so neither step is zero or negative.
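NPP line steps are byte distances between the starts of consecutive rows, not pixel counts. The sketch below, assuming tightly packed buffers, computes the smallest steps for an NV12 source and a 4-channel 8-bit destination; `minimumSteps` is a hypothetical helper, not part of NPP. Whether an undersized destination step is what produces NPP_STEP_ERROR in this case is an assumption, not something the error text confirms.

```cpp
#include <cassert>

// Minimum line steps (in bytes) for NV12 -> 4-channel 8-bit output.
// Assumption: no row padding in either the source or destination buffer.
struct Nv12ToC4Steps {
    int srcStep;  // luma row width; an NV12 chroma row has the same byte width
    int dstStep;  // 4 bytes per output pixel
};

inline Nv12ToC4Steps minimumSteps(int width) {
    assert(width > 0 && width % 2 == 0);
    const int bytesPerLumaSample = 1;  // 8-bit Y
    const int dstChannels = 4;         // C4 output, 1 byte per channel
    return Nv12ToC4Steps{width * bytesPerLumaSample, width * dstChannels};
}
```

With a width of 640 this yields a source step of 640 but a destination step of 2560 bytes, so passing WIDTH for the 4-channel destination would be four times too small.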

So my question is: what is the shape of NV12 in NPP?

The description says it consists of two planes. Checked ✔️
The Y plane is HEIGHT rows by WIDTH bytes. Checked ✔️
The layout of the CrCb plane is still unclear to me. ❌
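One way to see the chroma plane's shape is to look at how a pixel's chroma is addressed. The sketch below is a hypothetical host-side helper (`sampleSemiPlanar` is not an NPP function) that reads the Y, Cb and Cr bytes for pixel (x, y) of a 4:2:0 semi-planar frame: one chroma pair serves a 2×2 block of luma pixels, and the pair sits at byte offset `(x / 2) * 2` in chroma row `y / 2`. Standard NV12 stores Cb before Cr, while the EGL description above mentions VU ordering, so the `vuOrder` flag lets the caller choose; which order this particular frame uses is not confirmed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct YCbCr {
    std::uint8_t y, cb, cr;
};

// Reads the samples that apply to pixel (x, y) of a 4:2:0 semi-planar frame.
// yPitch / uvPitch are row strides in bytes.
// vuOrder == false: NV12 byte order (Cb, Cr); true: VU byte order (Cr, Cb).
inline YCbCr sampleSemiPlanar(const std::uint8_t* yPlane, std::size_t yPitch,
                              const std::uint8_t* uvPlane, std::size_t uvPitch,
                              std::size_t x, std::size_t y, bool vuOrder) {
    assert(yPlane != nullptr && uvPlane != nullptr);
    // Chroma is subsampled vertically: rows 2k and 2k+1 share chroma row k.
    const std::uint8_t* uvRow = uvPlane + (y / 2) * uvPitch;
    // x / 2 selects the chroma pair; * 2 converts pairs to bytes.
    const std::size_t pairOffset = (x / 2) * 2;
    const std::uint8_t first = uvRow[pairOffset];
    const std::uint8_t second = uvRow[pairOffset + 1];
    YCbCr s{};
    s.y = yPlane[y * yPitch + x];
    s.cb = vuOrder ? second : first;
    s.cr = vuOrder ? first : second;
    return s;
}
```

Because the last pixel of a row reads bytes up to `WIDTH - 1` in its chroma row, the chroma plane has to be WIDTH bytes wide, not WIDTH / 2.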

Here is a post that discusses NV12 in some detail.

Here is a post that shows NV12 conversion using NPP.