Convert YUV NV12 to RGB24 packed, CUDA_ERROR_ILLEGAL_ADDRESS

Ed.Koezly · November 20, 2019, 5:00pm

I’m attempting to convert an NV12 bitmap produced by the NVIDIA H.264 decoder, NVDEC, to a 24-bit packed RGB bitmap. I’m confident the decoder produces good NV12 frames most of the time, because modifying the YUV output and re-encoding it with NVENC, or with nvjpegEncodeYUV(), works on most frames (some have vertical lines on the bottom part because strides are sometimes repeated, but at least there are no errors).

Calling nppiNV12ToRGB_8u_P2C3R() causes the next cudaDeviceSynchronize(), cuMemcpyDtoH() or cuMemcpy2D() call to return 700 (CUDA_ERROR_ILLEGAL_ADDRESS). The copies still fail with this error even if cudaDeviceSynchronize() is not called.

The return code from nppiNV12ToRGB_8u_P2C3R() is 0. If the call to nppiNV12ToRGB_8u_P2C3R() is commented out, cuMemcpyDtoH() or cuMemcpy2D() each return 0 and copy the requested number of bytes (all zeros).

Using cuMemcpy2DAsync() (return code 0) followed by cuStreamSynchronize() (return code 700) gave a similar result.

What am I doing wrong in the setup or call to nppiNV12ToRGB_8u_P2C3R()? Is nppiNV12ToRGB_8u_P2C3R() a good method to convert in device memory?

Quadro P5000, Driver 26.21.14.3602 (NVIDIA 436.02 / Win 7 64, CUDA 10.1)

Thank you

int NvDecoder::HandlePictureDisplay(CUVIDPARSERDISPINFO *pDispInfo)
{
    CUVIDPROCPARAMS videoProcessingParameters = {};
    videoProcessingParameters.progressive_frame = pDispInfo->progressive_frame;
    videoProcessingParameters.second_field = pDispInfo->repeat_first_field + 1;
    videoProcessingParameters.top_field_first = pDispInfo->top_field_first;
    videoProcessingParameters.unpaired_field = pDispInfo->repeat_first_field < 0;
    videoProcessingParameters.output_stream = m_cuvidStream;
    CUdeviceptr dpSrcFrame = 0;
    unsigned int nSrcPitch = 0;

    //Map the decoded YUV frame from NVDEC; dpSrcFrame will reference it.
    NVDEC_API_CALL(cuvidMapVideoFrame(m_hDecoder, pDispInfo->picture_index, &dpSrcFrame,
        &nSrcPitch, &videoProcessingParameters));

    if (m_allocated == false)
    {
        m_allocated = true;
        //also tried cuMemAllocPitch()
        m_rgb24 = nppiMalloc_8u_C3(1920,1080,&m_rgb24_pitch);
        DebugOutput(L"1 HandlePictureDisplay m_rgb24=%p, m_rgb24_pitch=%d",m_rgb24,m_rgb24_pitch);
        //pinned memory
        cudaError_t w_alloc_err = cudaHostAlloc(&m_host,m_rgb24_pitch * 1088,
                                                cudaHostAllocPortable | cudaHostAllocMapped);
        DebugOutput(L"2 HandlePictureDisplay m_host=%p, w_alloc_err=%d",m_host,w_alloc_err);
    }

    CUdeviceptr pLuma = dpSrcFrame;
    //m_nSurfaceHeight is 1088, set in HandleVideoSequence()
    CUdeviceptr pChromaUV = dpSrcFrame + nSrcPitch * m_nSurfaceHeight;

    //Convert NV12 to RGB.
    Npp8u *w_yuv_src[2] = {(Npp8u*)&pLuma,(Npp8u*)&pChromaUV};
    NppiSize w_roi = {1920,1080};
    NppStatus w_NppStatus = nppiNV12ToRGB_8u_P2C3R(w_yuv_src, nSrcPitch,
                                                   (Npp8u*)m_rgb24, m_rgb24_pitch, w_roi);
    DebugOutput(L"3 HandlePictureDisplay w_NppStatus=%d",w_NppStatus);

    cudaError_t w_cuErr = cudaDeviceSynchronize(); //Return code is 700, CUDA_ERROR_ILLEGAL_ADDRESS.
    DebugOutput(L"4 HandlePictureDisplay cudaDeviceSynchronize w_cuErr=%d",w_cuErr);

    //Try just 100 bytes.  Return code is 700, CUDA_ERROR_ILLEGAL_ADDRESS.
    CUresult w_DhRc = cuMemcpyDtoH(m_host, (CUdeviceptr)m_rgb24, 100);
    DebugOutput(L"5 HandlePictureDisplay w_DhRc=%d",w_DhRc);

    //Try copying the whole RGB bitmap.
    CUDA_MEMCPY2D m = { 0 };
    m.srcMemoryType = CU_MEMORYTYPE_DEVICE;
    m.srcDevice = (CUdeviceptr) m_rgb24;
    m.srcPitch = m_rgb24_pitch;
    m.dstMemoryType = CU_MEMORYTYPE_HOST;
    m.dstHost = m_host;
    m.dstPitch = 1920*3;
    m.WidthInBytes = 1920*3;
    m.Height = 1080;

    CUresult w_cuRc = cuMemcpy2D(&m);   //return code is 700, CUDA_ERROR_ILLEGAL_ADDRESS

    DebugOutput(L"6 HandlePictureDisplay w_cuRc=%d",w_cuRc);
    NVDEC_API_CALL(cuvidUnmapVideoFrame(m_hDecoder, dpSrcFrame));
    return 0;
}

output:
1 HandlePictureDisplay m_rgb24=000000060C000000, m_rgb24_pitch=6144
2 HandlePictureDisplay m_host=0000000203C00000, w_alloc_err=0
3 HandlePictureDisplay w_NppStatus=0
4 HandlePictureDisplay cudaDeviceSynchronize w_cuErr=700
5 HandlePictureDisplay w_DhRc=700
6 HandlePictureDisplay w_cuRc=700
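Once the copy works, it can help to compare the GPU output against a CPU reference of the same conversion. The sketch below is a hypothetical helper (nv12ToRgb24Reference is not an NPP function) and assumes the NV12 layout the code above relies on: a Y plane of m_nSurfaceHeight rows, each nSrcPitch bytes, immediately followed by an interleaved UV plane (U0 V0 U1 V1 ...) at dpSrcFrame + nSrcPitch * m_nSurfaceHeight. The BT.601 limited-range coefficients are also an assumption; NPP's exact coefficients and rounding may differ slightly, so compare with a small tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp a float channel value into [0, 255] and round to the nearest byte.
inline std::uint8_t toByte(float v) {
    v = std::min(255.0f, std::max(0.0f, v));
    return static_cast<std::uint8_t>(std::lround(v));
}

// Hypothetical CPU reference: NV12 (pitched, surfaceHeight luma rows) to packed RGB24.
// Only width x height pixels are converted; padding rows/bytes are ignored.
// Assumption: BT.601 limited-range coefficients (NPP's may differ slightly).
void nv12ToRgb24Reference(const std::uint8_t* nv12, int srcPitch, int surfaceHeight,
                          std::uint8_t* rgb, int dstPitch, int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0 && height <= surfaceHeight);
    assert(srcPitch >= width && dstPitch >= 3 * width);
    // The UV plane starts after ALL surfaceHeight luma rows, not after `height` rows.
    const std::uint8_t* uvPlane = nv12 + static_cast<std::size_t>(srcPitch) * surfaceHeight;
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* yRow = nv12 + static_cast<std::size_t>(y) * srcPitch;
        const std::uint8_t* uvRow = uvPlane + static_cast<std::size_t>(y / 2) * srcPitch;
        std::uint8_t* out = rgb + static_cast<std::size_t>(y) * dstPitch;
        for (int x = 0; x < width; ++x) {
            const float c = 1.164f * (yRow[x] - 16);
            const float d = uvRow[(x / 2) * 2] - 128.0f;      // U (Cb), shared by 2x2 pixels
            const float e = uvRow[(x / 2) * 2 + 1] - 128.0f;  // V (Cr)
            out[3 * x + 0] = toByte(c + 1.596f * e);              // R
            out[3 * x + 1] = toByte(c - 0.392f * d - 0.813f * e); // G
            out[3 * x + 2] = toByte(c + 2.017f * d);              // B
        }
    }
}

// Convenience overload: tightly packed output (pitch = 3 * width), matching
// the m.dstPitch = 1920*3 host layout used in the question.
std::vector<std::uint8_t> nv12ToRgb24Reference(const std::vector<std::uint8_t>& nv12,
                                               int srcPitch, int surfaceHeight,
                                               int width, int height) {
    assert(nv12.size() >= static_cast<std::size_t>(srcPitch) * surfaceHeight * 3 / 2);
    std::vector<std::uint8_t> rgb(static_cast<std::size_t>(width) * height * 3);
    nv12ToRgb24Reference(nv12.data(), srcPitch, surfaceHeight, rgb.data(), 3 * width,
                         width, height);
    return rgb;
}
```

For a 1920x1080 frame this would be called with surfaceHeight = 1088 and the decoder's nSrcPitch, on a host copy of the mapped NV12 surface.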

levicki · November 24, 2019, 12:10am

Does the same happen if you don’t use ROI and allocate 1920x1088 instead of 1080?

Ed.Koezly · November 25, 2019, 3:57pm

Thank you, Igor. Does “don’t use ROI” mean setting it to 1920x1088?
When I do that (or try other variations), I still get CUDA_ERROR_ILLEGAL_ADDRESS.

NppiSize w_roi = {1920,1088}; //CUDA_ERROR_ILLEGAL_ADDRESS
NppiSize w_roi = {2048,1088}; //CUDA_ERROR_ILLEGAL_ADDRESS
NppiSize w_roi = {640,360}; //CUDA_ERROR_ILLEGAL_ADDRESS
NppiSize w_roi = {1919,1087}; //CUDA_ERROR_ILLEGAL_ADDRESS
NppiSize w_roi = {1919,1079}; //CUDA_ERROR_ILLEGAL_ADDRESS

NppiSize w_roi = {}; //no errors, but bitmap (m_rgb24) is all zeros

I also tried these combinations.
m_rgb24 = nppiMalloc_8u_C3(1920,1088,&m_rgb24_pitch);
NppiSize w_roi = {1920,1088}; //CUDA_ERROR_ILLEGAL_ADDRESS

m_rgb24 = nppiMalloc_8u_C3(1920,1088,&m_rgb24_pitch);
NppiSize w_roi = {1920,1080}; //CUDA_ERROR_ILLEGAL_ADDRESS

m_rgb24 = nppiMalloc_8u_C3(1920,1088,&m_rgb24_pitch);
NppiSize w_roi = {}; //no errors, but bitmap (m_rgb24) is all zeros

Instead of nppiMalloc_8u_C3, I’ve tried variations on cuMemAllocPitch() with the same result.
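One way to rule the ROI in or out is a quick bounds check against the allocations. The helper below is hypothetical (Roi and roiFitsBuffers are not NPP names; Roi just mirrors NppiSize's width and height). It assumes nppiMalloc_8u_C3(w, h, &pitch) provides h rows of pitch bytes, and that the NV12 source has m_nSurfaceHeight luma rows plus m_nSurfaceHeight / 2 chroma rows, all nSrcPitch bytes wide. The source pitch of 2048 used in the checks is an assumption; the actual nSrcPitch is not printed in the thread.

```cpp
#include <cassert>

// Hypothetical ROI descriptor mirroring NppiSize's width/height fields.
struct Roi {
    int width;
    int height;
};

// Returns true if converting `roi` stays inside both buffers.
// Assumes an even surfaceHeight, as NV12 surfaces have.
bool roiFitsBuffers(Roi roi, int srcPitch, int surfaceHeight, int dstPitch, int dstRows) {
    assert(srcPitch > 0 && surfaceHeight > 0 && dstPitch > 0 && dstRows > 0);
    if (roi.width <= 0 || roi.height <= 0) {
        return false;  // empty ROI: there is nothing to convert
    }
    // Source: ROI rows must fit in the Y plane; the UV plane's surfaceHeight / 2
    // rows then cover the ceil(roi.height / 2) chroma rows needed.
    const bool srcOk = roi.width <= srcPitch && roi.height <= surfaceHeight;
    // Destination: 3 bytes per packed RGB pixel, one allocated row per ROI row.
    const bool dstOk = 3 * roi.width <= dstPitch && roi.height <= dstRows;
    return srcOk && dstOk;
}
```

With the reported m_rgb24_pitch of 6144 and a 1080-row destination, {1920,1080} and {640,360} are in bounds yet still failed, which points away from the ROI and toward the source pointers. {2048,1088} against a 1080-row destination would overrun it, however.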

Ed.Koezly · December 5, 2019, 11:24pm

Just wondering (not urgent, since my immediate problem turned out to be with FFmpeg’s av_read_frame() and I don’t need this function to debug it): is nppiNV12ToRGB_8u_P2C3R() working for whoever happens to read this, yes or no?

Robert_Crovella · December 7, 2019, 10:45pm

I’ve just recently used it for a project. It works.

This doesn’t look right to me:

Npp8u *w_yuv_src[2] = {(Npp8u*)&pLuma,(Npp8u*)&pChromaUV};
                               ^              ^

Not sure why you have those ampersands there. Those should both be pointers, not pointer-to-pointers.
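To see why the ampersands matter: on a 64-bit build CUdeviceptr is an integer type (unsigned long long), and pLuma is a host-side local variable whose value is the device address. (Npp8u*)&pLuma yields the host stack address of that variable, so the NPP kernel tries to read a host address on the GPU; (Npp8u*)pLuma reinterprets the stored value itself. Because the NPP call only launches the kernel asynchronously, its NppStatus of 0 says nothing about those memory accesses; the fault is reported by the next synchronizing call, and since 700 is a sticky error, every later call in that context returns it too. The sketch uses hypothetical stand-in types (DevPtr, Byte) and function names so it compiles without the CUDA and NPP headers.

```cpp
#include <cassert>

// Hypothetical stand-ins so this compiles without cuda.h / npp.h.
// Assumption: 64-bit build, where CUdeviceptr is unsigned long long
// and Npp8u is unsigned char.
using DevPtr = unsigned long long;
using Byte = unsigned char;

// What (Npp8u*)&pLuma produced: the host address of the local variable that
// *stores* the device address. A GPU kernel reading through it typically
// faults with CUDA_ERROR_ILLEGAL_ADDRESS.
Byte* buggyPlanePtr(DevPtr& holder) {
    return reinterpret_cast<Byte*>(&holder);
}

// What (Npp8u*)pLuma produces: the stored device address itself.
Byte* fixedPlanePtr(DevPtr value) {
    static_assert(sizeof(DevPtr) == sizeof(Byte*), "sketch assumes a 64-bit build");
    assert(value != 0 && "plane must be mapped before use");
    return reinterpret_cast<Byte*>(value);
}
```

Applied to the question's code, the fix confirmed below is simply dropping the ampersands: `Npp8u *w_yuv_src[2] = {(Npp8u*)pLuma, (Npp8u*)pChromaUV};`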

Ed.Koezly · December 9, 2019, 4:08pm

I’m not sure either; removing them fixed the problem! My prayer for humility has been answered again. Thank you, Robert.
