Hi,
I’m trying to get decoded video frames back into RAM, so I’m basically building on top of the source code of the NvDecodeGL sample. I’m successfully decoding the frames, and the Y channel looks fine. As soon as I try to get color, however, I fail and get a lot of glitches. I’m not sure what the YUV data layout looks like — is there documentation on this? The chroma format is YUV420 in my case, with progressive frames, so what I tried is this:
inline void yuv2rgb( unsigned char y, unsigned char u, unsigned char v, unsigned char &r, unsigned char &g, unsigned char &b )
{
    // BT.601-style conversion; chroma samples are centered at 128
    const int d = u - 128;
    const int e = v - 128;
    r = clamp<int>( y + 1.403f * e, 0, 255 );
    g = clamp<int>( y - 0.344f * d - 0.714f * e, 0, 255 );
    b = clamp<int>( y + 1.770f * d, 0, 255 );
}
//...
// assuming fully planar YUV420 (I420): Y plane, then U, then V,
// with the chroma planes using half the luma pitch
const unsigned char *y = src;
const unsigned char *u = src + ( srcStride * height );
const unsigned char *v = u + ( srcStride * height ) / 4;
for( int j = 0; j < height; j++ )
{
    for( int i = 0; i < width; i++ )
    {
        yuv2rgb( y[i], u[i/2], v[i/2], dst[0], dst[1], dst[2] );
        dst += 3;
    }
    if( j & 0x01 )
    {
        // advance the chroma rows only every other luma row
        u += srcStride / 2;
        v += srcStride / 2;
    }
    y += srcStride;
    dst += ( dstStride - width * 3 );
}
where src is the host pointer the frame data is copied into with cuMemcpyDtoHAsync (cp. frameYUV in the GL sample) and srcStride is the pitch returned by cuvidMapVideoFrame. width and height are simply the texture dimensions (i.e. the video target size, not the coded size). I’ve experimented a bit, but I keep failing. What I get looks like interlacing artifacts — every other row is shifted — so I’m guessing my assumptions about the data layout are wrong. Could anybody shed some light on this?
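One thought I had: could the mapped surface actually be NV12 (semi-planar, i.e. a full-resolution luma plane followed by a single half-height plane of interleaved U/V byte pairs) rather than fully planar YUV420? If so, I believe the indexing would have to look roughly like this instead. This is an untested sketch — nv12ToRgb and clamp255 are just my own helper names, and I’m assuming the chroma plane shares the luma pitch:

```cpp
#include <algorithm>
#include <cassert>

// Sketch under the assumption that the surface is NV12: Y plane of
// `height` rows at pitch `srcStride`, immediately followed by height/2
// rows of interleaved U/V byte pairs at the same pitch.
static unsigned char clamp255( int v )
{
    return static_cast<unsigned char>( std::min( 255, std::max( 0, v ) ) );
}

void nv12ToRgb( const unsigned char *src, int srcStride,
                unsigned char *dst, int dstStride,
                int width, int height )
{
    const unsigned char *yPlane  = src;
    const unsigned char *uvPlane = src + srcStride * height; // one interleaved chroma plane

    for( int j = 0; j < height; j++ )
    {
        const unsigned char *yRow  = yPlane  + j * srcStride;
        const unsigned char *uvRow = uvPlane + ( j / 2 ) * srcStride; // two luma rows share one chroma row
        unsigned char *out = dst + j * dstStride;

        for( int i = 0; i < width; i++ )
        {
            const int y = yRow[i];
            const int u = uvRow[( i / 2 ) * 2 + 0] - 128; // U byte of the 2x2 block
            const int v = uvRow[( i / 2 ) * 2 + 1] - 128; // V byte of the 2x2 block

            out[3 * i + 0] = clamp255( static_cast<int>( y + 1.403f * v ) );
            out[3 * i + 1] = clamp255( static_cast<int>( y - 0.344f * u - 0.714f * v ) );
            out[3 * i + 2] = clamp255( static_cast<int>( y + 1.770f * u ) );
        }
    }
}
```

If the data really is NV12, that would at least explain why my planar code misreads chroma, since it would be interpreting interleaved U/V bytes as two separate planes.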
Also, being a novice at CUDA programming, I’m wondering how you guarantee that the data copy has finished by the time the data is used — after all, the sample seems to use an asynchronous copy via cuMemcpyDtoHAsync.
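My current guess is that one has to synchronize on the stream the copy was issued on before touching the host buffer, roughly like this (a sketch only, error handling omitted — I’m not sure this is what the sample actually relies on):

```cpp
// The async copy is queued on a stream; the host must not read the
// buffer until that stream has drained.
CUresult res = cuMemcpyDtoHAsync( pHostFrame, devPtr, pitch * height, stream );
// ... other work can overlap with the copy here ...
res = cuStreamSynchronize( stream ); // blocks until the copy has finished
// pHostFrame is now safe to read on the host
```

Is that the right approach, or does the sample guarantee completion some other way?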