Modify argus/samples/gstVideoEncode to receive RGBA frames from CUDA

BareMetalCoder · January 22, 2020, 5:35am

Hi.

I am currently learning how to encode arbitrary frames from CUDA using hte gstVideoEncode example.

a) I have created an EGL stream
b) I have replaced the camera-specific code in gstVideoEncode with a CUDA producer that generates RGBA images.
c) I convert my RGBA image into two NV12 planes using various NPP calls
d) then I try to set up a cudaEglFrame to transmit via cudaEglProducerPresentFrame.

While the luma channel seems to encode correctly, I seem to have issues with the chroma portion of the NV12 planes. The color seems off.

Specifically, I am wondering how to set up the plane descriptors to properly describe NV12. I am using Pitched Pointers.

The code below describes how I am setting up my eglFrame descriptor. Am I doing it correctly? I’m not really sure how to describe the UV/CbCr portion correctly.

cudaEglFrame f;
f.frameType = cudaEglFrameTypePitch;
f.eglColorFormat = cudaEglColorFormatYUV420SemiPlanar;
f.planeCount = 2;

cudaPitchedPtr luma;
luma.xsize = m_width;
luma.ysize = m_height;
cudaMallocPitch(
	&luma.ptr,
	&luma.pitch,
	luma.xsize,
	luma.ysize);

f.frame.pPitch[0] = luma;
f.planeDesc[0].depth = 1;
f.planeDesc[0].height = luma.ysize;
f.planeDesc[0].width = luma.xsize;
f.planeDesc[0].numChannels = 1;
f.planeDesc[0].pitch = luma.pitch;
f.planeDesc[0].channelDesc.x = 8;
f.planeDesc[0].channelDesc.y = 0;
f.planeDesc[0].channelDesc.z = 0;
f.planeDesc[0].channelDesc.w = 0;
f.planeDesc[0].channelDesc.f = cudaChannelFormatKindUnsigned;
f.planeDesc[0].reserved[0] = 0;
f.planeDesc[0].reserved[1] = 0;
f.planeDesc[0].reserved[2] = 0;
f.planeDesc[0].reserved[3] = 0;

//CBCR
cudaPitchedPtr chroma;
chroma.xsize = m_width / 2;
chroma.ysize = m_height / 2;
cudaMallocPitch(
	&chroma.ptr,
	&chroma.pitch,
	chroma.xsize * 2, // x2 because each pixel has one Cb and one Cr.
	chroma.ysize);

f.frame.pPitch[1] = chroma;
f.planeDesc[1].depth = 1;
f.planeDesc[1].height = chroma.ysize;
f.planeDesc[1].width = chroma.xsize;
f.planeDesc[1].numChannels = 1;
f.planeDesc[1].pitch = chroma.pitch;
f.planeDesc[1].channelDesc.x = 8;
f.planeDesc[1].channelDesc.y = 8;
f.planeDesc[1].channelDesc.z = 0;
f.planeDesc[1].channelDesc.w = 0;
f.planeDesc[1].channelDesc.f = cudaChannelFormatKindUnsigned;
f.planeDesc[1].reserved[0] = 0;
f.planeDesc[1].reserved[1] = 0;
f.planeDesc[1].reserved[2] = 0;
f.planeDesc[1].reserved[3] = 0;

Thanks!

BareMetalCoder · January 22, 2020, 6:23am

After further investigation, I have noticed that cudaEglReturnFrame returns a timeout error (CUDA error 6). I’m wondering if I’m doing something wrong.

Does the encoder need to hold on to a number of frames before it starts releasing them back to the producer? How many EGLFrames should I create?

Thanks.

DaneLLL · January 22, 2020, 8:53am

Hi,
There is a sample for r28.2 release:
https://devtalk.nvidia.com/default/topic/1044444/jetson-tx1/problem-with-nveglstreamsrc/post/5300639/#5300639

Please refer to it and check if you can apply it to r32 release.

BareMetalCoder · January 22, 2020, 7:58pm

I ended up creating the following boilerplate plane descriptions for the Y and CbCr planes:

int lumaWidth = m_width;
int lumaHeight = m_height;
int chromaWidth = lumaWidth / 2;
int chromaHeight = lumaHeight / 2;

cudaEglPlaneDesc lumaPlaneDesc;
lumaPlaneDesc.depth = 1;
lumaPlaneDesc.height = lumaHeight;
lumaPlaneDesc.width = lumaWidth;
lumaPlaneDesc.numChannels = 1;
lumaPlaneDesc.pitch = 0; // patch later in my code using the pitch returned from cudaMallocPitch
lumaPlaneDesc.channelDesc.x = 8;
lumaPlaneDesc.channelDesc.y = 0;
lumaPlaneDesc.channelDesc.z = 0;
lumaPlaneDesc.channelDesc.w = 0;
lumaPlaneDesc.channelDesc.f = cudaChannelFormatKindUnsigned;
lumaPlaneDesc.reserved[0] = 0;
lumaPlaneDesc.reserved[1] = 0;
lumaPlaneDesc.reserved[2] = 0;
lumaPlaneDesc.reserved[3] = 0;

cudaEglPlaneDesc chromaPlaneDesc;
chromaPlaneDesc.depth = 1;
chromaPlaneDesc.height = chromaHeight;
chromaPlaneDesc.width = chromaWidth;
chromaPlaneDesc.numChannels = 1;
chromaPlaneDesc.pitch = 0; // patch later in my code using the pitch returned from cudaMallocPitch
chromaPlaneDesc.channelDesc.x = 8;
chromaPlaneDesc.channelDesc.y = 8;
chromaPlaneDesc.channelDesc.z = 0;
chromaPlaneDesc.channelDesc.w = 0;
chromaPlaneDesc.channelDesc.f = cudaChannelFormatKindUnsigned;
chromaPlaneDesc.reserved[0] = 0;
chromaPlaneDesc.reserved[1] = 0;
chromaPlaneDesc.reserved[2] = 0;
chromaPlaneDesc.reserved[3] = 0;

The pitch values above are not set, because I only know the pitch after calling cudaMallocPitch to create a few buffers:

constexpr int frameCount = 10;
cudaEglFrame frames[frameCount];
for (int i = 0; i < frameCount; i++) {
	cudaPitchedPtr luma;
	cudaPitchedPtr chroma;

	// Y
	luma.xsize = lumaWidth;
	luma.ysize = lumaHeight;
	cudaMallocPitch(
		&luma.ptr,
		&luma.pitch,
		luma.xsize,
		luma.ysize);

	//CBCR
	chroma.xsize = chromaWidth;
	chroma.ysize = chromaHeight;
	cudaMallocPitch(
		&chroma.ptr,
		&chroma.pitch,
		chroma.xsize * 2, // CbCr plane so 2 bytes per pixel
		chroma.ysize);

	frames[i].frameType = cudaEglFrameTypePitch;
	// Note: _ER (Extended Range) color formats don't seem to be supported:
	frames[i].eglColorFormat = cudaEglColorFormatYUV420SemiPlanar;
	frames[i].planeCount = 2;
	frames[i].frame.pPitch[0] = luma;
	frames[i].planeDesc[0] = lumaPlaneDesc;
	frames[i].frame.pPitch[1] = chroma;
	frames[i].planeDesc[1] = chromaPlaneDesc;

	// Update the missing pitch values.
	frames[i].planeDesc[0].pitch = luma.pitch;
	frames[i].planeDesc[1].pitch = chroma.pitch;
}

Interestingly, the problem that was messing my colors up was that when I recover the eglFrame from cudaEglProducerReturnFrame, the EGL frame’s chroma pitch has changed. The consumer appears to divide it by 2, and I have to manually double it again to the value returned by cudaMallocPitch() for it to work properly.

I thought the pitch was supposed to be in bytes, but the value returned from the consumer appears to be the number of CbCr pairs that fit in the byte pitch. Is this supposed to be correct?

Is it normal for a consumer to modify the eglFrame that is given to it?

DaneLLL · February 13, 2020, 8:59am

Hi,
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html
Frame size of cudaEglColorFormatYUV420SemiPlanar is WidthxHeightx1.5 bytes. Y plane surface has WidthxHeight bytes and UV surface has (Width/2)*(Height/2)*2 bytes.

BareMetalCoder · February 13, 2020, 2:54pm

Thank you Dane, that matches my observations and I believe it confirms that the code I posted above is correct, unless I am missing something.

However, my last question pertains to behaviour that seems odd to me: when I call cudaEglProducerReturnFrame, the UV plane’s pitch in the returned cudaEglFrame comes back as half of the value I put in.

ex: 2000x1000 image.

Y channel is 2000 pixels wide
UV channel is 1000 pixels wide, but 2 bytes per pixel so 2000 bytes.
cudaMallocPitch takes a width in bytes;

Y channel: cudaMallocPitch(&ptr, &yPitch, 2000, 1000); I get a yPitch of 2048
UV channel: cudaMallocPitch(&ptr, &uvPitch, 2000, 500); I get a uvPitch of 2048

I populate the the cudaEglFrame with the appropriate values and set eglFrame.planeDesc[1].pitch to 2048

I push the image into the EGL Stream using cudaEglProducerPresentFrame. The image is displayed correctly.

However, when I recover the cudaEglFrame using cudaEglProducerReturnFrame, I observe that eglFrame.planeDesc[1].pitch has been changed to 1024, which is wrong.

If I try reusing the eglFrame as-is without correcting the changed pitch back to 2048, the image is rendered incorrectly.

Is it normal for the cudaEglFrame’s values to be modified by the downstream blocks? Is the caller responsible for re-setting this planeDesc information each time a buffer is set?

DaneLLL · February 14, 2020, 1:43am

Hi,
We would suggest fill in CUeglFrame in each cuEGLStreamProducerPresentFrame() call. The producer should ensure CUeglFrame of every frame is correct.

BareMetalCoder · February 18, 2020, 3:32pm

That seems reasonable - I’m guessing that in my case, the “UV” buffer is re-used as a single-channel cb or cr channel downstream in some sort of I420 conversion (it’s more than big enough). It would explain the smaller pitch.

That being said, I’m having a very hard time finding documentation about the purpose and correct use of cudaEglProducerReturnFrame – the existing examples (Tegra and CUDA) don’t quite tell me enough, and the CUDA interop documentation ( CUDA Runtime API :: CUDA Toolkit Documentation ) is very vague: “Return the CUDA eglFrame to the EGLStream last released by the consumer.”. It would probably help if that section was accompanied by a data flow diagram explaining what the Producer and Consumer responsibilities are.

cudaEglProducerReturnFrame takes a cudaEglFrame pointer and populates it. This must be for a reason. However, if I understand correctly, you are saying that some or all of the information is expected to be reliable. My understanding was that it was supposed to provide information about a buffer that had been released by the Consumer and that was thus safe to use again. Otherwise I don’t really see the point of this function.

Do people normally keep a copy of a cudaEglFrame “template” around to repopulate the fields that might have changed?
Are there any values that are still considered safe? ex: the pointer in eglFrame.frame.pPitch.ptr?

I’m asking this because I am imagining a scenario where a consumer might not want to return the buffers in order (there are reasonable examples of this, I’m sure). This would prevent us from using buffers in a blindly circular way. Seems to me that we need the information returned from cudaEglProducerReturnFrame to know which buffer is safe to use, no?