Mixing two videos into an H.264 stream using a GLSL shader. Is it possible?

Hi Guys,

I am working on a project where I need to mix frames from two 4K H.265 videos into a single 720p output image and encode that into an H.264 stream. I already have the "mixing" recipe in an existing GLSL shader.

I have downloaded the Nvidia Video Codec SDK (https://developer.nvidia.com/nvidia-video-codec-sdk/download) and I have played with the examples, so decoding and encoding of frames works.

As the input video frames are rather large, I would like to avoid too many GPU-GPU transfers.

My question is therefore: is it feasible to have the video decoder and encoder interoperate with OpenGL in a clean way, or would I be forced to reimplement my GLSL mixing function as a CUDA kernel to achieve something like this?

Any references to projects I can learn from would be highly appreciated.

Kind regards

Jesper

Also check the other samples in the SDK, for example:

./Samples/AppEncode/AppEncGL
./Samples/AppDecode/AppDecGL
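
The AppDecGL sample shows decode-to-OpenGL, so you should not need to rewrite the shader as a CUDA kernel just to stay on the GPU. A minimal sketch of the CUDA-to-GL handoff via the driver API might look like this (the names tex, dRGBA, pitch, width and height are placeholders, error checking is omitted; register the texture once in real code, not per frame):

// Sketch only: copy a decoded frame (already converted to RGBA in device
// memory) into an OpenGL texture without leaving the GPU.
#include <cuda.h>
#include <cudaGL.h>  // driver-API CUDA/OpenGL interop

void copyFrameToGLTexture(GLuint tex, CUdeviceptr dRGBA,
                          size_t pitch, int width, int height)
{
    CUgraphicsResource res;
    cuGraphicsGLRegisterImage(&res, tex, GL_TEXTURE_2D,
                              CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD);

    cuGraphicsMapResources(1, &res, 0);
    CUarray texArray;
    cuGraphicsSubResourceGetMappedArray(&texArray, res, 0, 0);

    CUDA_MEMCPY2D m = {};
    m.srcMemoryType = CU_MEMORYTYPE_DEVICE;
    m.srcDevice     = dRGBA;              // decoded/converted frame
    m.srcPitch      = pitch;
    m.dstMemoryType = CU_MEMORYTYPE_ARRAY;
    m.dstArray      = texArray;
    m.WidthInBytes  = (size_t)width * 4;  // RGBA8
    m.Height        = (size_t)height;
    cuMemcpy2D(&m);

    cuGraphicsUnmapResources(1, &res, 0);
    cuGraphicsUnregisterResource(res);
}

Render your GLSL mix into a texture attached to an FBO, then hand the result to the encoder the same way in reverse; on Linux, NVENC can also register an OpenGL texture directly as an input resource (NV_ENC_INPUT_RESOURCE_TYPE_OPENGL_TEX), which avoids the copy entirely.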

If you only need some trivial 2D image operations, you should also check NVIDIA NPP, which provides ready-made CUDA kernels.
Here is a snippet from my code (transcoding, watermark and subtitles… video example).

...
// get decoded image
CUdeviceptr dSrcFrame;
unsigned int pSrcFrame; // pitch of the mapped frame
CUDA_API_CALL(cuvidMapVideoFrame(session->hDecoder, pDispInfo->picture_index, &dSrcFrame, &pSrcFrame, &videoProcessingParameters));
...
// NV12 -> RGB (BT.709)
Npp8u *sSrc[2] = {(Npp8u *)dSrcFrame, (Npp8u *)(dSrcFrame + session->encoderHeight * pSrcFrame)};
NppiSize oSizeROI = {session->encoderWidth, session->encoderHeight};
NPP_API_CALL(nppiNV12ToRGB_709HDTV_8u_P2C3R((const Npp8u **)sSrc, pSrcFrame, (Npp8u *)session->dRGB, session->pRGB, oSizeROI));
...
// RGB -> BGRA (index 3 in the order array selects the constant alpha value 255)
int order[] = {2, 1, 0, 3};
NPP_API_CALL(nppiSwapChannels_8u_C3C4R((Npp8u *)session->dRGB, session->pRGB, (Npp8u *)session->dARGB, session->pARGB, oSizeROI, order, 255));
...
// resize the overlay
NppiSize oSizeROITL = {session->ovrTLWidth, session->ovrTLHeight};
NppiRect rInTL = {0, 0, session->ovrTLWidth, session->ovrTLHeight};
NppiRect rOutTL = {0, 0, session->ovrTLWidth * session->ovrTLResize, session->ovrTLHeight * session->ovrTLResize};
NPP_API_CALL(nppiResize_8u_C4R((Npp8u *)session->dovrTL, session->povrTL, oSizeROITL, rInTL, (Npp8u *)session->dResize, session->pResize, oSizeROI, rOutTL, NPPI_INTER_SUPER));
// adjust the overlay's alpha channel by dynamic_alpha
NppiSize oSizeROITLresized = {session->ovrTLWidth * session->ovrTLResize, session->ovrTLHeight * session->ovrTLResize};
Npp8u mulTL[4] = {255, 255, 255, (int)(255 * dynamic_alpha)};
NPP_API_CALL(nppiMulC_8u_C4IRSfs(mulTL, (Npp8u *)session->dResize, session->pResize, oSizeROITLresized, 8));
// alpha-blend the overlay over the frame
NPP_API_CALL(nppiAlphaComp_8u_AC4R((Npp8u *)session->dResize, session->pResize, (Npp8u *)session->dARGB, session->pARGB, (Npp8u *)pEncodeBuffer->hInputSurface, pEncodeBuffer->pitchInputSurface, oSizeROI, NPPI_OP_ALPHA_OVER));
// and send pEncodeBuffer to encoder
...
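
For reference, two bits of arithmetic in the snippet are worth spelling out: the NV12 chroma-plane offset used when building sSrc, and what the scale factor 8 in nppiMulC_8u_C4IRSfs does. A CPU-side sketch (function names are mine, not from the SDK):

```c
#include <stdint.h>
#include <stddef.h>

/* NV12: the interleaved UV plane starts lumaHeight * pitch bytes after
   the start of the luma plane, where pitch is the row stride returned
   by cuvidMapVideoFrame. */
size_t nv12_chroma_offset(size_t pitch, size_t luma_height) {
    return luma_height * pitch;
}

/* nppiMulC_8u_C4IRSfs with nScaleFactor = 8 scales each product down by
   2^8, i.e. roughly v * c / 255 (NPP additionally rounds; this sketch
   truncates). With c = 255 * dynamic_alpha this multiplies the alpha
   channel by approximately dynamic_alpha. */
uint8_t scale_u8(uint8_t v, uint8_t c) {
    return (uint8_t)(((unsigned)v * c) >> 8);
}
```

For example, with a 720-line luma plane and a 2048-byte pitch the chroma plane starts 1474560 bytes in, and multiplying a pixel value of 200 by a constant of 127 (dynamic_alpha of about 0.5) yields 99.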