Nvivafilter: different input and output buffers

I’m developing a filter (.so library) to be used with the gstreamer element nvivafilter.

The official example, nvsample_cudaprocess.cu, makes in-place changes to the incoming frame data. That is, only one frame and one set of buffers exists; it is used both as input and to store the output image, and the data buffers are overwritten while the image is processed.

However, in my case I cannot overwrite the input while processing; I need separate data storage for input and output.

I think I must follow one of these two possibilities:

alternative A)

  1. allocate new buffers (with the necessary structure, such as EGLImageKHR or similar) to store the outgoing frame
  2. process the incoming image, writing the results to the buffers allocated in the previous step
  3. do something to swap the current input frame with the one holding the output frame.

But I do not know how to do steps 1) and 3). I found no example after a week of googling, and no similar case in the other gstreamer plugins whose source is available.

alternative B) (worse)

  1. clone the data of the input frame (allocate and copy all data) into a new set of buffers.
  2. process the image using the cloned buffers as input, saving the result in the original input frame.

I don’t like this solution because it needs a full copy of all the input data.
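For completeness, the clone in alternative B could be sketched as a plain 2D device-to-device copy, assuming the frame is mapped as a pitched CUDA pointer (all names here are illustrative, not part of any nvivafilter API):

```c
/* Sketch of alternative B, step 1: clone one pitched image plane
 * device-to-device. Assumes the input frame is already mapped as a
 * CUDA device pointer in pitch-linear layout; names are illustrative. */
#include <cuda_runtime.h>
#include <stddef.h>

static int clone_plane(const void *src, size_t src_pitch,
                       size_t width_bytes, size_t height,
                       void **dst, size_t *dst_pitch)
{
    /* Allocate a pitched destination buffer for the clone. */
    if (cudaMallocPitch(dst, dst_pitch, width_bytes, height) != cudaSuccess)
        return -1;

    /* Full copy of the plane: this is exactly the cost this
     * alternative pays on every frame. */
    if (cudaMemcpy2D(*dst, *dst_pitch, src, src_pitch,
                     width_bytes, height,
                     cudaMemcpyDeviceToDevice) != cudaSuccess)
        return -1;

    return 0;
}
```

For NV12 this would be called once for the Y plane and once for the interleaved UV plane.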

Any hint?

Besides using the nvivafilter plugin, you can try adding a probe callback and accessing the buffers in the callback. Please take a look at the sample:

The script is updated for the OpenCV 4.5 installation:

Thanks for your support.

The provided example is a different approach, with programmatic construction of the pipeline. If possible, I would like to use nvivafilter (or an equivalent gstreamer element) so the pipeline can be launched from the command line (gst-launch-1.0 …).

As you can see in this link, some other users are also interested in the same issue.

The implementation of nvivafilter is similar to the nvvidconv plugin, and nvvidconv is open source in r32.5. You may take a look at the source code and customize it. The public code is in
L4T Driver Package (BSP) Sources

The default flow is

  1. Create NvBuffer at sink pad
  2. Create NvBuffer at source pad
  3. Call NvBufferTransform()

You can get EGLImage by calling NvEGLImageFromFd() and apply your code in step 3.
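The three steps above might look roughly like this with the Jetson Multimedia API (a sketch only; the exact struct fields depend on the release, sizes are examples, and error handling is omitted):

```c
/* Sketch of the default nvvidconv flow using nvbuf_utils.
 * Parameter values are illustrative, not a drop-in implementation. */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include "nvbuf_utils.h"

static void default_flow(EGLDisplay egl_display)
{
    int sink_fd = -1, src_fd = -1;

    /* Steps 1 + 2: create pitch-linear NvBuffers for the sink and
     * source pads. */
    NvBufferCreateParams params = {0};
    params.width       = 1280;                 /* example size */
    params.height      = 720;
    params.layout      = NvBufferLayout_Pitch; /* CUDA-friendly layout */
    params.colorFormat = NvBufferColorFormat_NV12;
    params.payloadType = NvBufferPayload_SurfArray;
    NvBufferCreateEx(&sink_fd, &params);
    NvBufferCreateEx(&src_fd, &params);

    /* Step 3, variant a: let VIC do the transform between buffers... */
    NvBufferTransformParams tparams = {0};
    NvBufferTransform(sink_fd, src_fd, &tparams);

    /* ...or variant b: map both buffers as EGLImages and run your own
     * CUDA code on them, as suggested above. */
    EGLImageKHR in_img  = NvEGLImageFromFd(egl_display, sink_fd);
    EGLImageKHR out_img = NvEGLImageFromFd(egl_display, src_fd);
    (void)in_img; (void)out_img;
}
```

Because the sink and source buffers are distinct dmabuf fds, this flow naturally gives separate input and output storage, which is what alternative A was asking for.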

I’m following the previous suggestion. Based on gstnvconv I’ve written a new plugin, nvvidtrans, with the video processing in CUDA. The main method is this one (error handling suppressed for legibility):

EGLDisplay eglDisplay = eglGetDisplay( EGL_DEFAULT_DISPLAY );
EGLint major, minor;
eglInitialize( eglDisplay, &major, &minor );

EGLImageKHR input_image = NvEGLImageFromFd( eglDisplay, input_dmabuf_fd );
EGLImageKHR output_image = NvEGLImageFromFd( eglDisplay, output_dmabuf_fd );


CUgraphicsResource pInputResource = NULL;
status = cuGraphicsEGLRegisterImage(&pInputResource, input_image, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);

CUeglFrame eglInputFrame;
status = cuGraphicsResourceGetMappedEglFrame( &eglInputFrame, pInputResource, 0, 0);

CUgraphicsResource pOutputResource = NULL;
status = cuGraphicsEGLRegisterImage(&pOutputResource, output_image, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);

CUeglFrame eglOutputFrame;
status = cuGraphicsResourceGetMappedEglFrame( &eglOutputFrame, pOutputResource, 0, 0);

printf( "width: %d %d\n", eglInputFrame.width, eglOutputFrame.width );
printf( "planeCount: %d %d\n", eglInputFrame.planeCount, eglOutputFrame.planeCount );
printf( "height: %d %d\n", eglInputFrame.height, eglOutputFrame.height );
printf( "frame type: %s %s\n",
        eglInputFrame.frameType == CU_EGL_FRAME_TYPE_ARRAY ? "array" : "pitch",
        eglOutputFrame.frameType == CU_EGL_FRAME_TYPE_ARRAY ? "array" : "pitch" );
printf( "numChannels: %d %d\n", eglInputFrame.numChannels, eglOutputFrame.numChannels );
printf( "pitch: %d %d\n", eglInputFrame.pitch, eglOutputFrame.pitch );

status = cuCtxSynchronize();

cuda_mirror( (CUdeviceptr) eglInputFrame.frame.pArray[0], eglInputFrame.width, (CUdeviceptr) eglOutputFrame.frame.pArray[0], eglOutputFrame.width);

status = cuCtxSynchronize();

where cuda_mirror is the call to the kernel:

static int cuda_mirror(
    CUdeviceptr pIn, int inPitch,
    CUdeviceptr pOut, int outPitch )
{
    dim3 threadsPerBlock(BOX_W, BOX_H);
    dim3 blocks(40, 40);

    cudaMirrorKernel<<<blocks, threadsPerBlock>>>((char*)pIn, inPitch, (char*)pOut, outPitch);

    return 0;
}

However, an error appears in the last call to cuCtxSynchronize(), probably due to an error during execution of the kernel.

The print statements in this code show the following characteristics for both the input and output images:

memory type: BUF_MEM_HW
image width: 1280
image height: 720
image planeCount: 2
image frame type: array
image numChannels: 1
image pitch: 0

I do not know the origin of this error. I suspect it could be related to the frame type being array (CUarray) instead of pitch.
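That suspicion matches the mapped-frame API: with CU_EGL_FRAME_TYPE_PITCH the plane entries are ordinary device pointers, while with CU_EGL_FRAME_TYPE_ARRAY they are CUarray handles that a kernel cannot dereference through a pointer cast. A sketch of the distinction (names and flow illustrative):

```c
/* Sketch: how a mapped CUeglFrame should be consumed depending on its
 * frameType. Illustrative only; error handling omitted. */
#include <cudaEGL.h>
#include <stdint.h>
#include <stdio.h>

static CUdeviceptr plane_ptr_or_fail(const CUeglFrame *f, int plane)
{
    if (f->frameType == CU_EGL_FRAME_TYPE_PITCH) {
        /* Pitch linear: pPitch[plane] is a device pointer a kernel
         * can read and write directly. */
        return (CUdeviceptr)(uintptr_t)f->frame.pPitch[plane];
    }

    /* Block linear: pArray[plane] is a CUarray handle. It must be
     * accessed through a surface/texture object (cuSurfObjectCreate)
     * or copied with cuMemcpy2D from the array; casting it to a
     * device pointer, as in the code above, is invalid. */
    fprintf(stderr, "plane %d is CU_EGL_FRAME_TYPE_ARRAY; "
                    "cast to CUdeviceptr would be invalid\n", plane);
    return 0;
}
```

So either the buffers have to be created pitch linear, or the kernel has to be rewritten to read through surface objects.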

The command line used to launch the pipeline is:

gst-launch-1.0 -e filesrc location=test.mp4 ! qtdemux ! h264parse ! omxh264dec ! nvvidtrans ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvoverlaysink

Any hint is welcome

Thanks a lot.

We have deprecated the omx plugins, so please use nvv4l2decoder. And please make sure the NvBuffer is pitch linear.

How can I request that nvv4l2decoder produce pitch linear output? Or must I insert an "nvvidconv bl-output=false" in the pipeline to convert from block linear to pitch linear?

You can set the capability to either

... ! nvvidconv ! 'video/x-raw(memory:NVMM),format=I420' ! ...

or

... ! nvvidconv bl-output=0 ! 'video/x-raw(memory:NVMM),format=NV12' ! ...
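Putting the advice together, the original pipeline could become something like the following (an untested sketch; element availability and behavior depend on the L4T release):

```shell
# Untested sketch: replace the deprecated omxh264dec with nvv4l2decoder
# and force pitch-linear NV12 before the custom nvvidtrans element.
gst-launch-1.0 -e filesrc location=test.mp4 ! qtdemux ! h264parse ! \
  nvv4l2decoder ! nvvidconv bl-output=0 ! \
  'video/x-raw(memory:NVMM),format=NV12' ! nvvidtrans ! \
  'video/x-raw(memory:NVMM),format=NV12' ! nvoverlaysink
```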

Ok, thanks a lot.