OpenCV GpuMat into GStreamer without downloading to CPU

Hello,

I have this livestreaming project and got very good results. I can reach about 30 fps with a 1080p RTP stream, which is already very good. But I feel like I am leaving a lot of performance on the table by copying frames to the CPU.

I am using a Ximea PCIe camera that can write directly into the Jetson TX2 GPU memory space with its zero-copy feature.

Basically, I am getting a cv::cuda::GpuMat as input into OpenCV. That’s perfect because I can use functions such as cv::cuda::demosaicing to process the image, all in fast GPU space.

Now I want to get the frame out of OpenCV and into a GStreamer pipeline. For this I am using cv::VideoWriter, which is unfortunate because it only accepts normal CPU matrices.

So I get a GpuMat from the camera, do some work, and download it into a CPU mat. The pipeline then uploads it to GPU memory again to encode it with nvv4l2h264enc.

Isn’t there a way to skip the CPU mat? Somehow I need to bypass the CPU-only VideoWriter. Maybe nvivafilter would help?

Here is the code I am using. I had to remove some lines, but the lines that deal with the frame flow are as follows:

Getting the frame from the camera:

// Define pointer used for data on GPU
void *imageGpu;

//openCV matrix in cpu space
cv::UMat cpuFrame;

//openCV matrix in gpu space
cv::cuda::GpuMat gpuFrame(height, width, CV_16UC3);

// get the image from the camera
xiGetImage(xiH, 5000, &image);      // Get host-pointer to image data
cudaHostGetDevicePointer(&imageGpu, image.bp, 0);      // Convert to device pointer
cv::cuda::GpuMat gpu_mat_raw(height, width, CV_16UC1, imageGpu);       // Create GpuMat from the device pointer
cv::cuda::demosaicing(gpu_mat_raw, gpuFrame, cv::COLOR_BayerBG2BGR);       // Demosaic raw bayer image to color image
cv::cuda::multiply(gpuFrame, cv::Scalar(WB_BLUE, WB_GREEN, WB_RED), gpuFrame);       // Apply static white balance by multiplying the channels

At this point we have our frame in OpenCV, debayered and white balanced. We can apply more effects to the frames. When we are done, I do the following:

// download (copy) to cpu mat
cv::UMat cpuFrame;
gpuFrame.download(cpuFrame);
pipe.write(cpuFrame);

We download to CPU space just to be able to use this VideoWriter pipeline. It encodes the frames on the GPU and sends them to a UDP sink, where an RTSP server picks them up to make the stream:

pipe = cv::VideoWriter("appsrc ! video/x-raw, format=BGR ! queue ! videoconvert ! video/x-raw, format=BGRx ! nvvidconv \
! nvv4l2h264enc maxperf-enable=1 \
! rtph264pay pt=96 config-interval=1 ! application/x-rtp, media=video, encoding-name=H264 , profile=main\
! udpsink host=localhost port=5000", 0, 30, cv::Size (w, h));

As you can see, we are using nvv4l2h264enc, meaning that we only move the frame out of the GPU to put it back in.

If I try to use VideoWriter with a gpuFrame, it doesn’t work. Apparently that is not supported, and it only works with the cpuFrame.

My question is whether it is possible to bypass the VideoWriter somehow. I was reading something about nvivafilter, but my basic understanding of GStreamer kept me from trying it out.

So the workflow looks like this:

camera → gpu → opencv → cpu → opencv → gpu → encode → udp → stream

and I want it to be

camera → gpu → opencv → encode → udp → stream

I hope I picked the right forum category. So many topics overlap in my request that it was hard to pick the right one!

Hi,
Please refer to this sample:
Nano not using GPU with gstreamer/python. Slow FPS, dropped frames - #8 by DaneLLL

You can map NvBuffer to cv::cuda::GpuMat for processing. Please take a look at the sample and see if you can apply it to your use case.

I followed the thread and had to make some changes to the code for it to run.

However, I do not have an nvargus camera to test with. I will have to tweak my code before I can report whether it worked.

To be honest, I don’t see how I can use this example.

As far as I understand it, the pipeline starts with the nvargus source already feeding the GStreamer pipeline.
But in my case I have to provide the input via appsrc from a GpuMat.

I think I would have to use a GstBuffer and somehow write or point my frame to it. But would I have to redo this every frame? How do I do that if the GStreamer pipeline in the example is running in another thread? I am lost!

For your case you may try the appsrc_nvmm sample by DaneLLL.

Hello @Honey_Patouceul !

Always happy to read your name!

I looked at the code you referenced, but I am having trouble understanding where it gets its input from.

There is the function

static gboolean feed_function(gpointer d)

which takes a gpointer called ‘d’. However, it is never used in the function body. In main they call the function with a null pointer, like so:

    for (int i=0; i<150; i++) {
        feed_function(nullptr);
        usleep(33333);
    }

So it seems this argument is not needed and could be removed?
If so, how do they get their frame into the function?

I successfully compiled the program, but I can’t run it since I am running the Jetson without any screen attached. The output looks like this:

$ ./appsrc_nvmm 
nvbuf_utils: Could not get EGL display connection
Using launch string: appsrc name=mysource ! video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1,format=RGBA ! nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! nvoverlaysink 
NvEGLImageFromFd: No EGLDisplay to create EGLImage
cuGraphicsEGLRegisterImage failed: 999 
NvDestroyEGLImage: No EGLDisplay to destroy EGLImage
NvEGLImageFromFd: No EGLDisplay to create EGLImage
cuGraphicsEGLRegisterImage failed: 999 
NvDestroyEGLImage: No EGLDisplay to destroy EGLImage
NvEGLImageFromFd: No EGLDisplay to create EGLImage
cuGraphicsEGLRegisterImage failed: 999 
...

I am not completely sure what an EGLDisplay is. At first I thought it would be a virtual display, but the error suggests a real display. I have no display connected to the Jetson; I only work with it over SSH.

Inside the feed_function there is this line:

        egl_image = NvEGLImageFromFd(egl_display, dmabuf_fd);

Since I couldn’t figure out what they use the input argument ‘gpointer d’ for, I have to assume this is where they input the data. But I am not using any EGL display. I only have an OpenCV GpuMat and a GPU memory pointer to it. Also, I don’t need to do any more OpenCV frame operations; I only need to get a GpuMat into the GStreamer pipeline.

It is very hard for me to see how I can modify this code to not use any EGL display. Being a beginner in GStreamer, I thought this would be more intuitive than it seems to be.

Thank you all for taking the time to read and help.

I’d suggest using a local display for now. You may be able to use a virtual display, but I have no experience with virtual GL; someone else may be able to advise further.

Once you can run this example (note that it just displays 150 empty frames), you would add your code that reads a frame into unified or pinned memory with xiGetImage(), plus any further processing, into feed_function() just after the comment // CUDA code here.
Adjust the static width and height, and decrease the framerate in the pipeline string in case your operations exceed the frame interval. Better to start with 640x480@15fps and increase later.

I just tried this with R32.6 on AGX Xavier with a fresh OpenCV master build. Try the following:

appsrc_nvmm.cpp
#include <cstdlib>
#include <gst/gst.h>
#include <gst/gstinfo.h>
#include <gst/app/gstappsrc.h>
#include <glib-unix.h>
#include <dlfcn.h>

#include <cstring>
#include <iostream>
#include <sstream>
#include <thread>

#include "nvbuf_utils.h"
#include <cuda.h>
#include <cuda_runtime.h>
#include <cudaEGL.h>

#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>


using namespace std;

#define USE(x) ((void)(x))

static GstPipeline *gst_pipeline = nullptr;
static string launch_string;
static GstElement *appsrc_;

GstClockTime timestamp = 0;
static int w = 1920;
static int h = 1080;
EGLDisplay egl_display;

static void
notify_to_destroy (gpointer user_data)
{
    GST_INFO ("NvBufferDestroy(%d)", *(int *)user_data);
    NvBufferDestroy(*(int *)user_data);
    g_free(user_data);
}


static gboolean feed_function(gpointer d) {
    GstBuffer *buffer;
    GstFlowReturn ret;
    GstMapInfo map = {0};
    int dmabuf_fd = 0;
    gpointer data = NULL, user_data = NULL;
    NvBufferParams par;
    GstMemoryFlags flags = (GstMemoryFlags)0;
   
    NvBufferCreate(&dmabuf_fd, w, h, NvBufferLayout_Pitch, NvBufferColorFormat_ABGR32);
    //CUDA process
    {
        EGLImageKHR egl_image;
        egl_image = NvEGLImageFromFd(egl_display, dmabuf_fd);
        CUresult status;
        CUeglFrame eglFrame;
        CUgraphicsResource pResource = NULL;
        cudaFree(0);
        status = cuGraphicsEGLRegisterImage(&pResource,
                    egl_image,
                    CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
        if (status != CUDA_SUCCESS)
        {
            printf("cuGraphicsEGLRegisterImage failed: %d \n",status);
        }
        status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
        status = cuCtxSynchronize();

        // CUDA code here
        cv::cuda::GpuMat dmat(h,w,CV_8UC4,eglFrame.frame.pPitch[0]);
        // R,G,B,A
        dmat.setTo(cv::Scalar(255,0,0,255));

        status = cuCtxSynchronize();
        status = cuGraphicsUnregisterResource(pResource);
        NvDestroyEGLImage(egl_display, egl_image);
    }
    user_data = g_malloc(sizeof(int));
    GST_INFO ("NvBufferCreate %d", dmabuf_fd);
    *(int *)user_data = dmabuf_fd;
    NvBufferGetParams (dmabuf_fd, &par);
    data = g_malloc(par.nv_buffer_size);

    buffer = gst_buffer_new_wrapped_full(flags,
                                         data,
                                         par.nv_buffer_size,
                                         0,
                                         par.nv_buffer_size,
                                         user_data,
                                         notify_to_destroy);
    buffer->pts = timestamp;

    gst_buffer_map (buffer, &map, GST_MAP_WRITE);
    memcpy(map.data, par.nv_buffer , par.nv_buffer_size);
    gst_buffer_unmap(buffer, &map);

    g_signal_emit_by_name (appsrc_, "push-buffer", buffer, &ret);
    gst_buffer_unref(buffer);

    timestamp += 33333;
    return G_SOURCE_CONTINUE;
}

int main(int argc, char** argv) {
    USE(argc);
    USE(argv);

    gst_init (&argc, &argv);

    GMainLoop *main_loop;
    main_loop = g_main_loop_new (NULL, FALSE);
    ostringstream launch_stream;

    egl_display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(egl_display, NULL, NULL);
    launch_stream
    << "appsrc name=mysource ! "
    << "video/x-raw(memory:NVMM),width="<< w <<",height="<< h <<",framerate=30/1,format=RGBA ! "
    << "nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! "
    << "nvoverlaysink ";

    launch_string = launch_stream.str();

    g_print("Using launch string: %s\n", launch_string.c_str());

    GError *error = nullptr;
    gst_pipeline  = (GstPipeline*) gst_parse_launch(launch_string.c_str(), &error);

    if (gst_pipeline == nullptr) {
        g_print( "Failed to parse launch: %s\n", error->message);
        return -1;
    }
    if(error) g_error_free(error);

    appsrc_ = gst_bin_get_by_name(GST_BIN(gst_pipeline), "mysource");
    gst_app_src_set_stream_type(GST_APP_SRC(appsrc_), GST_APP_STREAM_TYPE_STREAM);

    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_PLAYING); 

    for (int i=0; i<150; i++) {
        feed_function(nullptr);
        usleep(33333);
    }

    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_NULL);
    gst_object_unref(GST_OBJECT(gst_pipeline));
    g_main_loop_unref(main_loop);
    eglTerminate(egl_display);

    g_print("going to exit \n");
    return 0;
}

[EDIT: the timestamp increment is wrong. GStreamer timestamps are in nanoseconds, and 33333 is the frame interval in microseconds. Change it to timestamp += 33333333;]
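
To make the unit mix-up concrete, here is a minimal sketch in plain C++. It only assumes that GStreamer clock times count nanoseconds; the helper names frame_duration_ns and implied_fps are hypothetical and are not part of the sample.

```cpp
#include <cassert>
#include <cstdint>

// GStreamer clock times count nanoseconds (GST_SECOND is 1,000,000,000).
constexpr std::uint64_t kNsPerSecond = 1000000000ULL;

// Duration of one frame in nanoseconds for a frame rate of fps_num/fps_den.
// Hypothetical helper, shown for illustration only.
inline std::uint64_t frame_duration_ns(std::uint64_t fps_num,
                                       std::uint64_t fps_den = 1) {
    assert(fps_num > 0 && fps_den > 0);
    return kNsPerSecond * fps_den / fps_num;  // integer division truncates
}

// Frame rate that a fixed per-frame PTS increment (in nanoseconds) implies.
inline double implied_fps(std::uint64_t increment_ns) {
    assert(increment_ns > 0);
    return static_cast<double>(kNsPerSecond) /
           static_cast<double>(increment_ns);
}
```

With these helpers, frame_duration_ns(30) gives 33333333, while implied_fps(33333) comes out at roughly 30000 frames per second. That mismatch with the framerate=30/1 caps is a plausible reason why only the first frame appeared to work.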

Makefile
################################################################################
# Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
#################################################################################

APP:= appsrc_nvmm
CUDA_VER?=10.2
OPENCV_VERSION=master

ifeq ($(CUDA_VER),)
  $(error "CUDA_VER is not set")
endif
CXX:= g++
SRCS:= appsrc_nvmm.cpp

CFLAGS:= -I/usr/src/jetson_multimedia_api/include \
	-I/usr/local/cuda-$(CUDA_VER)/include \
	-I/usr/local/opencv-$(OPENCV_VERSION)/include/opencv4

LIBS:= -Wall -std=c++11 \
	-L/usr/lib/aarch64-linux-gnu/tegra/ -lEGL -lGLESv2 \
	-L/usr/lib/aarch64-linux-gnu/tegra/ -lcuda -lnvbuf_utils \
	-L/usr/local/cuda-$(CUDA_VER)/lib64/ -lcudart \
	-L/usr/local/opencv-$(OPENCV_VERSION)/lib -lopencv_core

OBJS:= $(SRCS:.cpp=.o)

PKGS:= gstreamer-app-1.0
CFLAGS+= `pkg-config --cflags $(PKGS)`
LIBS+= `pkg-config --libs $(PKGS)`

all: $(APP)

%.o: %.cpp
	@echo "Compiling: $<"
	$(CXX) $(CFLAGS) -c $< -o $@

$(APP): $(OBJS)
	@echo "Linking: $@"
	$(CXX) -o $@ $(OBJS) $(CFLAGS) $(LIBS)

clean:
	rm -rf $(OBJS) $(APP)

Then build and run it (adjust the Makefile and the following to your OpenCV install path):

export LD_LIBRARY_PATH=/usr/local/opencv-master/lib:$LD_LIBRARY_PATH
make

# This should make a red 1080p display
./appsrc_nvmm 

Hi. I’m trying to use this snippet, but I believe it doesn’t work after the first frame. Can you double-check that?
I made a few modifications:

#include <cstdlib>
#include <gst/gst.h>
#include <gst/gstinfo.h>
#include <gst/app/gstappsrc.h>
#include <glib-unix.h>
#include <dlfcn.h>

#include <cstring>
#include <iostream>
#include <sstream>
#include <thread>

#include "nvbuf_utils.h"
#include <cuda.h>
#include <cuda_runtime.h>
#include <cudaEGL.h>

#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/highgui.hpp>

using namespace std;

#define USE(x) ((void)(x))

static GstPipeline *gst_pipeline = nullptr;
static string launch_string;
static GstElement *appsrc_;

GstClockTime timestamp = 0;
static int w = 1920;
static int h = 1080;
EGLDisplay egl_display;

static int num_frames=0;

static void
notify_to_destroy (gpointer user_data)
{
    GST_INFO ("NvBufferDestroy(%d)", *(int *)user_data);
    NvBufferDestroy(*(int *)user_data);
    g_free(user_data);
}


static gboolean feed_function(gpointer d) {
    GstBuffer *buffer;
    GstFlowReturn ret;
    GstMapInfo map = {0};
    int dmabuf_fd = 0;
    gpointer data = NULL, user_data = NULL;
    NvBufferParams par;
    GstMemoryFlags flags = (GstMemoryFlags)0;
   
    NvBufferCreate(&dmabuf_fd, w, h, NvBufferLayout_Pitch, NvBufferColorFormat_ABGR32);
    //CUDA process
    {
        EGLImageKHR egl_image;
        egl_image = NvEGLImageFromFd(egl_display, dmabuf_fd);
        CUresult status;
        CUeglFrame eglFrame;
        CUgraphicsResource pResource = NULL;
        cudaFree(0);
        status = cuGraphicsEGLRegisterImage(&pResource,
                    egl_image,
                    CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
        if (status != CUDA_SUCCESS)
        {
            printf("cuGraphicsEGLRegisterImage failed: %d \n",status);
        }
        status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
        status = cuCtxSynchronize();

        // CUDA code here
        cv::cuda::GpuMat dmat(h,w,CV_8UC4,eglFrame.frame.pPitch[0]);
        // R,G,B,A

        if (num_frames <= 50) {
            dmat.setTo(cv::Scalar(255,0,0,255));
        } else if (num_frames <= 100) {
            std::cout << "more than 50" << std::endl;
            dmat.setTo(cv::Scalar(0,255,0,255));
        } else {
            dmat.setTo(cv::Scalar(0,0,255,255));
        }
        
        status = cuCtxSynchronize();
        status = cuGraphicsUnregisterResource(pResource);
        NvDestroyEGLImage(egl_display, egl_image);
    }
    user_data = g_malloc(sizeof(int));
    GST_INFO ("NvBufferCreate %d", dmabuf_fd);
    *(int *)user_data = dmabuf_fd;
    NvBufferGetParams (dmabuf_fd, &par);
    data = g_malloc(par.nv_buffer_size);

    buffer = gst_buffer_new_wrapped_full(flags,
                                         data,
                                         par.nv_buffer_size,
                                         0,
                                         par.nv_buffer_size,
                                         user_data,
                                         notify_to_destroy);
    buffer->pts = timestamp;

    gst_buffer_map (buffer, &map, GST_MAP_WRITE);
    memcpy(map.data, par.nv_buffer , par.nv_buffer_size);
    gst_buffer_unmap(buffer, &map);

    g_signal_emit_by_name (appsrc_, "push-buffer", buffer, &ret);
    gst_buffer_unref(buffer);

    timestamp += 33333;
    return G_SOURCE_CONTINUE;
}

int main(int argc, char** argv) {
    USE(argc);
    USE(argv);

    gst_init (&argc, &argv);

    GMainLoop *main_loop;
    main_loop = g_main_loop_new (NULL, FALSE);
    ostringstream launch_stream;

    egl_display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(egl_display, NULL, NULL);
    launch_stream
    << "appsrc name=mysource ! "
    << "video/x-raw(memory:NVMM),width="<< w <<",height="<< h <<",framerate=30/1,format=RGBA ! "
    << "nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! "
    << "nvoverlaysink ";
    //<< "fakesink";

    launch_string = launch_stream.str();

    g_print("Using launch string: %s\n", launch_string.c_str());

    GError *error = nullptr;
    gst_pipeline  = (GstPipeline*) gst_parse_launch(launch_string.c_str(), &error);

    if (gst_pipeline == nullptr) {
        g_print( "Failed to parse launch: %s\n", error->message);
        return -1;
    }
    if(error) g_error_free(error);

    appsrc_ = gst_bin_get_by_name(GST_BIN(gst_pipeline), "mysource");
    gst_app_src_set_stream_type(GST_APP_SRC(appsrc_), GST_APP_STREAM_TYPE_STREAM);

    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_PLAYING); 

    for (int i=0; i<150; i++) {
        feed_function(nullptr);
        num_frames++;
        usleep(33333);
    }

    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_NULL);
    gst_object_unref(GST_OBJECT(gst_pipeline));
    g_main_loop_unref(main_loop);
    eglTerminate(egl_display);

    g_print("going to exit \n");
    return 0;
}

I see this now as well. I had only tried it quickly and not tested further, sorry.
I have tried adding error checking, and I also checked that the OpenCV CUDA part was correct by downloading to a CPU Mat, and it seems OK.
Sorry I cannot help further, but @DaneLLL might be able to advise about this sample.

Hi,
Not sure, but probably the OpenCV functions do not support GpuMat. You may check whether a CUDA filter works:

filter = cv::cuda::createSobelFilter(CV_8UC4, CV_8UC4, 1, 0, 3, 1, cv::BORDER_DEFAULT);

Or you may try to map to cv::Mat. Please refer to this patch:
NVBuffer (FD) to opencv Mat - #6 by DaneLLL

Hi @DaneLLL, I think it is better to explain my problem:
I grab a frame from the camera through appsink and do some processing; then I want to send it to udpsink as H.264 through appsrc, using nvv4l2h264enc.
Right now I’m using OpenCV to do the processing between appsink and appsrc, with cv::VideoWriter and the GStreamer backend. The main problem is latency, and I traced it to the time it takes for the frame to be copied from CPU memory to NVMM. I’m looking for a way to speed up this process.

I’m using Unified Memory, so I have a GpuMat, a Mat, and a pointer to the memory that both OpenCV mats wrap. None of my attempts work after the first frame: neither copy operations (OpenCV copies, which call cudaMemcpy2D, or cudaMemcpy using the pointer) nor color conversion (BGR->RGBA) directly on the GpuMat that wraps the EGLImage.

I think the problem is not in the OpenCV functions but somewhere in the feed function and appsrc. I also tried without forcing a CUDA context with cudaFree(0), because I’m not familiar with the CUDA Driver API, but it seems like the context “protects” the data.

Is there any way to solve this problem?

P.S. As I understand it, two alternatives are NVEnc from the Jetson Multimedia API or creating an EGLStream with a CUDA producer, but I’d like to use this code (for compatibility reasons).

Hi,
To use the hardware encoder you need to create an NvBuffer and have your data in that buffer. If your buffer in appsink is a GPU-accessible buffer, you may create an NvBuffer in appsrc and copy the data with cudaMemcpy. NvBuffer does not support the BGR format, so if you get BGR data in appsink, you need to handle format conversion from BGR to RGBA.
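
As an illustration of what that conversion has to do, here is a CPU reference sketch in plain C++. It is not the GPU implementation: on the Jetson you would more likely do the same mapping in a CUDA kernel or with something like cv::cuda::cvtColor and cv::COLOR_BGR2RGBA (an assumption, not tested here). The function name bgr_to_rgba_pitched is hypothetical. The point it shows is that both buffers are addressed by pitch (bytes per row), which can be larger than width times bytes per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CPU reference for a BGR -> RGBA conversion between two pitched buffers.
// src holds 3 bytes per pixel (B,G,R); dst holds 4 bytes per pixel (R,G,B,A).
// Padding bytes at the end of each row are left untouched.
inline void bgr_to_rgba_pitched(const std::uint8_t* src, std::size_t src_pitch,
                                std::uint8_t* dst, std::size_t dst_pitch,
                                int width, int height) {
    assert(src != nullptr && dst != nullptr);
    assert(width >= 0 && height >= 0);
    assert(src_pitch >= static_cast<std::size_t>(width) * 3);
    assert(dst_pitch >= static_cast<std::size_t>(width) * 4);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* s = src + static_cast<std::size_t>(y) * src_pitch;
        std::uint8_t* d = dst + static_cast<std::size_t>(y) * dst_pitch;
        for (int x = 0; x < width; ++x) {
            d[4 * x + 0] = s[3 * x + 2];  // R
            d[4 * x + 1] = s[3 * x + 1];  // G
            d[4 * x + 2] = s[3 * x + 0];  // B
            d[4 * x + 3] = 255;           // opaque alpha
        }
    }
}
```

A plain cudaMemcpy of the whole frame is only equivalent to this when source and destination have the same format and the same pitch; otherwise a row-wise copy (cudaMemcpy2D) or a conversion kernel is needed.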

Here is an updated appsrc_nvmm sample to apply OpenCV CUDA filter:

// generate input frame data
$ gst-launch-1.0 videotestsrc num-buffers=150 ! video/x-raw,width=1920,height=1080,format=RGBA ! filesink location=1080.yuv
// build sample
$ CUDA_VER=10.2 ENABLE_OCV_CUDA=1 make
// run
$ ./appsrc_nvmm
// check a.mkv for the effect

appsrc_nvmm_ocv_cuda.zip (4.5 KB)

Thanks to @DaneLLL’s new example, it turns out that the timestamp increment was wrong (microseconds instead of nanoseconds).

appsrc_nvmm.cpp
#include <cstdlib>
#include <gst/gst.h>
#include <gst/gstinfo.h>
#include <gst/app/gstappsrc.h>
#include <glib-unix.h>
#include <dlfcn.h>

#include <cstring>
#include <iostream>
#include <sstream>
#include <thread>

#include "nvbuf_utils.h"
#include <cuda.h>
#include <cuda_runtime.h>
#include <cudaEGL.h>

#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>


using namespace std;

#define USE(x) ((void)(x))

static GstPipeline *gst_pipeline = nullptr;
static string launch_string;
static GstElement *appsrc_;

GstClockTime timestamp = 0;
static int w = 1920;
static int h = 1080;
EGLDisplay egl_display;

static void
notify_to_destroy (gpointer user_data)
{
    GST_INFO ("NvBufferDestroy(%d)", *(int *)user_data);
    NvBufferDestroy(*(int *)user_data);
    g_free(user_data);
}


static gboolean feed_function(gpointer d) {
    GstBuffer *buffer;
    GstFlowReturn ret;
    GstMapInfo map = {0};
    int dmabuf_fd = 0;
    gpointer data = NULL, user_data = NULL;
    NvBufferParams par;
    GstMemoryFlags flags = (GstMemoryFlags)0;

    NvBufferCreate(&dmabuf_fd, w, h, NvBufferLayout_Pitch, NvBufferColorFormat_ABGR32);
    //CUDA process
    {
        EGLImageKHR egl_image;
        egl_image = NvEGLImageFromFd(egl_display, dmabuf_fd);
        CUresult status;
        CUeglFrame eglFrame;
        CUgraphicsResource pResource = NULL;
        cudaFree(0);
        status = cuGraphicsEGLRegisterImage(&pResource,
                    egl_image,
                    CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
        if (status != CUDA_SUCCESS)
        {
            printf("cuGraphicsEGLRegisterImage failed: %d \n",status);
        }
        status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
        status = cuCtxSynchronize();

        // CUDA code here
        cv::cuda::GpuMat dmat(h,w,CV_8UC4,eglFrame.frame.pPitch[0]);

        // R,G,B,A
        static int frame_num=0;
        int colorMode = ((frame_num++))%3;
        printf("frame %03d, mode %d\n", frame_num, colorMode);

        if (colorMode == 0)
            dmat.setTo(cv::Scalar(255,0,0,255));
        else if (colorMode == 1)
            dmat.setTo(cv::Scalar(0,255,0,255));
        else
            dmat.setTo(cv::Scalar(0,0,255,255));

        status = cuCtxSynchronize();
        status = cuGraphicsUnregisterResource(pResource);
        NvDestroyEGLImage(egl_display, egl_image);
    }
    user_data = g_malloc(sizeof(int));
    GST_INFO ("NvBufferCreate %d", dmabuf_fd);
    *(int *)user_data = dmabuf_fd;

    NvBufferGetParams (dmabuf_fd, &par);
    data = g_malloc(par.nv_buffer_size);

    buffer = gst_buffer_new_wrapped_full(flags,
                                         data,
                                         par.nv_buffer_size,
                                         0,
                                         par.nv_buffer_size,
                                         user_data,
                                         notify_to_destroy);
    buffer->pts = timestamp;

    gst_buffer_map (buffer, &map, GST_MAP_WRITE);
    memcpy(map.data, par.nv_buffer, par.nv_buffer_size);
    gst_buffer_unmap(buffer, &map);

    g_signal_emit_by_name (appsrc_, "push-buffer", buffer, &ret);
    gst_buffer_unref(buffer);

    timestamp += 33333333;
    return G_SOURCE_CONTINUE;
}

int main(int argc, char** argv) {
    USE(argc);
    USE(argv);

    gst_init (&argc, &argv);

    GMainLoop *main_loop;
    main_loop = g_main_loop_new (NULL, FALSE);
    ostringstream launch_stream;

    egl_display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(egl_display, NULL, NULL);
    launch_stream
    << "appsrc name=mysource ! "
    << "video/x-raw(memory:NVMM),width="<< w <<",height="<< h <<",framerate=30/1,format=RGBA ! "
    << "nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! "
    << "nvoverlaysink";

    launch_string = launch_stream.str();

    g_print("Using launch string: %s\n", launch_string.c_str());

    GError *error = nullptr;
    gst_pipeline  = (GstPipeline*) gst_parse_launch(launch_string.c_str(), &error);

    if (gst_pipeline == nullptr) {
        g_print( "Failed to parse launch: %s\n", error->message);
        return -1;
    }
    if(error) g_error_free(error);

    appsrc_ = gst_bin_get_by_name(GST_BIN(gst_pipeline), "mysource");
    gst_app_src_set_stream_type(GST_APP_SRC(appsrc_), GST_APP_STREAM_TYPE_STREAM);

    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_PLAYING); 

    for (int i=0; i<300; i++) {
        feed_function(nullptr);
    }

    // Wait for EOS message
    gst_element_send_event ((GstElement*)gst_pipeline, gst_event_new_eos ());
    GstBus *bus = gst_pipeline_get_bus(GST_PIPELINE(gst_pipeline));
    gst_bus_poll(bus, GST_MESSAGE_EOS, GST_CLOCK_TIME_NONE);

    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_NULL);
    gst_object_unref(GST_OBJECT(gst_pipeline));
    g_main_loop_unref(main_loop);
    eglTerminate(egl_display);

    g_print("going to exit \n");
    return 0;
}

I played with that further, and it seems that it may be feeding frames too fast. Adding some usleep may help. See the following example…you’re almost ready for a pong ;-)

appsrc_nvmm.cpp
#include <cstdlib>
#include <gst/gst.h>
#include <gst/gstinfo.h>
#include <gst/app/gstappsrc.h>
#include <glib-unix.h>
#include <dlfcn.h>

#include <cstring>
#include <iostream>
#include <sstream>
#include <thread>

#include "nvbuf_utils.h"
#include <cuda.h>
#include <cuda_runtime.h>
#include <cudaEGL.h>

#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>


using namespace std;

#define USE(x) ((void)(x))

static GstPipeline *gst_pipeline = nullptr;
static string launch_string;
static GstElement *appsrc_;

GstClockTime timestamp = 0;
static int w = 1920;
static int h = 1080;
static int rect_size = 50;
static int speed = 10;
EGLDisplay egl_display;

static void
notify_to_destroy (gpointer user_data)
{
    GST_INFO ("NvBufferDestroy(%d)", *(int *)user_data);
    NvBufferDestroy(*(int *)user_data);
    g_free(user_data);
}


static gboolean feed_function(gpointer d) {
    GstBuffer *buffer;
    GstFlowReturn ret;
    GstMapInfo map = {0};
    int dmabuf_fd = 0;
    gpointer data = NULL, user_data = NULL;
    NvBufferParams par;
    GstMemoryFlags flags = (GstMemoryFlags)0;

    double startTime = (double)cv::getTickCount();
    {
	    NvBufferCreate(&dmabuf_fd, w, h, NvBufferLayout_Pitch, NvBufferColorFormat_ABGR32);
	    //CUDA process
	    {
	    	EGLImageKHR egl_image;
	    	egl_image = NvEGLImageFromFd(egl_display, dmabuf_fd);
	    	CUresult status;
	    	CUeglFrame eglFrame;
	    	CUgraphicsResource pResource = NULL;
	    	cudaFree(0);
	    	status = cuGraphicsEGLRegisterImage(&pResource,
	    	            egl_image,
	    	            CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
	    	if (status != CUDA_SUCCESS)
	    	{
	    	    printf("cuGraphicsEGLRegisterImage failed: %d \n",status);
	    	}
	    	status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
	    	status = cuCtxSynchronize();

	    	//printf("eglFrame width=%d, height=%d, format=%d, pitch[0]=%p\n", eglFrame.width, eglFrame.height, eglFrame.eglColorFormat, eglFrame.frame.pPitch[0]);

	    	// CUDA code here
	    	{
	    		cv::cuda::GpuMat dmat(h,w,CV_8UC4,eglFrame.frame.pPitch[0]);

	    		static int frame_num=0;
	    		//printf("frame %03d\n", frame_num);
	    		//                    R  ,G,B,A
	    		dmat.setTo(cv::Scalar(0,0,255,255));
	    		{
	    			cv::cuda::GpuMat roi(dmat, cv::Rect(frame_num*speed % (w-rect_size), frame_num*speed % (h-rect_size), rect_size, rect_size));
	    			roi.setTo(cv::Scalar(255,255,0,255));
	    		}
	    		++frame_num;
	    	}

	    	status = cuCtxSynchronize();
	    	status = cuGraphicsUnregisterResource(pResource);
	    	NvDestroyEGLImage(egl_display, egl_image);
	    }
	    user_data = g_malloc(sizeof(int));
	    GST_INFO ("NvBufferCreate %d", dmabuf_fd);
	    *(int *)user_data = dmabuf_fd;

	    NvBufferGetParams (dmabuf_fd, &par);
	    data = g_malloc(par.nv_buffer_size);

	    buffer = gst_buffer_new_wrapped_full(flags,
		                                 data,
		                                 par.nv_buffer_size,
		                                 0,
		                                 par.nv_buffer_size,
		                                 user_data,
		                                 notify_to_destroy);
	    buffer->pts = timestamp;

	    gst_buffer_map (buffer, &map, GST_MAP_WRITE);
	    memcpy(map.data, par.nv_buffer, par.nv_buffer_size);
	    gst_buffer_unmap(buffer, &map);

	    g_signal_emit_by_name (appsrc_, "push-buffer", buffer, &ret);
	    gst_buffer_unref(buffer);
    }
    double endTime = (double)cv::getTickCount();
    double process_time_us = ((endTime - startTime)/cv::getTickFrequency())*1e6;
    //printf("process time: %f us\n", process_time_us);

    // Wait for 33.333 ms minus the processing time, with a 10% margin. This seems OK for 5000 frames; for the long run you would adapt to a better model.
    double sleep_us = (33333.3 - process_time_us)/1.1;
    //printf("sleep=%f us\n", sleep_us);
    if (sleep_us > 0)   // a negative value would convert to a huge unsigned sleep
        usleep((useconds_t)sleep_us);

    timestamp += 33333333;
    return G_SOURCE_CONTINUE;
}

int main(int argc, char** argv) {
    USE(argc);
    USE(argv);

    gst_init (&argc, &argv);

    GMainLoop *main_loop;
    main_loop = g_main_loop_new (NULL, FALSE);
    ostringstream launch_stream;

    egl_display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(egl_display, NULL, NULL);

    //setenv("GST_DEBUG", "*:3", 0);
    launch_stream
    << "appsrc name=mysource ! "
    << "video/x-raw(memory:NVMM),width="<< w <<",height="<< h <<",framerate=30/1,format=RGBA ! "
    << "nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! "
    << "nvoverlaysink";
    //<< "fpsdisplaysink text-overlay=false video-sink=fakesink sync=true";

    launch_string = launch_stream.str();

    g_print("Using launch string: %s\n", launch_string.c_str());

    GError *error = nullptr;
    gst_pipeline  = (GstPipeline*) gst_parse_launch(launch_string.c_str(), &error);

    if (gst_pipeline == nullptr) {
        g_print( "Failed to parse launch: %s\n", error->message);
        return -1;
    }
    if(error) g_error_free(error);

    appsrc_ = gst_bin_get_by_name(GST_BIN(gst_pipeline), "mysource");
    gst_app_src_set_stream_type(GST_APP_SRC(appsrc_), GST_APP_STREAM_TYPE_STREAM);

    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_PLAYING); 

    for (int i=0; i<5000; i++) {
        feed_function(nullptr);
    }

    // Send EOS and wait for the EOS message on the bus
    gst_element_send_event ((GstElement*)gst_pipeline, gst_event_new_eos ());
    GstBus *bus = gst_pipeline_get_bus(GST_PIPELINE(gst_pipeline));
    GstMessage *msg = gst_bus_poll(bus, GST_MESSAGE_EOS, GST_CLOCK_TIME_NONE);
    if (msg) gst_message_unref(msg);
    gst_object_unref(bus);

    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_NULL);
    gst_object_unref(GST_OBJECT(gst_pipeline));
    g_main_loop_unref(main_loop);
    eglTerminate(egl_display);

    g_print("going to exit \n");
    return 0;
}

[EDIT: better wait from main than in drawing function. See poorpong example below.]
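For the pacing itself, here is a minimal sketch of a deadline-based wait that could replace the "frame period minus process time, with a margin" computation. It is not what the code in this thread does: FramePacer is a hypothetical helper name, and it only uses std::chrono and std::thread, so it does not depend on GStreamer or OpenCV. The idea is to keep an absolute deadline per frame, so small sleep errors do not accumulate.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not part of the original code): paces a loop at a
// fixed frame rate using absolute deadlines on a monotonic clock.
class FramePacer {
public:
    using Clock = std::chrono::steady_clock;

    explicit FramePacer(int fps)
        : period_(std::chrono::nanoseconds(1000000000LL / fps)),
          next_(Clock::now() + period_) {}

    // Call once per frame, after the frame has been pushed.
    // Returns true if we were early and slept until the deadline,
    // false if the frame was late (the schedule is then restarted from now).
    bool wait() {
        const Clock::time_point now = Clock::now();
        const bool on_time = now < next_;
        if (on_time)
            std::this_thread::sleep_until(next_);
        else
            next_ = now;        // late: resynchronise instead of trying to catch up
        next_ += period_;       // deadline for the following frame
        return on_time;
    }

    std::chrono::nanoseconds period() const { return period_; }

private:
    std::chrono::nanoseconds period_;
    Clock::time_point next_;
};
```

In the main loop you would create one FramePacer pacer(30) before the loop and call pacer.wait() after each feed_function() call, printing "Late..." when it returns false.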

Thanks for the help, now it is working and I’ll try to implement this in a class for my project.

After a few more experiments, I'd like to show my last modification to the timestamps, as they were the main problem.

First, the pipeline:

    launch_stream
    << "appsrc name=mysource is-live=true do-timestamp=true ! "
    << "video/x-raw(memory:NVMM),width="<< w <<",height="<< h <<",framerate=30/1,format=RGBA ! "
    << "nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! "
    << "nvoverlaysink sync=false";

And the feed function:

// extra variables
GstClockTime timestamp, duration;

// feed function
static gboolean feed_function(gpointer d) {
// ...

// instead of buffer->pts = timestamp
// (num_frames is the running frame counter; its declaration is not shown)
duration = ((double)1/30) * GST_SECOND; // 30 fps
timestamp = num_frames*duration;
// ...

//set the current number in the frame and timestamp
GST_BUFFER_PTS(buffer) = timestamp;
GST_BUFFER_DTS(buffer) = timestamp;
GST_BUFFER_OFFSET(buffer) = num_frames;
GST_BUFFER_DURATION(buffer) = duration;
// ...
}
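To illustrate why deriving the timestamp from the frame counter is more robust than adding 33333333 ns on every frame, here is a small stand-alone sketch. kNsPerSecond stands in for GST_SECOND (one second in nanoseconds), and frame_pts / frame_duration are hypothetical helper names, not GStreamer functions. Inside a GStreamer application, gst_util_uint64_scale() would be the usual way to do this scaling without overflow.

```cpp
#include <cstdint>

// Stand-in for GST_SECOND: one second expressed in nanoseconds.
constexpr uint64_t kNsPerSecond = 1000000000ULL;

// Hypothetical helper: presentation timestamp (ns) of frame n for a
// frame rate of fps_num/fps_den. Computing it from n avoids the rounding
// error that accumulates when a truncated per-frame duration is summed.
// Note: n * kNsPerSecond overflows for extremely large n; this sketch
// ignores that case.
constexpr uint64_t frame_pts(uint64_t n, uint64_t fps_num, uint64_t fps_den) {
    return n * kNsPerSecond * fps_den / fps_num;
}

// Hypothetical helper: duration of frame n, chosen so that
// pts(n) + duration(n) == pts(n + 1) exactly.
constexpr uint64_t frame_duration(uint64_t n, uint64_t fps_num, uint64_t fps_den) {
    return frame_pts(n + 1, fps_num, fps_den) - frame_pts(n, fps_num, fps_den);
}
```

At 30 fps, frame_pts(30, 30, 1) is exactly one second, whereas thirty additions of 33333333 ns give 999999990 ns, i.e. 10 ns of drift per second of video.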

Results look consistent by eye; I’ll do some profiling to confirm that impression.
Nothing else to add except a big thanks to both of you, @Honey_Patouceul and @DaneLLL :)

I’ve mentioned pong above, and finally played around and made a poor one-player version from this, just as an example of drawing with OpenCV CUDA.
The design and gameplay are poor, but this is just a fun example; anyone is welcome to improve it, for example by adding a second player controlled by AI…
You just need an OpenCV build with CUDA enabled (tested with opencv-4.5.3).
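As a starting point for the AI-controlled second player, here is a minimal sketch of one possible paddle update. It is not part of poorpong_nvmm.cpp: ai_bar_step and max_speed are hypothetical names, and the three constants are copied from the source below. Each frame, the bar moves towards the ball by at most max_speed pixels and is clamped to the frame, so the ROI used for drawing stays valid.

```cpp
#include <algorithm>

// Values copied from poorpong_nvmm.cpp
const int frame_height = 1080;
const int ball_size = 50;
const int bar_height = 300;

// Hypothetical helper: returns the new top y position of an AI-controlled bar.
// The bar tries to align its centre with the ball centre, moves at most
// max_speed pixels per frame, and never leaves the frame.
int ai_bar_step(int bar_ypos, int ball_ypos, int max_speed) {
    // Top position that would centre the bar on the ball
    const int target = ball_ypos + ball_size / 2 - bar_height / 2;
    // Limit the move to +/- max_speed
    const int delta = std::max(-max_speed, std::min(max_speed, target - bar_ypos));
    // Keep the whole bar inside the frame
    return std::max(0, std::min(frame_height - bar_height, bar_ypos + delta));
}
```

A lower max_speed makes the AI beatable; the returned value would play the same role for a right-hand bar as left_bar_ypos does for the mouse-controlled one.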

OpenCV CUDA NVMM GStreamer appsrc one-player mouse-controlled PONG

poorpong_nvmm.cpp
#include <cstdlib>
#include <gst/gst.h>
#include <gst/gstinfo.h>
#include <gst/app/gstappsrc.h>
#include <glib-unix.h>
#include <dlfcn.h>
#include <unistd.h>
#include <poll.h>

#include <cstdio>
#include <cstring>
#include <iostream>
#include <sstream>
#include <thread>
#include <vector>

#include "nvbuf_utils.h"
#include <cuda.h>
#include <cuda_runtime.h>
#include <cudaEGL.h>
#include "X11/Xlib.h"

#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"

using namespace std;

#define USE(x) ((void)(x))


typedef struct {
    int left_bar_ypos;
    int ball_xpos;
    int ball_ypos;
    int digits_xpos;
    int count10_idx;
    int count1_idx;
} pongDisplayData;


const int frame_width = 1920;
const int frame_height = 1080;

const int ball_size = 50;
const int initial_speed = 30;

const int bar_width = 50;
const int bar_height = 300;

const int digit_height=200;
const int digit_width=(digit_height*60)/100;

// R, G, B, A
const cv::Scalar background_color(0,0,255,255);
const cv::Scalar ball_color(255,255,0,255);
const cv::Scalar bar_color(255,0,0,255);
const cv::Scalar digits_color(0,255,0,255);


static GstPipeline *gst_pipeline = nullptr;
static string launch_string;
static GstElement *appsrc_;
GstClockTime timestamp = 0;

EGLDisplay egl_display;


static int display_w;
static int display_h;

static int GetPointer_Y (Display *display, Window *root_window) {
    Window r, c;
    int x, y, rx, ry;
    unsigned int m;
    bool b = XQueryPointer(display, *root_window, &r, &c, &rx, &ry, &x, &y, &m);
    if (b) {
        // Scale from display to frame
        return (int)((frame_height - bar_height)*(y/(double)display_h));
    }
    else
        return (-1);
}

static cv::cuda::GpuMat d_ball_mask(ball_size, ball_size, CV_8UC1);
static cv::cuda::GpuMat d_bar_mask(bar_height, bar_width, CV_8UC1);
static std::vector< cv::cuda::GpuMat > d_numbers_mask(10);

static void PrepareMasks(void)
{
    // Bar mask
    cv::Mat h_bar_mask = cv::Mat(bar_height, bar_width, CV_8UC1);
    h_bar_mask.setTo(cv::Scalar(255));
    d_bar_mask.upload(h_bar_mask);

    // Ball mask
    cv::Mat h_ball_mask = cv::Mat::zeros(ball_size, ball_size, CV_8UC1);
    cv::circle(h_ball_mask, cv::Point(ball_size/2, ball_size/2), ball_size/2, cv::Scalar(255), cv::FILLED, 8, 0);
    d_ball_mask.upload(h_ball_mask);

    // [0-9] digits masks
    for(unsigned int i=0; i <10; ++i)
    {
        char buf[2];
        snprintf(buf, sizeof(buf), "%u", i);
        cv::Mat h_digit_mask = cv::Mat::zeros(digit_height, digit_width, CV_8UC1);
        cv::putText (h_digit_mask, buf, cv::Point (0,digit_width), cv::FONT_HERSHEY_SIMPLEX, digit_height/40, cv::Scalar(255), digit_height/10);
        d_numbers_mask[i].upload(h_digit_mask);
    }
}


static void notify_to_destroy (gpointer user_data)
{
    GST_INFO ("NvBufferDestroy(%d)", *(int *)user_data);
    NvBufferDestroy(*(int *)user_data);
    g_free(user_data);
}


static gboolean feed_function(gpointer d) {
    GstBuffer *buffer;
    GstFlowReturn ret;
    GstMapInfo map = {0};
    int dmabuf_fd = 0;
    gpointer data = NULL, user_data = NULL;
    NvBufferParams par;
    GstMemoryFlags flags = (GstMemoryFlags)0;

    {
        static int frame_num=0;

        NvBufferCreate(&dmabuf_fd, frame_width, frame_height, NvBufferLayout_Pitch, NvBufferColorFormat_ABGR32);
        //CUDA process
        {
            EGLImageKHR egl_image;
            egl_image = NvEGLImageFromFd(egl_display, dmabuf_fd);
            CUresult status;
            CUeglFrame eglFrame;
            CUgraphicsResource pResource = NULL;
            cudaFree(0);
            status = cuGraphicsEGLRegisterImage(&pResource,
                                                egl_image,
                                                CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
            if (status != CUDA_SUCCESS)
            {
                printf("cuGraphicsEGLRegisterImage failed: %d \n",status);
            }
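            status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
            if (status != CUDA_SUCCESS)
            {
                printf("cuGraphicsResourceGetMappedEglFrame failed: %d \n",status);
            }
            status = cuCtxSynchronize();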

            // CUDA code here
            {
                cv::cuda::GpuMat dmat(frame_height, frame_width,CV_8UC4,eglFrame.frame.pPitch[0]);
                pongDisplayData *data = (pongDisplayData*) d;

                // Set background
                dmat.setTo(background_color);

                // Draw Count digits
                {
                    cv::cuda::GpuMat roi(dmat, cv::Rect(cv::Point(data->digits_xpos, 10), cv::Size(digit_width, digit_height)));
                    roi.setTo(digits_color, d_numbers_mask[data->count10_idx]);
                }
                {
                    cv::cuda::GpuMat roi(dmat, cv::Rect(cv::Point(data->digits_xpos + digit_width, 10), cv::Size(digit_width, digit_height)));
                    roi.setTo(digits_color, d_numbers_mask[data->count1_idx]);
                }

                // Draw left bar
                {
                    cv::cuda::GpuMat roi(dmat, cv::Rect(cv::Point(50, data->left_bar_ypos), cv::Size(bar_width, bar_height)));
                    roi.setTo(bar_color, d_bar_mask);
                }

                // Draw ball
                {
                    cv::cuda::GpuMat roi(dmat, cv::Rect(data->ball_xpos, data->ball_ypos, ball_size, ball_size));
                    roi.setTo(ball_color, d_ball_mask);
                }

                // Safety check
                if (dmat.data != eglFrame.frame.pPitch[0])
                    fprintf (stderr, "Error: re-allocated dmat\n");
            }

            status = cuCtxSynchronize();
            status = cuGraphicsUnregisterResource(pResource);
            NvDestroyEGLImage(egl_display, egl_image);
        }
        user_data = g_malloc(sizeof(int));
        GST_INFO ("NvBufferCreate %d", dmabuf_fd);
        *(int *)user_data = dmabuf_fd;

        NvBufferGetParams (dmabuf_fd, &par);
        data = g_malloc(par.nv_buffer_size);

        buffer = gst_buffer_new_wrapped_full(flags,
                                             data,
                                             par.nv_buffer_size,
                                             0,
                                             par.nv_buffer_size,
                                             user_data,
                                             notify_to_destroy);

        GST_BUFFER_PTS(buffer) = timestamp;
        GST_BUFFER_DTS(buffer) = timestamp;
        GST_BUFFER_OFFSET(buffer) = frame_num++;
        GST_BUFFER_DURATION(buffer) = ((double)1/30) * GST_SECOND;

        gst_buffer_map (buffer, &map, GST_MAP_WRITE);
        memcpy(map.data, par.nv_buffer, par.nv_buffer_size);
        gst_buffer_unmap(buffer, &map);

        g_signal_emit_by_name (appsrc_, "push-buffer", buffer, &ret);
        gst_buffer_unref(buffer);
    }

    // Free pongDisplayData
    g_free(d);

    timestamp += 33333333;
    return G_SOURCE_CONTINUE;
}




typedef struct {
    int count;
    int left_bar_ypos;
    int left_bar_yspeed;
    int ball_xpos;
    int ball_ypos;
    int ball_xspeed;
    int ball_yspeed;
} pongCtrlData;



int PoorPongController_onePlayer(pongCtrlData& ctrlData) {
    // ball projection for next step
    int ball_xnext = ctrlData.ball_xpos + ctrlData.ball_xspeed;
    int ball_ynext = ctrlData.ball_ypos + ctrlData.ball_yspeed;

    if ((ball_xnext < 50 + bar_width) && (ball_ynext + ball_size > ctrlData.left_bar_ypos) && (ball_ynext < ctrlData.left_bar_ypos + bar_height)) {
        // Ball is in bar zone... reverse xspeed
        ctrlData.ball_xspeed = -ctrlData.ball_xspeed;
        // Set new xpos
        ctrlData.ball_xpos = 50 + (ctrlData.ball_xpos - ball_xnext);
        // Add 50% of bar yspeed
        ctrlData.ball_yspeed += (int)(0.5*ctrlData.left_bar_yspeed);
        // Increment count and increase both speeds by 10%
        ++ctrlData.count;
        ctrlData.ball_xspeed = (int)(1.1*ctrlData.ball_xspeed);
        ctrlData.ball_yspeed = (int)(1.1*ctrlData.ball_yspeed);
    }
    else if (ball_xnext < 0) {
        // Ball reached left of the frame...Game over
        return -1;
    }
    else if (ball_xnext > frame_width - ball_size) {
        // Ball reached right of the frame... reverse xspeed
        ctrlData.ball_xspeed = -ctrlData.ball_xspeed;
    }
    else {
        // Ball moved on x, update
        ctrlData.ball_xpos = ball_xnext;
    }

    if ((ball_ynext > frame_height - ball_size) || (ball_ynext < 0)) {
        // Ball reached top or bottom of the frame... reverse yspeed
        ctrlData.ball_yspeed = -ctrlData.ball_yspeed;
    }
    else {
        // Ball moved on y, update
        ctrlData.ball_ypos = ball_ynext;
    }
    return 0;
}

int main(int argc, char** argv) {
    USE(argc);
    USE(argv);

    Display *display;
    display = XOpenDisplay(0);
    if (display == nullptr) {
        fprintf(stderr, "Error: cannot open X display\n");
        return -1;
    }
    display_w = DisplayWidth(display, 0);
    display_h = DisplayHeight(display, 0);

    Window root_window;
    root_window = XRootWindow(display, 0);


    egl_display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(egl_display, NULL, NULL);

    PrepareMasks();

    gst_init (&argc, &argv);
    GMainLoop *main_loop;
    main_loop = g_main_loop_new (NULL, FALSE);
    ostringstream launch_stream;
    launch_stream
            << "appsrc name=mysource ! "
            << "video/x-raw(memory:NVMM),width="<< frame_width <<",height="<< frame_height <<",framerate=30/1,format=RGBA ! "
            << "nvegltransform ! nveglglessink";
    //<< "nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! nvoverlaysink";
    //<< "nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! nv3dsink";
    //<< "nvvidconv ! video/x-raw, format=YUY2 ! xvimagesink";

    launch_string = launch_stream.str();
    g_print("Using launch string: %s\n", launch_string.c_str());

    GError *error = nullptr;
    gst_pipeline  = (GstPipeline*) gst_parse_launch(launch_string.c_str(), &error);

    if (gst_pipeline == nullptr) {
        g_print( "Failed to parse launch: %s\n", error->message);
        return -1;
    }
    if(error) g_error_free(error);

    appsrc_ = gst_bin_get_by_name(GST_BIN(gst_pipeline), "mysource");
    gst_app_src_set_stream_type(GST_APP_SRC(appsrc_), GST_APP_STREAM_TYPE_STREAM);

    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_PLAYING);


    // Set initial state
    pongCtrlData ctrlData;
    ctrlData.count = 0;
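    ctrlData.left_bar_ypos = GetPointer_Y(display, &root_window);
    if (ctrlData.left_bar_ypos < 0)
        ctrlData.left_bar_ypos = 0; // pointer query failed; avoid an out-of-frame ROI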
    ctrlData.left_bar_yspeed = 0;
    ctrlData.ball_xpos = 50 + bar_width;
    ctrlData.ball_ypos = ctrlData.left_bar_ypos + bar_height/2;
    ctrlData.ball_xspeed = initial_speed;
    ctrlData.ball_yspeed = initial_speed/5;

    while (1) {
        double startTime = (double)cv::getTickCount();

        // Get pointer Y pos
        int ret = GetPointer_Y(display, &root_window);
        if (ret >= 0) {
            ctrlData.left_bar_yspeed = ret - ctrlData.left_bar_ypos;
            ctrlData.left_bar_ypos = ret;
        }

        // Update state
        ret = PoorPongController_onePlayer(ctrlData);
        if (ret < 0)
            break;

        // Set display data
        pongDisplayData *dispData = (pongDisplayData *)g_malloc(sizeof(pongDisplayData));
        dispData->left_bar_ypos = ctrlData.left_bar_ypos;
        dispData->ball_xpos = ctrlData.ball_xpos;
        dispData->ball_ypos = ctrlData.ball_ypos;
        dispData->digits_xpos = (int)(frame_width/2 - digit_width);
        dispData->count10_idx =  (ctrlData.count/10)%10; // supports only up to 99
        dispData->count1_idx =  ctrlData.count%10;

        // Draw into NVMM frame
        feed_function(dispData);

        // Compute processing + drawing time, and sleep until next frame if we're early
        double endTime = (double)cv::getTickCount();
        double process_time_us = ((endTime - startTime)/cv::getTickFrequency())*1e6;
        double sleep_us = (33333.3 - process_time_us)/1.01;
        if (sleep_us >= 0)
            usleep(sleep_us);
        else
            printf("Late...\n");
    }

    /* Game over... show last scene for 2 seconds before exit */
    for (int i = 0; i < 60; i++)
    {
        double startTime = (double)cv::getTickCount();
        pongDisplayData *dispData = (pongDisplayData *)g_malloc(sizeof(pongDisplayData));
        dispData->left_bar_ypos = ctrlData.left_bar_ypos;
        dispData->ball_xpos = ctrlData.ball_xpos;
        dispData->ball_ypos = ctrlData.ball_ypos;
        dispData->digits_xpos = (int)(frame_width/2 - digit_width);
        dispData->count10_idx =  (ctrlData.count/10)%10; // supports only up to 99
        dispData->count1_idx =  ctrlData.count%10;

        // Draw frame
        feed_function(dispData);

        // Sleep until next frame if we're early
        double endTime = (double)cv::getTickCount();
        double process_time_us = ((endTime - startTime)/cv::getTickFrequency())*1e6;
        double sleep_us = (33333.3 - process_time_us)/1.01;
        if (sleep_us >= 0)
            usleep(sleep_us);
        else
            printf("Late...\n");

    }

    printf("Game Over - Count: %d\n", ctrlData.count);

    // Send EOS and wait for the EOS message on the bus
    gst_element_send_event ((GstElement*)gst_pipeline, gst_event_new_eos ());
    GstBus *bus = gst_pipeline_get_bus(GST_PIPELINE(gst_pipeline));
    GstMessage *msg = gst_bus_poll(bus, GST_MESSAGE_EOS, GST_CLOCK_TIME_NONE);
    if (msg) gst_message_unref(msg);
    gst_object_unref(bus);


    gst_element_set_state((GstElement*)gst_pipeline, GST_STATE_NULL);
    gst_object_unref(GST_OBJECT(gst_pipeline));
    g_main_loop_unref(main_loop);
    eglTerminate(egl_display);

    g_print("going to exit \n");
    return 0;
}
Makefile
################################################################################
# Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
#################################################################################

APP:= poorpong_nvmm
CUDA_VER?=10.2
OPENCV_DIR=/usr/local

ifeq ($(CUDA_VER),)
  $(error "CUDA_VER is not set")
endif
CXX:= g++
SRCS:= poorpong_nvmm.cpp

CFLAGS:= -Wall -std=c++11 -ggdb\
        -I/usr/src/jetson_multimedia_api/include \
	-I/usr/local/cuda-$(CUDA_VER)/include \
        -I$(OPENCV_DIR)/include/opencv4

LIBS:= -Wall -std=c++11 \
	-L/usr/lib/aarch64-linux-gnu/tegra/ -lEGL -lGLESv2 \
	-L/usr/lib/aarch64-linux-gnu/tegra/ -lcuda -lnvbuf_utils \
	-L/usr/local/cuda-$(CUDA_VER)/lib64/ -lcudart \
	-L$(OPENCV_DIR)/lib -lopencv_core -lopencv_imgproc -lopencv_highgui -lX11

OBJS:= $(SRCS:.cpp=.o)

PKGS:= gstreamer-app-1.0
CFLAGS+= `pkg-config --cflags $(PKGS)`
LIBS+= `pkg-config --libs $(PKGS)`

all: $(APP)

%.o: %.cpp
	@echo "Compiling: $<"
	$(CXX) $(CFLAGS) -c $< -o $@

$(APP): $(OBJS)
	@echo "Linking: $@"
	$(CXX) -o $@ $(OBJS) $(CFLAGS) $(LIBS)

clean:
	rm -rf $(OBJS) $(APP)

Have fun!