Getting frames from GStreamer appsink into GpuMat / Mat from NVMM memory

My goal is to capture H.264 video frames from an RTSP source and pass them to an appsink. I have achieved this using the following:

descr = g_strdup_printf ("rtspsrc location=%s latency=0 protocols=udp-unicast ! rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! appsink name=%s", filename, sink_name);

And this is how I am grabbing frames in the callback and wrapping them in a Mat:

buffer = gst_sample_get_buffer (sample);
gst_buffer_map (buffer, &map, GST_MAP_READ);
Mat iframe(cv::Size(width, (ORIG_HEIGHT/SIZE_DIVISOR) * 3 / 2), CV_8UC1, (char*)

I get a segmentation fault while copying the data.

However, if I remove “(memory:NVMM)” the pipeline runs fine, but CPU usage is very high.

How do I use “(memory:NVMM)” in my use case and get the frame data copied to a Mat or GpuMat, directly or indirectly?

Hi Mdotali,

As far as I know, appsink does not know where the “NVMM” memory is. If you switch to regular memory, it should work fine.
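
To illustrate the point above, here is a minimal sketch (not taken from the original code) of a guard that could run before wrapping the mapped buffer in a Mat. It computes how many bytes a tightly packed NV12 frame needs, which is the same height * 3 / 2 layout used in the Mat constructor, and checks the mapped size against it. With “(memory:NVMM)” caps the mapped region is likely a handle to a hardware buffer rather than the pixel bytes, so it would be far smaller than a frame and the check would fail instead of segfaulting. The function names are hypothetical, and the sketch assumes no row padding (stride equal to width).

```cpp
#include <cstddef>

// Bytes in one tightly packed NV12 frame: a full-resolution Y plane
// followed by an interleaved UV plane with half as many rows.
// Assumption: stride == width, which real buffers may not satisfy.
std::size_t nv12_frame_bytes(std::size_t width, std::size_t height) {
    return width * height * 3 / 2;
}

// Hypothetical guard: true only if the mapped region is large enough
// to hold a whole NV12 frame of the given dimensions.
bool looks_like_raw_nv12(std::size_t mapped_bytes,
                         std::size_t width,
                         std::size_t height) {
    return mapped_bytes >= nv12_frame_bytes(width, height);
}
```

In the callback, the size field of the GstMapInfo filled in by gst_buffer_map would be passed as mapped_bytes, and the Mat would only be constructed when the guard returns true.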


But that takes a lot of CPU, and I have to run three such pipelines. Is there any other way to capture frames from a GStreamer appsink without putting the load on the CPU?

Instead of using a custom video sink, why not use the sinks provided by Nvidia? I have also noticed that CPU usage is a couple of times higher than with the “NVMM” memory based method. But even if we pass the video data to the GPU, we still need CPU cycles to do so when using non-NVMM memory, unless you find a way to connect Nvidia’s OMX-based decoder to your custom GPU buffer.

Hello, mdotali:
‘usaarizona’ is right.

Regarding the high CPU load, could you please describe your pipeline in detail? From your code, it seems that OpenCV is being called, which may be what causes the high CPU load.
‘nvvidconv’ is hardware-accelerated and will not use much CPU.


@jachen this is quite an old post, but building on the same question: what is the workaround if one is using OpenCV?
I used a filesink writing to /dev/stdout and read from it as a numpy array. That reduces CPU usage, but it is also slow!
This is my pipeline for VideoCapture:

rtspsrc location='+rtsp_url+' latency=2 ! rtph264depay ! h264parse ! nvv4l2decoder drop-frame-interval=30 ! nvvideoconvert ! video/x-raw,format=RGBA ! appsink

Ideally the frames would somehow stay on the GPU, with only a pointer passed along to the later deep learning models (which I can convert to TensorRT).

Hello, saransh661:
Please start a new thread in the appropriate board for your question.
You can try DeepStream in the recent SDK; it may give better performance.