Retrieve buffer from gstreamer flow when using video/x-raw(memory:NVMM)

I am trying to create a gstreamer plugin that retrieves a gst buffer and creates a cv::gpu::GpuMat without making any copy. My pipeline is:

gst-launch-1.0 rtspsrc location=rtsp://ip:port/file.mp4 ! rtph264depay ! decodebin ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)RGBA' ! myPlugin ! fakesink

The code I am using in order to get the gpu mat inside the plugin is:

GstMapInfo map;
gst_buffer_map(buf, &map, GST_MAP_WRITE);
void* gstData = map.data;
const uint32_t gstSize = map.size;
cv::gpu::GpuMat m_gpu(height, width, CV_8UC4, (uchar*)gstData);
cv::imshow("window", cv::Mat(m_gpu));

Now, the problems are:

  • If I print the size of the buffer (gstSize) I get 808 bytes instead of the size of the frame. If I remove "(memory:NVMM)" from the caps, the size matches the frame dimensions, but then I run into the second problem below.
  • If I do not use "(memory:NVMM)", I cannot create the GpuMat from the mapped pointer, so the imshow does not work. This is odd because I am using a TX2, which has unified memory, yet the memory address in gstData is not accessible from the device. The only workaround would be a host-to-device memcpy for each frame, which I would like to avoid. Another possible solution would be cudaHostRegister followed by cudaHostGetDevicePointer, but cudaHostRegister does not seem to be supported on the Jetson architecture.

This may not fit your need, but you may check my post

Thank you, Honey_Patouceul.

Thank you very much Honey_Patouceul. I managed to use the nvivafilter. I have one question though: what color format is the data behind the pdata pointer? Is it already RGB?

It should be RGBA (use cv type CV_8UC4).
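For what it's worth, inside the nvivafilter custom library the mapped pointer can be wrapped in a GpuMat with no copy. A rough sketch, assuming a cv_process-style entry point like the one in the nvsample_cudaprocess example (the function name, signature, and the in-place blur are illustrative, not the actual sample code):

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

// Hypothetical per-frame callback invoked by nvivafilter; pdata already
// lives in device-accessible memory, so no host<->device copy is needed.
static void cv_process(void *pdata, int32_t width, int32_t height)
{
    // Wrap the existing buffer; RGBA -> CV_8UC4. No allocation, no copy.
    cv::gpu::GpuMat frame(height, width, CV_8UC4, pdata);

    // Example in-place processing (illustrative): blur the frame on the GPU.
    cv::gpu::GaussianBlur(frame, frame, cv::Size(5, 5), 1.5);
}
```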

Yes it is. Thanks. Sorry to bother you again Honey_Patouceul, but do you know if it is possible to receive multiple streams using nvivafilter? In this post they mention tegra_multimedia_api, but I do not understand how to use it. Do you know where I can find a tutorial or some simple guidelines on these topics?

Although it might not be impossible to use several inputs, it would take some work. I made an attempt at the time (multiple nvivafilters using the same custom lib and synchronizing threads with semaphores), but I have not finished it. I’ll let you know if I find some time for this, but I’m unsure the performance would be good.
The right solution would be for NVIDIA to release a multiple-input plugin, or even just an nvvideomixer for mixing several NVMM inputs into a single big image… This would allow easy video stitching, for example.

About the MM API, just have a look at the samples in /home/ubuntu/tegra_multimedia_api/samples.

I see. Thank you very much, I hope NVIDIA will do something about that.

Hi andreaaa,
nvivafilter only handles single input. Please refer to tegra_multimedia_api for multiple inputs.
You can install samples via Jetpack.

I have an NVIDIA GTX 1080 Ti card and I am doing some vision work using GStreamer and OpenCV. I want to access the decoded GstBuffer (decoded on the GPU using VAAPI) and create a GpuMat from it without copying data from host to device (as described by the OP in the second point).

How can I do it?
Is nvivafilter available for platforms other than Jetson? If not, what are the alternatives?

Hi andreaaa,

You may find the following information about the GstCUDA framework interesting; I think it is exactly what you are looking for. Below you will find a more detailed description, but in summary, it is a framework that makes it easy to interface GStreamer with CUDA while guaranteeing zero memory copies. It also supports several inputs.

GstCUDA is a RidgeRun developed GStreamer plug-in enabling easy CUDA algorithm integration into GStreamer pipelines. GstCUDA offers a framework that allows users to develop custom GStreamer elements that execute any CUDA algorithm. The GstCUDA framework is a series of base classes abstracting the complexity of both CUDA and GStreamer. With GstCUDA, developers avoid writing elements from scratch, allowing the developer to focus on the algorithm logic, thus accelerating time to market.

GstCUDA offers a GStreamer plugin that contains a set of elements ideal for quick GStreamer/CUDA prototyping. Those elements consist of a set of filters with different input/output pad combinations that are run-time loadable with an external custom CUDA library containing the algorithm to be executed on the GPU for each video frame that passes through the pipeline. The GstCUDA plugin allows users to develop their own CUDA processing library, pass the library into the GstCUDA filter element that best fits the algorithm's requirements, and execute the library on the GPU, passing upstream frames from the GStreamer pipeline to the GPU and passing the modified frames downstream to the next element in the pipeline. These elements were created with the CUDA algorithm developer in mind, supporting quick prototyping and abstracting all GStreamer concepts. The elements are fully adaptable to different project needs, making GstCUDA a powerful tool that is essential for CUDA/GStreamer project development.

One remarkable feature of GstCUDA is that it provides a zero-memory-copy interface between CUDA and GStreamer on Jetson TX1/TX2 platforms. This enables heavy algorithms and large amounts of data (up to 2x 4K 60fps streams) to be processed on CUDA without the performance hit caused by copies or memory conversions. GstCUDA provides the necessary APIs to directly handle NVMM buffers to achieve the best possible performance on Jetson TX1/TX2 platforms. It provides a series of base classes and utilities that abstract the complexity of handling the memory interface between GStreamer and CUDA, so the developer can focus on what actually adds value to the end product. GstCUDA ensures optimal performance for GStreamer/CUDA applications on Jetson platforms.

You can find detailed information about GstCUDA on the following link:

I hope this information can be useful to you.

Best regards,

Hi @andreaaa and @Honey_Patouceul ,

I want to create the same kind of plugin now. I can’t use nvivafilter with CUDA post-processing because I need to pass input data into the plugin on every frame and draw OSD on the video frame using “nvosd.h” from L4T.

I have created a simple plugin based on the transform template plugin. In the ‘transform_ip’ function I can get the GstBuffer.
Do you have any clues on how to convert this GstBuffer into a GPU buffer?

Thank you.

You may try this example.
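For context, the usual zero-copy path from an NVMM GstBuffer to CUDA on Jetson (L4T with nvbuf_utils) goes through a dmabuf fd and an EGLImage. A heavily abridged sketch, assuming the nvbuf_utils and CUDA EGL interop APIs available on that platform (error handling and CUDA context setup omitted; this is an outline, not the linked example):

```cpp
#include <cuda.h>
#include <cudaEGL.h>
#include <nvbuf_utils.h>
#include <gst/gst.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

// Map an NVMM GstBuffer into a CUDA-accessible pointer (no memcpy).
// egl_display is assumed to have been obtained at startup via
// eglGetDisplay/eglInitialize.
static void process_nvmm_buffer(GstBuffer *buf, EGLDisplay egl_display)
{
    GstMapInfo map;
    gst_buffer_map(buf, &map, GST_MAP_READ);

    // 1. The mapped NVMM data is a hardware-buffer handle; get its dmabuf fd.
    int dmabuf_fd = -1;
    ExtractFdFromNvBuffer(map.data, &dmabuf_fd);

    // 2. Wrap the fd in an EGLImage.
    EGLImageKHR egl_image = NvEGLImageFromFd(egl_display, dmabuf_fd);

    // 3. Register the EGLImage with CUDA and get the mapped frame.
    CUgraphicsResource resource;
    cuGraphicsEGLRegisterImage(&resource, egl_image,
                               CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
    CUeglFrame egl_frame;
    cuGraphicsResourceGetMappedEglFrame(&egl_frame, resource, 0, 0);

    // egl_frame.frame.pPitch[0] is now a device pointer to the pixels;
    // launch CUDA kernels here (or wrap it in a GpuMat).

    // 4. Tear down in reverse order.
    cuGraphicsUnregisterResource(resource);
    NvDestroyEGLImage(egl_display, egl_image);
    gst_buffer_unmap(buf, &map);
}
```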