Process multiple GStreamer buffers directly allocated in CUDA memory

I would like to retrieve a buffer from a GStreamer pipeline and process it with CUDA functions on the GPU. We tried several approaches on our Jetson TX2:

  1. Custom GStreamer plugin: we started from the standard GStreamer plugin template, which processes images in the so-called “chain function”. That function receives the input as a GstBuffer, processes it and pushes the result on the source pad. The main problem is that the buffer's data pointer is allocated in CPU memory, so CUDA cannot access it directly.
    An alternative would be to request NVMM memory with a caps filter followed by the videoconvert plugin. However, CUDA still cannot read the input buffer. As a result, a host-to-device memory copy is needed, even though the TX2 supports unified memory.

  2. NVIVAFILTER plugin: after some research, the only solution I found for working directly on GPU memory from GStreamer is NVIVAFILTER, which calls a .so library:
    https://devtalk.nvidia.com/default/topic/1022543/jetson-tx2/gstreamer-nvmm-lt-gt-opencv-gpumat/post/5208232/#5208232
    Although I managed to run CUDA functions on the input data, NVIVAFILTER accepts only one input buffer at a time. This is a problem for image stitching or stereo acquisition, which both need multiple inputs.

  3. tegra_multimedia_api: another possibility is to use the tegra_multimedia_api functions, as mentioned in this earlier question of ours:
    https://devtalk.nvidia.com/default/topic/1023700/jetson-tx2/retrieve-buffer-from-gstreamer-flow-when-using-video-x-raw-memory-nvmm-/post/5208914/#5208914
    But since it is a separate library, it does not let us take advantage of GStreamer's parallelism.

To sum up, my question is: is there a GStreamer plugin or tool that can process multiple inputs allocated directly in CUDA memory and output multiple buffers as well?

Hi,

NvMM memory can’t be accessed by CUDA directly. We use EGLImage for this purpose: wrap the NvMM memory in an EGLImage and use that with CUDA.
NVIVAFILTER is a GStreamer element for this purpose. It takes an NvMMBuffer and provides an EGLImage for CUDA processing.
However, NVIVAFILTER can only take a single input.

Have you tried the VisionWorks package? VisionWorks is NVIDIA's implementation of OpenVX.
In our stereo matching sample, we read camera frames via GStreamer, process them with CUDA, and render the result.

VisionWorks can be installed via JetPack directly.

Thanks.

Hi, thanks for your help.

I checked the stereo matching example and saw that it takes a single input in which the left and right views are stacked one above the other. The input is then split into two parts. In my case, I would like to get two or more inputs from different sources at the same time. In the OVX library I saw that VisionWorks uses GStreamer for acquisition and EGL to access CUDA memory directly. Is it possible to wrap the NVMM memory in an EGLImage directly inside the chain function of my custom GStreamer plugin and handle multiple inputs (e.g. the right and left images separately)?
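To make the "one input, split in two" layout concrete, here is a minimal sketch in plain C of how a vertically stacked frame can be divided into two views without copying. The `ImageView` struct and `split_top_bottom` function are hypothetical names made up for this illustration; they are not part of VisionWorks, GStreamer or any NVIDIA API, and the sketch assumes the frame is already available as a linear buffer with a known row pitch. Which half holds the left or the right view is also an assumption left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one image plane; not a GStreamer or NVIDIA type. */
typedef struct {
    uint8_t *data;   /* first byte of the first row */
    size_t   width;  /* pixels per row */
    size_t   height; /* number of rows */
    size_t   pitch;  /* bytes between the starts of consecutive rows */
} ImageView;

/* Split a frame that stacks two views vertically into a top and a bottom
 * view. No pixels are copied: both views alias the packed frame's memory.
 * Returns 0 on success, -1 if an argument is NULL or the height is odd. */
int split_top_bottom(const ImageView *packed, ImageView *top, ImageView *bottom)
{
    if (packed == NULL || top == NULL || bottom == NULL ||
        packed->height % 2 != 0)
        return -1;

    size_t half = packed->height / 2;

    *top = *packed;
    top->height = half;

    *bottom = *packed;
    bottom->height = half;
    bottom->data = packed->data + half * packed->pitch; /* skip the top rows */

    return 0;
}
```

With two genuinely separate sources this trick does not apply, because each source delivers its own buffer; that is the case the question above is about.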

Another problem is that when I use video/x-raw(memory:NVMM) and print the size of the input buffer, I get 808 bytes. This looks wrong, since I expected a much bigger buffer. The pipeline I am trying to use is:

gst-launch-1.0 rtspsrc location=rtsp://ip:port/file.mp4 ! rtph264depay ! decodebin ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)RGBA' ! MYPLUGIN ! fakesink

Is there something wrong with the pipeline? Inside the plugin I am just reading the buffer size with gst_buffer_map(buffer, &map, GST_MAP_READ) and printf("gst size = %" G_GSIZE_FORMAT "\n", map.size).
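As a rough sanity check on the 808-byte figure, the sketch below computes how many bytes a tightly packed RGBA frame would need and compares that with a mapped size. The function names are hypothetical, and the 1920x1080 resolution used in the note after the code is an assumption, since the stream resolution is not given in this thread.

```c
#include <stddef.h>

/* Bytes needed for a tightly packed RGBA frame
 * (4 bytes per pixel, no row padding). */
size_t packed_rgba_size(size_t width, size_t height)
{
    return width * height * 4;
}

/* Returns 1 if a mapped size is large enough to hold the raw pixels of a
 * width x height RGBA frame, 0 otherwise. */
int could_hold_rgba_frame(size_t mapped_size, size_t width, size_t height)
{
    return mapped_size >= packed_rgba_size(width, height);
}
```

For an assumed 1920x1080 stream, `packed_rgba_size(1920, 1080)` is 8294400 bytes, so `could_hold_rgba_frame(808, 1920, 1080)` returns 0. A mapped size that small suggests the mapping exposes something other than raw pixels, which would be consistent with the earlier reply that NvMM memory cannot be accessed directly.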

Hi,

  1. You can modify the VisionWorks sample to take dual inputs.

  2. As mentioned in comment #2, we use EGLImage to wrap the NvMM buffer for CUDA access. NVIVAFILTER is a GStreamer element for this purpose.

  3. For the pipeline issue, please check the user guide first:
    https://developer.nvidia.com/embedded/dlc/l4t-accelerated-gstreamer-guide-28-1

Thanks.

Hi andreaaa,

You may find the following information about the GstCUDA framework interesting; I think it is exactly what you are looking for. A more detailed description follows, but in summary, it is a framework for interfacing GStreamer with CUDA easily and efficiently, guaranteeing zero memory copies. It also supports several inputs.

GstCUDA is a GStreamer plug-in developed by RidgeRun that enables easy integration of CUDA algorithms into GStreamer pipelines. It offers a framework that allows users to develop custom GStreamer elements that execute any CUDA algorithm. The framework is a series of base classes that abstract the complexity of both CUDA and GStreamer. With GstCUDA, developers avoid writing elements from scratch and can focus on the algorithm logic, thus accelerating time to market.

GstCUDA offers a GStreamer plugin containing a set of elements that are ideal for quick GStreamer/CUDA prototyping. These elements are filters with different combinations of input and output pads. At run time, they load an external custom CUDA library containing the algorithm to be executed on the GPU for each video frame that passes through the pipeline. Users develop their own CUDA processing library and pass it to the GstCUDA filter element that best fits the algorithm's requirements. The element executes the library on the GPU, passing upstream frames from the GStreamer pipeline to the GPU and pushing the modified frames downstream to the next element in the pipeline. These elements were created with the CUDA algorithm developer in mind: they support quick prototyping and abstract away all GStreamer concepts. The elements are fully adaptable to different project needs, making GstCUDA a powerful tool for CUDA/GStreamer project development.

One remarkable feature of GstCUDA is that it provides a zero-memory-copy interface between CUDA and GStreamer on Jetson TX1/TX2 platforms. This enables heavy algorithms and large amounts of data (up to 2x 4K 60fps streams) to be processed with CUDA without the performance penalty caused by copies or memory conversions. GstCUDA provides the necessary APIs to handle NVMM buffers directly and achieve the best possible performance on Jetson TX1/TX2 platforms. It provides a series of base classes and utilities that abstract the complexity of handling the memory interface between GStreamer and CUDA, so the developer can focus on what actually gives value to the end product. GstCUDA ensures optimal performance for GStreamer/CUDA applications on Jetson platforms.
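To give a sense of why avoiding copies matters at that scale, the following back-of-the-envelope sketch computes the raw pixel data rate of a set of video streams. It assumes that "4K" means 3840x2160 and that frames are RGBA at 4 bytes per pixel (the format used in the pipeline earlier in this thread); both are assumptions for illustration, not GstCUDA specifications, and the function name is hypothetical.

```c
#include <stdint.h>

/* Raw pixel data rate, in bytes per second, for `streams` identical
 * uncompressed video streams. */
uint64_t video_bytes_per_second(uint64_t width, uint64_t height,
                                uint64_t bytes_per_pixel, uint64_t fps,
                                uint64_t streams)
{
    return width * height * bytes_per_pixel * fps * streams;
}
```

Under those assumptions, `video_bytes_per_second(3840, 2160, 4, 60, 2)` gives 3981312000 bytes per second, roughly 4 GB/s. Every additional copy has to read and write that volume again, which is the cost a zero-copy path avoids.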

You can find detailed information about GstCUDA at the following link:
http://developer.ridgerun.com/wiki/index.php?title=GstCUDA

I hope this information can be useful to you.

Best regards,
-Daniel