How to pass data between GStreamer and CUDA without memory copying?

Hi folks.

I’m trying to stitch multiple IP cameras’ video streams into a single video using GStreamer and CUDA.

There are 3 blocks. The first block receives and decodes the video streams with GStreamer. The second block stitches the decoded video frames with CUDA. And the last block displays the stitched video frame with GStreamer.

My pipeline is as follows.

( rtspsrc ! decodebin ! nvvidconv ! appsink ) --+--> ( CUDA process ) ---> ( appsrc ! nvvidconv ! nvoverlaysink )
                     :                          |
  (same number of pipelines as cameras)         |
                     :                          |
( rtspsrc ! decodebin ! nvvidconv ! appsink ) --+
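For reference, one receive branch and the display branch can be sketched as command lines. The RTSP URL and caps below are placeholders for my setup, and note that in a real application the pipelines are built with gst_parse_launch() so the application can attach to the appsink/appsrc elements (gst-launch alone cannot pull or push samples):

```shell
# One receive/decode branch (hypothetical RTSP URL; adjust caps to your cameras).
# appsink hands each decoded frame to the application for the CUDA stitching step.
gst-launch-1.0 rtspsrc location=rtsp://192.168.0.10/stream ! \
    decodebin ! nvvidconv ! \
    'video/x-raw, format=RGBA' ! \
    appsink name=stitch_in

# Display branch: the application pushes stitched frames in through appsrc.
gst-launch-1.0 appsrc name=stitch_out ! \
    'video/x-raw, format=RGBA' ! \
    nvvidconv ! nvoverlaysink
```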

This pipeline works. However, it has a performance problem with the data passing between GStreamer and CUDA, because I copy the data into a memory space that each side can handle.

I’d like to avoid the copying, and I guess NVMM makes it unnecessary. But I don’t know how to do it.
Do you have any ideas?

You may have a look at the gstreamer nvivafilter plugin.

Thank you for the response.

However, nvivafilter can process only one input stream. I have to process multiple streams to stitch them.
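For context, the single-stream nvivafilter pattern keeps buffers in NVMM end to end. The property names below follow the JetPack nvsample_cudaprocess sample, so treat them as assumptions for a given L4T version:

```shell
# Single-stream sketch: frames stay in NVMM memory the whole way, and
# nvivafilter runs the CUDA code from the given shared library on each frame.
gst-launch-1.0 rtspsrc location=rtsp://192.168.0.10/stream ! \
    decodebin ! nvvidconv ! \
    'video/x-raw(memory:NVMM), format=RGBA' ! \
    nvivafilter cuda-process=true customer-lib-name=libnvsample_cudaprocess.so ! \
    'video/x-raw(memory:NVMM), format=RGBA' ! \
    nvoverlaysink
```

This avoids copies, but the filter sits inline in one pipeline, so it only ever sees that one stream.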

Or is there a good way with nvivafilter to do it?

Hi KentaroH,
gstreamer cannot run your case without memcpy(). We suggest trying tegra_multimedia_api.

Hi DaneLLL,

Actually, I was planning to use tegra_multimedia_api as an alternative to GStreamer.
It looks like it’s about time to do that.

Thank you.

Hi KentaroH,

You may find the following information about the GstCUDA framework interesting; I think it is exactly what you are looking for.

GstCUDA is a RidgeRun-developed GStreamer plug-in enabling easy CUDA algorithm integration into GStreamer pipelines. GstCUDA offers a framework that allows users to develop custom GStreamer elements that execute any CUDA algorithm. The GstCUDA framework is a series of base classes abstracting the complexity of both CUDA and GStreamer. With GstCUDA, developers avoid writing elements from scratch and can focus on the algorithm logic, thus accelerating time to market.

GstCUDA offers a GStreamer plugin that contains a set of elements that are ideal for GStreamer/CUDA quick prototyping. Those elements consist of a set of filters with different input/output pad combinations that are run-time loadable with an external custom CUDA library containing the algorithm to be executed on the GPU for each video frame that passes through the pipeline. The GstCUDA plugin allows users to develop their own CUDA processing library, pass it into the GstCUDA filter element that best fits the algorithm’s requirements, and execute it on the GPU, passing upstream frames from the GStreamer pipeline to the GPU and passing the modified frames downstream to the next element in the pipeline. Those elements were created with the CUDA algorithm developer in mind, supporting quick prototyping and abstracting all GStreamer concepts. The elements are fully adaptable to different project needs, making GstCUDA a powerful tool that is essential for CUDA/GStreamer project development.

One remarkable feature of GstCUDA is that it provides a zero memory copy interface between CUDA and GStreamer on Jetson TX1/TX2 platforms. This enables heavy algorithms and large amounts of data (up to 2x 4K 60fps streams) to be processed on CUDA without the performance cost caused by copies or memory conversions. GstCUDA provides the necessary APIs to directly handle NVMM buffers to achieve the best possible performance on Jetson TX1/TX2 platforms. It provides a series of base classes and utilities that abstract the complexity of handling the memory interface between GStreamer and CUDA, so the developer can focus on what actually gives value to the end product. GstCUDA ensures optimal performance for GStreamer/CUDA applications on Jetson platforms.
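As a quick illustration, the prototyping elements can be used directly from the command line. The element and property names below follow the GstCUDA examples but may differ between versions, so treat them as assumptions:

```shell
# Hypothetical sketch: run a custom CUDA library (libmy_algorithm.so is a
# placeholder for your own build) on each frame with a GstCUDA prototyping
# filter, keeping buffers in NVMM on Jetson.
gst-launch-1.0 nvcamerasrc ! \
    'video/x-raw(memory:NVMM), format=I420' ! \
    cudafilter location=./libmy_algorithm.so ! \
    'video/x-raw(memory:NVMM), format=I420' ! \
    nvoverlaysink
```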

You can find detailed information about GstCUDA at the following link:

I hope this information can be useful to you.

Best regards,