What I am trying to do is decode an H.264 video and process the decoded frames with CUDA. I do not even need to display the decoded frames.
I can do this with the typical appsink pipeline, but the transfers from omxh264dec memory to main memory are very expensive. I want to avoid this; ideally I would like to get each frame as an EGLImage and use the OpenGL<->CUDA interoperability mechanisms. I have had no luck with this so far.
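For reference, the appsink path described above might look roughly like the sketch below. It only builds the launch description as a string and does not require GStreamer to run. The file path, the `qtdemux`/`h264parse` front end, the `nvvidconv` step, and the BGRx output format are assumptions for illustration. The `nvvidconv ! video/x-raw` hop is where frames leave the decoder's memory and get copied into main memory.

```python
def build_appsink_pipeline(location: str) -> str:
    """Return a launch description for the decode-to-appsink path.

    Everything here is illustrative: element choice and caps are assumptions.
    """
    elements = [
        f'filesrc location="{location}"',  # hypothetical input file
        "qtdemux",                         # assumes an MP4 container
        "h264parse",
        "omxh264dec",                      # hardware decoder
        "nvvidconv",                       # converts out of decoder memory (the costly copy)
        "video/x-raw,format=BGRx",         # plain system-memory caps
        "appsink name=sink",
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    print(build_appsink_pipeline("/tmp/sample.mp4"))
```

The resulting string can be handed to `gst_parse_launch()` or pasted after `gst-launch-1.0` for a quick check.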
I see there is a plugin called nveglglessink which does a very good job: I am able to display 6 Full HD videos without much CPU load, because the plugin does not incur the memory transfer overhead.
I would like something similar to nveglglessink, but only for post-processing the frames. I will try to use the “last-frame” element from this plugin to do my CUDA processing, but that would be odd, even if it works.
I have also seen there is a gst-cuda plugin, but I think it is only for the TX1.
Any advice would help me a lot; I have googled extensively and cannot find anything useful for this scenario.
The GstCUDA framework is just what you are looking for. GstCUDA is a GStreamer plug-in and framework developed by RidgeRun that enables easy integration of CUDA algorithms into GStreamer pipelines. It lets users develop custom GStreamer elements that execute any CUDA algorithm. The framework is a series of base classes that abstract the complexity of both CUDA and GStreamer. With GstCUDA, developers avoid writing elements from scratch and can focus on the algorithm logic, which shortens time to market.
One remarkable feature of GstCUDA is that it provides a zero-memory-copy interface between CUDA and GStreamer on Jetson TX1/TX2 platforms. This allows heavy algorithms and large amounts of data (up to 2x 4K 60fps streams) to be processed on CUDA without performance penalties from copies or memory conversions. GstCUDA provides the APIs needed to handle NVMM buffers directly, to achieve the best possible performance on Jetson TX1/TX2 platforms.
If you are using the hardware-accelerated decoders of the Tegra platform, you can pass their output directly to GstCUDA. Because GstCUDA can handle NVMM buffers directly without memory conversions, you can keep the decoder output in the NVMM memory type to get the best possible performance on Tegra platforms.
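As a rough illustration of passing the decoder output straight through in NVMM memory, the sketch below builds a launch description in which no conversion element sits between `omxh264dec` and the CUDA element. The element name `cudafilter`, its `in-place` and `location` properties, the input path, and the algorithm library path are assumptions here; confirm them with `gst-inspect-1.0` on your GstCUDA installation before relying on them.

```python
def build_gstcuda_pipeline(location: str, algorithm_lib: str) -> str:
    """Return a launch description that keeps frames in NVMM memory.

    Assumed, not verified: the `cudafilter` element and its
    `in-place` / `location` properties. All paths are hypothetical.
    """
    elements = [
        f'filesrc location="{location}"',  # hypothetical input file
        "qtdemux",                         # assumes an MP4 container
        "h264parse",
        "omxh264dec",                      # hardware decoder
        "video/x-raw(memory:NVMM)",        # keep buffers in NVMM, no copy to main memory
        f'cudafilter in-place=true location="{algorithm_lib}"',
        "fakesink",                        # no display needed
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    print(build_gstcuda_pipeline("/tmp/sample.mp4", "/tmp/my-algorithm.so"))
```

When pasting the result into a shell for `gst-launch-1.0`, the `video/x-raw(memory:NVMM)` caps need single quotes so the shell does not interpret the parentheses.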