GPU-accelerated GStreamer filters?

ShervinE1 · May 25, 2016, 2:31pm

Can we write GStreamer filter plugins that process video frames on the GPU (e.g. using OpenGL ES or CUDA) and generate output that can be used directly by the hardware video encoder (i.e. without being transferred through main memory)? Do we just declare the output image as “video/x-raw(memory:NVMM),format=(string)I420” format, or are there other steps and complications?

For example, this command does video decoding + encoding purely in hardware (~60 FPS for 1080p with just 5% CPU usage):

gst-launch-1.0 -e filesrc location=in_1080p25.h264 ! h264parse ! omxh264dec ! queue ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)I420' ! omxh264enc bitrate=45000000 insert-sps-pps=true ! 'video/x-h264, stream-format=(string)byte-stream, profile=high' ! h264parse ! filesink location=out.h264

But I want to modify the image on the GPU (e.g. using OpenGL ES) before it gets encoded, so imagine I want to create the element “gpu_fisheye” that transforms the video file:

gst-launch-1.0 -e filesrc location=in_1080p25.h264 ! h264parse ! omxh264dec ! queue ! gpu_fisheye ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)I420' ! omxh264enc bitrate=45000000 insert-sps-pps=true ! 'video/x-h264, stream-format=(string)byte-stream, profile=high' ! h264parse ! filesink location=out.h264

What is required to create a GPU-accelerated filter like this that runs at 30 FPS or more by being directly compatible with nvvidconv or omxh264enc?

Cheers,
Shervin Emami.

kayccc · June 3, 2016, 6:07am

Hi ShervinE,

Sorry for the late reply.

We have gst-nvivafilter, which provides a mechanism for users to process the data on the GPU by setting the ‘cuda-process’ property, but it is not available on TK1; it is currently supported only on TX1.

Thanks

apandya · September 30, 2016, 4:20am

Hi ShervinE,

As Kay already mentioned above, nvivafilter is currently supported only on TX1, not on TK1.

Still, the following information might help with your query about using nvivafilter as a “GPU-accelerated filter” gst-plugin.

The latest L4T R24.2 public release at https://developer.nvidia.com/embedded/linux-tegra provides the sources of the libsample_process.so library.

You need to download nvsample_cudaprocess_src.tbz2 from the source package link on the R24.2 release page.

Please refer to nvsample_cudaprocess_README.txt for details of the interface APIs.
The source package also provides a Makefile and instructions for on-target compilation.

The nvivafilter “cuda-process” property provides access to the decoded video frame on the GPU for post-processing.
The current reference CUDA sample implementation can be replaced with any custom CUDA operation.

The following is an example gst-launch pipeline for reference:

gst-launch-1.0 filesrc location=<input_file.mp4> ! qtdemux ! h264parse ! omxh264dec ! nvivafilter customer-lib-name=libnvsample_cudaprocess.so cuda-process=true ! omxh264enc ! h264parse ! qtmux ! filesink location=<output_file.mp4> -e
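
As a rough sketch (not part of the L4T release), the pipeline above could be wrapped in a small shell helper so that the input file, output file and processing library can be swapped easily. The function names nvivafilter_pipeline and have_element and the file names in.mp4 and out.mp4 are hypothetical and only for illustration. The element names and properties are copied from the pipeline above, and gst-inspect-1.0 --exists is the stock GStreamer way to check that an element is installed.

```shell
#!/bin/sh
# Sketch only: helper names and file names are placeholders, not part of L4T.

# Print the transcoding pipeline for the given files.
# $1 = input MP4, $2 = output MP4, $3 = processing library (optional).
nvivafilter_pipeline() {
    lib=${3:-libnvsample_cudaprocess.so}
    printf '%s\n' "filesrc location=$1 ! qtdemux ! h264parse ! omxh264dec ! nvivafilter customer-lib-name=$lib cuda-process=true ! omxh264enc ! h264parse ! qtmux ! filesink location=$2"
}

# Return 0 if the named GStreamer element is installed on this board.
have_element() {
    command -v gst-inspect-1.0 >/dev/null 2>&1 &&
        gst-inspect-1.0 --exists "$1"
}

# Usage on the target (file names must not contain spaces, because the
# pipeline string is split into words by the shell):
#   have_element nvivafilter || { echo "nvivafilter not available" >&2; exit 1; }
#   gst-launch-1.0 -e $(nvivafilter_pipeline in.mp4 out.mp4)
```

Checking for the element first gives a clear error on boards such as TK1 where nvivafilter is not provided, instead of a pipeline construction failure.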

I hope this will help with your target use-cases.

-Regards,
Amit Pandya

ShervinE1 · September 30, 2016, 4:42am

Thanks Amit & Kay,

Yes, nvivafilter does sound like what we want to use, thanks for providing it! But we want it for TK1, since TX1 is out of our price range, so I guess we have to stick with CPU processing for now :-(

iamsyt · November 21, 2016, 2:01pm

Hi ShervinE and guys,

I ran a test on TK1 with the exact same command you mentioned, and noticed that the CPU usage is about 70% instead of 5%. Is there anything I missed on TK1 to use omxh264enc? This is a big issue for us because we are trying to use omxh264enc in our application (with appsrc and appsink), where CPU usage reaches 150%, which is not acceptable!

Thanks for any suggestions!

“For example, this command does video decoding + encoding purely in hardware (~60 FPS for 1080p with just 5% CPU usage):”

gst-launch-1.0 -e filesrc location=in_1080p25.h264 ! h264parse ! omxh264dec ! queue ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)I420' ! omxh264enc bitrate=45000000 insert-sps-pps=true ! 'video/x-h264, stream-format=(string)byte-stream, profile=high' ! h264parse ! filesink location=out.h264

Best Regards.
-zhi

kayccc · November 24, 2016, 5:01am

Hi iamsyt,

It seems you already filed the same issue in a new topic, and we have just replied there; see the update below:

https://devtalk.nvidia.com/default/topic/978001/jetson-tk1/high-cpu-usage-using-omxh264enc-on-tk1/post/5024882/#5024882

Thanks

iamsyt · November 25, 2016, 1:59am

Thank you, kayccc.
