GPU-accelerated GStreamer filters?

Can we write GStreamer filter plugins that process video frames on the GPU (e.g. using OpenGL ES or CUDA) and generate output that the hardware video encoder can use directly (i.e. without being transferred through main memory)? Do we just declare the output image format as “video/x-raw(memory:NVMM),format=(string)I420”, or are there other steps and complications?

For example, this command does video decoding + encoding purely in hardware (~60 FPS for 1080p with just 5% CPU usage):

gst-launch-1.0 -e filesrc location=in_1080p25.h264 ! h264parse ! omxh264dec ! queue ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)I420' ! omxh264enc bitrate=45000000 insert-sps-pps=true ! 'video/x-h264, stream-format=(string)byte-stream, profile=high' ! h264parse ! filesink location=out.h264

But I want to modify the image on the GPU (e.g. using OpenGL ES) before it gets encoded. Imagine I want to create an element “gpu_fisheye” that transforms the video:

gst-launch-1.0 -e filesrc location=in_1080p25.h264 ! h264parse ! omxh264dec ! queue ! gpu_fisheye ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)I420' ! omxh264enc bitrate=45000000 insert-sps-pps=true ! 'video/x-h264, stream-format=(string)byte-stream, profile=high' ! h264parse ! filesink location=out.h264

What is required to create a GPU-accelerated filter like this that runs at 30 FPS or more and is directly compatible with nvvidconv or omxh264enc?

Cheers,
Shervin Emami.

Hi ShervinE,

Sorry for the late reply.

We have gst-nvivafilter, which provides a mechanism for users to process the data on the GPU by setting the ‘cuda-process’ property, but it is not available on TK1; it is currently supported only on TX1.

Thanks

Hi ShervinE,

As Kay already mentioned above, nvivafilter is currently supported only on TX1, not on TK1.

However, the following info might help with your query about using nvivafilter as a “GPU-accelerated filter” gst-plugin.

The latest L4T R24.2 public release at https://developer.nvidia.com/embedded/linux-tegra provides the sources of the libnvsample_cudaprocess.so library.

You need to download nvsample_cudaprocess_src.tbz2 from the source package link on the R24.2 release page.

Please refer to nvsample_cudaprocess_README.txt for details of the interface APIs.
The source package also provides a Makefile and instructions for on-target compilation.

The nvivafilter “cuda-process” property provides access to decoded video frames on the GPU for post-processing.
The current reference CUDA sample implementation can be replaced with any custom CUDA operation.

The following is an example gst-launch pipeline for reference:

gst-launch-1.0 filesrc location=<input_file.mp4> ! qtdemux ! h264parse ! omxh264dec ! nvivafilter customer-lib-name=libnvsample_cudaprocess.so cuda-process=true ! omxh264enc ! h264parse ! qtmux ! filesink location=<output_file.mp4> -e
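
Before trying a pipeline like the one above, it may help to confirm which of its elements the installed L4T release actually provides, since nvivafilter is reported as unavailable on TK1. A minimal sketch, assuming gst-inspect-1.0 is on the PATH; the helper name has_gst_element is made up for illustration and is not part of any NVIDIA package:

```shell
#!/usr/bin/env bash
# has_gst_element NAME (hypothetical helper): succeed only if GStreamer
# reports that an element or plugin with this name is installed.
has_gst_element() {
    gst-inspect-1.0 --exists "$1" >/dev/null 2>&1
}

# Check the elements used in the example pipeline above.
for element in omxh264dec nvivafilter omxh264enc; do
    if has_gst_element "$element"; then
        echo "$element: available"
    else
        echo "$element: missing"
    fi
done
```

If nvivafilter shows up as missing, the pipeline will fail at parse time regardless of which customer library is passed to it.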

I hope this helps with your target use cases.

-Regards,
Amit Pandya

Thanks Amit & Kay,

Yes, nvivafilter does sound like what we want to use, thanks for providing it! But we want it for TK1, since TX1 is out of our price range, so I guess we have to stick with CPU processing for now :-(

Hi ShervinE and guys,

I ran a test on TK1 with the exact same command you mentioned, and noticed that the CPU usage is about 70% instead of 5%. Is there anything I missed on TK1 when using omxh264enc? This is a big issue for us: we are trying to use omxh264enc in our application (with appsrc and appsink), and it shows 150% CPU usage, which is not acceptable!

Thanks for any suggestions!

“For example, this command does video decoding + encoding purely in hardware (~60 FPS for 1080p with just 5% CPU usage):”

gst-launch-1.0 -e filesrc location=in_1080p25.h264 ! h264parse ! omxh264dec ! queue ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)I420' ! omxh264enc bitrate=45000000 insert-sps-pps=true ! 'video/x-h264, stream-format=(string)byte-stream, profile=high' ! h264parse ! filesink location=out.h264
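
To compare the 5% and 70% figures on equal terms, one option is to time a complete run of the command quoted above and divide the CPU time by the wall-clock time. A rough sketch, assuming bash; cpu_percent is a made-up helper name, and the result is an average over the whole run (including pipeline startup), so it will not match an instantaneous reading from top exactly:

```shell
#!/usr/bin/env bash
# cpu_percent CMD [ARGS...] (hypothetical helper): run CMD to completion and
# print its average CPU usage as (user + sys) / wall-clock * 100.
# Values above 100 mean more than one core was busy on average.
cpu_percent() {
    local timing
    # The bash "time" keyword reports on the shell's stderr, so the group's
    # stderr is captured while the command's own output is discarded.
    # LC_ALL=C keeps the decimal separator a "." so awk can parse it.
    timing=$( LC_ALL=C; TIMEFORMAT='%R %U %S'; { time "$@" >/dev/null 2>&1; } 2>&1 )
    awk -v t="$timing" 'BEGIN {
        split(t, f, " ")
        if (f[1] > 0) printf "%.0f\n", (f[2] + f[3]) / f[1] * 100
        else print 0
    }'
}

# Usage (the pipeline is the one quoted above, shortened here):
#   cpu_percent gst-launch-1.0 -e filesrc location=in_1080p25.h264 ! h264parse ! ...
```

Because this counts CPU time across all threads of the process, it gives a single number that can be compared between boards and L4T releases.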

Best Regards.
-zhi

Hi iamsyt,

It seems you already filed the same issue in a new topic, and we have just posted a reply there; see the update below:

https://devtalk.nvidia.com/default/topic/978001/jetson-tk1/high-cpu-usage-using-omxh264enc-on-tk1/post/5024882/#5024882

Thanks

thank you, kayccc