HW accelerated JPEG encoding?

It looks like the following gstreamer jpeg encoding plugins are available: jpegenc, nvjpegenc, nv_omx_jpegenc

The multimedia user guide does not specifically describe what the differences are. Can anyone shed light on the differences between them?

In test code I wrote, the time to encode a 640x480 grayscale image is ~2 ms on the Tegra X1 regardless of whether jpegenc or nvjpegenc is used, and both max out the CPU while encoding, which makes me think that the encoding is NOT being hardware accelerated. I have not been able to get the nv_omx_jpegenc plugin to work.

jpegenc and nvjpegenc work with the following simple pipeline:

$ gst-launch-0.10 filesrc location=test_in.jpg ! jpegdec ! jpegenc ! filesink location=test_out.jpg -e

but nv_omx_jpegenc does not:

$ gst-launch-0.10 filesrc location=test_in.jpg ! jpegdec ! nv_omx_jpegenc ! filesink location=test_out.jpg -e
Inside NvxLiteH264DecoderLowLatencyInit NvxLiteH264DecoderLowLatencyInit set DPB and Mjstreaming
Inside NvxLiteH265DecoderLowLatencyInit NvxLiteH265DecoderLowLatencyInit set DPB and Mjstreaming
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
ERROR: from element /GstPipeline:pipeline0/GstFileSrc:filesrc0: Internal data flow error.
Additional debug info:
gstbasesrc.c(2625): gst_base_src_loop (): /GstPipeline:pipeline0/GstFileSrc:filesrc0:
streaming task paused, reason not-negotiated (-4)
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
Freeing pipeline ...

Thanks for your post, the encoder team has been investigating your report and will respond soon.

Best regards,
Dusty

Hi,

Please find details of the available JPEG encode plugins:

  1. jpegenc : OSS jpeg encode plugin (sw encode)
  2. nvjpegenc : nvidia accelerated jpeg encode (hw encode)
  3. nv_omx_jpegenc : OSS gst-openmax jpeg encode plugin (for gst-0.10, not recommended)

Use the following pipeline for gst-0.10:

gst-launch-0.10 filesrc location=test_in.jpg ! nvjpegdec ! nvjpegenc ! filesink location=test_out.jpg -e

Also, we recommend using gstreamer-1.0 instead, with the following pipeline:

gst-launch-1.0 filesrc location=test_in.jpg ! nvjpegdec ! nvjpegenc ! filesink location=test_out.jpg -e

Thank you for explaining the differences. It appears my original understanding was correct.

Unfortunately, I am still seeing the strange performance difference between jpegenc and nvjpegenc… nvjpegenc takes twice as much CPU as jpegenc.

I’ve tested with both my own code, and using stock gstreamer pipelines.
In both tests I used tegrastats to monitor, and had run the max_perf script posted here:
https://devtalk.nvidia.com/default/topic/901337/post/4747186/#4747186

Using jpegenc

gst-launch-0.10 videotestsrc is-live=true ! video/x-raw-rgb, framerate=30/1, width=640, height=480 ! jpegenc quality=90 ! fakesink
RAM 854/3854MB (lfb 2x4MB) SWAP 0/0MB (cached 0MB) cpu [5%,5%,3%,27%]@1912 EMC 3%@1600 AVP 53%@12 VDE 0 GR3D 0%@998 EDP limit 1912

Using nvjpegenc

gst-launch-0.10 videotestsrc is-live=true ! video/x-raw-rgb, framerate=30/1, width=640, height=480 ! nvjpegenc quality=90 ! fakesink
RAM 854/3854MB (lfb 2x4MB) SWAP 0/0MB (cached 0MB) cpu [7%,4%,54%,1%]@1912 EMC 2%@1600 AVP 32%@12 VDE 0 GR3D 0%@998 EDP limit 1912

If possible, we recommend using gstreamer-1.0. The higher CPU usage mentioned above is caused by the raw->NV format conversion.
If we pass NVMM buffers (NV format) to the encoder, we will not see the higher CPU load. The following experiment helps illustrate this.

I have generated an MJPEG file with 3000 buffers. This file is decoded using the OSS jpegdec, and the output of jpegdec is fed to jpegenc.

gst-launch-1.0 filesrc location=enc640x480_mjpeg_3000.mp4 ! qtdemux ! jpegdec ! jpegenc ! filesink location=test_out.jpg -e

Tegrastats results below:

RAM 448/3854MB (lfb 733x4MB) SWAP 0/0MB (cached 0MB) cpu [0%,100%,0%,0%]@1912 EMC 4%@1600 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 448/3854MB (lfb 733x4MB) SWAP 0/0MB (cached 0MB) cpu [0%,100%,0%,0%]@1912 EMC 4%@1600 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 448/3854MB (lfb 733x4MB) SWAP 0/0MB (cached 0MB) cpu [0%,100%,0%,0%]@1912 EMC 4%@1600 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 448/3854MB (lfb 733x4MB) SWAP 0/0MB (cached 0MB) cpu [3%,100%,0%,0%]@1912 EMC 4%@1600 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 448/3854MB (lfb 733x4MB) SWAP 0/0MB (cached 0MB) cpu [1%,100%,0%,0%]@1912 EMC 4%@1600 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 448/3854MB (lfb 733x4MB) SWAP 0/0MB (cached 0MB) cpu [0%,100%,0%,0%]@1912 EMC 4%@1600 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 448/3854MB (lfb 733x4MB) SWAP 0/0MB (cached 0MB) cpu [1%,100%,0%,0%]@1912 EMC 4%@1600 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 449/3854MB (lfb 733x4MB) SWAP 0/0MB (cached 0MB) cpu [0%,100%,0%,1%]@1912 EMC 4%@1600 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 449/3854MB (lfb 733x4MB) SWAP 0/0MB (cached 0MB) cpu [0%,100%,0%,0%]@1912 EMC 4%@1600 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912

Now the same file is decoded using nvjpegdec. Its output is an NVMM buffer, which is fed to nvjpegenc:
gst-launch-1.0 filesrc location=enc640x480_mjpeg_3000.mp4 ! qtdemux ! nvjpegdec ! nvjpegenc ! filesink location=test_out.jpg -v -e

Tegrastats results below:

RAM 447/3854MB (lfb 732x4MB) SWAP 0/0MB (cached 0MB) cpu [3%,54%,0%,0%]@825 EMC 14%@665 AVP 1%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 446/3854MB (lfb 732x4MB) SWAP 0/0MB (cached 0MB) cpu [7%,3%,54%,0%]@921 EMC 14%@665 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 447/3854MB (lfb 732x4MB) SWAP 0/0MB (cached 0MB) cpu [4%,54%,0%,0%]@825 EMC 14%@665 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 448/3854MB (lfb 732x4MB) SWAP 0/0MB (cached 0MB) cpu [2%,56%,0%,0%]@825 EMC 14%@665 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 448/3854MB (lfb 732x4MB) SWAP 0/0MB (cached 0MB) cpu [4%,32%,23%,1%]@825 EMC 14%@665 AVP 0%@80 VDE 0 GR3D 0%@76 EDP limit 1912
RAM 446/3854MB (lfb 731x4MB) SWAP 0/0MB (cached 0MB) cpu [14%,22%,0%,23%]@204 EMC 14%@665 AVP 0%@115 VDE 0 GR3D 0%@76 EDP limit 1912

As can be seen, if NVMM buffers are provided to nvjpegenc, CPU utilization does not increase. In summary, the higher CPU load is caused by the raw->NV conversion. This is the expected result.
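For a rough sense of why the conversion shows up as a loaded core, here is some back-of-the-envelope arithmetic (my own numbers, not measurements from this thread); the frame size and formats match the 640x480 pipelines above:

```python
# Rough arithmetic sketch: how much raw pixel data the CPU must touch per
# second when converting frames before nvjpegenc can consume them.
def raw_rate_mb_per_s(width, height, fps, bytes_per_pixel):
    """Raw video data rate in megabytes per second."""
    return width * height * bytes_per_pixel * fps / 1e6

# 640x480 RGB (3 bytes/pixel, as in the x-raw-rgb pipelines above) at 30 fps:
rgb_rate = raw_rate_mb_per_s(640, 480, 30, 3)    # ~27.6 MB/s read and converted
# An I420/NV-style 4:2:0 target uses 1.5 bytes/pixel:
nv_rate = raw_rate_mb_per_s(640, 480, 30, 1.5)   # ~13.8 MB/s written back
```

That per-frame read/convert/write loop running every 33 ms is consistent with one CPU core sitting at 50%+ in the tegrastats output above.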

Is there somewhere that I can find documentation on the NVMM buffers/formats?

The goal I’m working towards is to leverage the HW compression from a custom application.

appsrc ! nvjpegenc ! appsink

The gst-launch pipelines were just for simple benchmarking… and the performance results matched my custom code.
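For what it’s worth, here is a sketch of how I’d assemble the pipeline description for that appsrc case in gstreamer-1.0. This is a guess based on this thread, not a confirmed recipe; the I420 format and the helper name are my own assumptions. The point is the "(memory:NVMM)" caps feature on the appsrc caps, so nvjpegenc would receive NVMM buffers instead of system memory:

```python
# Hypothetical sketch, not a confirmed NVIDIA recipe: build a gst-launch-1.0
# style description for an appsrc -> nvjpegenc -> appsink pipeline. The
# "(memory:NVMM)" caps feature marks the buffers as NVMM; without it the
# encoder appears to fall back to the slower S/W path discussed above.
def appsrc_pipeline(width=640, height=480, fps=30, quality=90, nvmm=True):
    feature = "(memory:NVMM)" if nvmm else ""
    caps = (f"video/x-raw{feature}, format=I420, "
            f"width={width}, height={height}, framerate={fps}/1")
    return f'appsrc caps="{caps}" ! nvjpegenc quality={quality} ! appsink'
```

Whether appsrc can actually hand over NVMM-backed buffers this way is exactly what I’m trying to find out; the string above just makes the caps requirement explicit.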

If you download the gstomx-1.0 source from here: http://developer.download.nvidia.com/embedded/L4T/r23_Release_v1.0/source/gstomx1_src.tbz2
Search it for terms like ‘nvbuffer’, ‘nv_buffer’, ‘nvmm’, and ‘CUDA’. I will see if I can find a doc.

Dusty, any luck finding documentation on the NVMM buffers/formats?

Not yet, although we do have a request in with engineering for more info or docs about using NVMM; we will follow up this week.

I am also very interested in this documentation, especially in the difference between video/x-raw and video/x-raw(memory:NVMM).

Dusty, any luck finding documentation on the NVMM buffers/formats?

I saw that the nvidia developer site has sources for gstjpeg


http://developer.download.nvidia.com/embedded/L4T/r23_Release_v1.0/source/gstjpeg_src.tbz2

Is this the source for the nvjpeg plugin? It looks like it is.

Browsing through the source, it appears that if the input data is not in an NVMM buffer then it does a normal S/W encode rather than converting to NVMM, which might explain the behavior I’ve been seeing. Is this correct?

Hi,

I am not sure about NVMM, but HW encoders/decoders normally require that the memory be aligned to some boundary and physically contiguous, and for that reason you cannot get it from an ordinary kernel allocator (unless it comes from the framebuffer). I need to read the source code of the plugin, but my guess is that it is allocating the NVMM buffers from a dedicated heap or using its own memory allocator. dusty_nv, is this the case?

When a buffer that is not NVMM is received by the element, the system likely needs to do a memory copy into an NVMM buffer, causing overhead; it is even worse if you test with videotestsrc, which also has to generate the pattern. In gstreamer 1.0 you can provide a memory allocator for the plugin: there is a property on the elements called peer-alloc that you can set to true so the downstream element can provide this NVMM memory to store the data, but in that case nvjpegenc must support the functions that gstreamer will call to request that memory.

Short answer, in the following pipeline:

appsrc ! nvjpegenc ! appsink

You need to be sure that the memory pushed in by the appsrc is NVMM memory.

-David