Simultaneous use of two HEVC encoders on a Jetson Xavier NX

david139 · May 11, 2021, 7:16pm

While evaluating the examples provided with the Jetson Multimedia API, I found that the sample program “15_multivideo_encode” does not behave as expected.
I provided a file containing 10 YUV420 frames at Full HD resolution (1920x1080) and encoded them to an H.265 stream. However, when I provide two such streams with the same amount of data each, the processing time doubles (I measure the time in the code, starting right before the encoder threads are created and ending after all threads are joined).
Different encoder settings had no impact on that behaviour. Are any additional flags needed?
Thank you!
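
For readers who want to reproduce this kind of measurement, here is a minimal Python sketch of the timing scheme described above: the clock starts right before the worker threads are created and stops after all of them are joined. It is not the code from 15_multivideo_encode; `encode_stream` is a hypothetical stand-in that sleeps instead of encoding, and all names and parameters are assumptions for illustration.

```python
import threading
import time


def encode_stream(stream_id, num_frames, per_frame_s, results):
    # Hypothetical stand-in for one encoder thread: sleeps instead of encoding.
    for _ in range(num_frames):
        time.sleep(per_frame_s)
    results[stream_id] = num_frames


def timed_run(num_streams, num_frames=10, per_frame_s=0.001):
    results = {}
    start = time.perf_counter()  # right before thread creation
    threads = [
        threading.Thread(
            target=encode_stream,
            args=(i, num_frames, per_frame_s, results),
        )
        for i in range(num_streams)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start  # after all threads are joined
    return elapsed, results


if __name__ == "__main__":
    for n in (1, 2):
        elapsed, _ = timed_run(n)
        print(f"{n} stream(s): {elapsed * 1000:.1f} ms")
```

If the elapsed time for two streams is roughly twice that of one, the streams are effectively serialized or limited by a shared resource; if it stays about the same, they run in parallel.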

DaneLLL · May 12, 2021, 1:41am

Hi,
Do you use Jetpack 4.5.1? There is a patch for JP4.4:
Jetson/L4T/r32.4.x patches - eLinux.org
[MMAPI]Patch for multi-instance encoding

And it is implemented as an option in JP4.5.1:

        -hpt <type>           HW preset type (1 = ultrafast, 2 = fast, 3 = medium,  4 = slow)

By default the hardware preset level is ultrafast to give maximum encoding speed.

david139 · May 12, 2021, 5:17pm

Hello DaneLLL,

Yes, I’m using Jetpack 4.5.1. The hardware preset level is set to ultrafast in the code by default.
I’ve already tried playing around with the hardware preset level and other parameters too.
Or are my expectations simply wrong? The “15_multivideo_encode” program should be able to encode two streams in the same time as a single one, if the resolution is not too high (which is not the case), right?
Thank you!

DaneLLL · May 13, 2021, 2:52am

Hi,
15_multivideo_encode reads frames from a file, so the result may be capped by file I/O (FIO). I suggest trying:

gst-launch-1.0 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v

You should see NVENC and NVENC1 enabled in the output of sudo tegrastats:

RAM 1738/15817MB (lfb 3190x4MB) SWAP 0/7908MB (cached 0MB) CPU [8%@2265,13%@2265,8%@2265,6%@2265,15%@2265,11%@2265,16%@2265,21%@2265] EMC_FREQ 6%@2133 GR3D_FREQ 0%@1377 NVENC 1075 NVENC1 1075 VIC_FREQ 0%@115 APE 150 MTS fg 0% bg 9% AO@51.5C GPU@53C Tdiode@55.25C PMIC@100C AUX@51.5C CPU@53C thermal@52.85C Tboard@51C GPU 1070/1070 CPU 1681/1101 SOC 4739/2876 CV 0/0 VDDRQ 458/213 SYS5V 2596/2482
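
To check for the encoder entries automatically, a small Python sketch like the following could pull them out of a tegrastats line. It assumes the `NAME value` layout shown in the sample above; the function name and regular expression are illustrative and not part of any NVIDIA tool.

```python
import re

# Matches entries such as "NVENC 1075" or "NVENC1 1075" as they appear
# in the sample tegrastats line above (assumed layout).
NVENC_PATTERN = re.compile(r"\b(NVENC\d*) (\d+)\b")


def nvenc_entries(tegrastats_line):
    """Return a dict of NVENC entry name -> reported value."""
    return {
        name: int(value)
        for name, value in NVENC_PATTERN.findall(tegrastats_line)
    }


sample = (
    "EMC_FREQ 6%@2133 GR3D_FREQ 0%@1377 NVENC 1075 NVENC1 1075 "
    "VIC_FREQ 0%@115 APE 150"
)
print(nvenc_entries(sample))
```

An empty result means no NVENC entry was listed in that line, for example because the encoders were idle or tegrastats was not run with sudo.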

david139 · May 17, 2021, 4:58pm

Hello DaneLLL,

FIO capping might be the explanation for the behavior, but I could not reproduce your tegrastats output. No NVENC devices are listed in my output while the gst-launch-1.0 job is running. Any ideas on that?

Thank you!

Honey_Patouceul · May 17, 2021, 5:30pm

Are you running tegrastats with sudo?

sudo tegrastats

david139 · May 17, 2021, 8:48pm

Sorry for that, I missed the sudo part!

It turned out that both NVENC devices were also used when I just created one thread in the “15_multivideo_encode” program. That’s why my calculations did not meet the expectations.
But nevertheless the timings do not match the official claim of “14x 1080p @ 30 (H.265/H.264)”.
I measure ~8ms per Full HD frame (only ~120fps) and ~55ms for a 5164x3874 frame.
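
The comparison with the official figure can be made explicit with a little arithmetic. The Python sketch below converts between frame rate and per-frame time and computes a pixel rate; it assumes that “14x 1080p @ 30” means an aggregate of 14 * 30 = 420 Full HD frames per second, and the helper names are made up for illustration.

```python
def frame_time_ms(fps):
    """Per-frame time budget in milliseconds for a given frame rate."""
    return 1000.0 / fps


def fps_from_frame_time(ms):
    """Frame rate achieved when each frame takes `ms` milliseconds."""
    return 1000.0 / ms


def pixel_rate(width, height, ms):
    """Pixels processed per second when one frame takes `ms` milliseconds."""
    return width * height * 1000.0 / ms


claimed_fps = 14 * 30  # aggregate frame rate implied by the claim (assumption)

print(f"claimed budget:     {frame_time_ms(claimed_fps):.2f} ms per 1080p frame")
print(f"measured 1080p:     {fps_from_frame_time(8.0):.0f} fps")
print(f"claimed pixel rate: {pixel_rate(1920, 1080, frame_time_ms(claimed_fps)) / 1e6:.0f} Mpx/s")
print(f"1080p @ 8 ms:       {pixel_rate(1920, 1080, 8.0) / 1e6:.0f} Mpx/s")
print(f"5164x3874 @ 55 ms:  {pixel_rate(5164, 3874, 55.0) / 1e6:.0f} Mpx/s")
```

Comparing pixel rates rather than frame rates puts the Full HD and the 5164x3874 measurements on the same scale.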

DaneLLL · May 18, 2021, 2:27am

Hi,
Please add the property from 01_video_encode:

        --max-perf            Enable maximum Performance

And enable it when profiling.

david139 · May 24, 2021, 10:33pm

Hi again,

I enabled the max-perf setting as you recommended and also evaluated the 01_video_encode program again.
My input data is 1616x1080 (a bit smaller than Full HD) and should therefore be processed in around 3ms per frame (which corresponds to the official claim of “14x Full HD”). Tegrastats shows that both NVENC devices are working during the encoding process.
However, the encoding time is around 9ms per frame at a good-quality bitrate, and even at the default bitrate it takes 7.8ms. I even loaded all frames into RAM to make sure we are not restricted by FIO, and excluded the loading time from my calculations.
Are the official claims only valid for very specific low bitrate settings or are there even more additional flags that we need to pass to the encoder?

Thank you!

DaneLLL · May 26, 2021, 3:39am

Hi,
It still looks like the performance is dominated by FIO in your profiling. The hardware encoder is able to achieve 14x 1080p30 HEVC:

gst-launch-1.0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
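
Because the command above repeats one videotestsrc-to-encoder branch many times, it can be easier to generate it than to type it. The following Python sketch builds the same kind of command string for any number of encoders; the element names and properties are copied from the command above, while the function names and defaults are assumptions for illustration.

```python
def branch(width, height, num_buffers=300):
    """One independent videotestsrc -> nvv4l2h265enc -> fpsdisplaysink branch."""
    return (
        f"videotestsrc num-buffers={num_buffers} ! "
        "video/x-raw,width=320,height=240 ! nvvidconv ! "
        f"'video/x-raw(memory:NVMM),width={width},height={height},format=NV12' ! "
        "nvv4l2h265enc maxperf-enable=1 ! "
        "fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0"
    )


def build_command(num_encoders, width=1616, height=1080):
    """Join `num_encoders` identical branches into one gst-launch-1.0 command."""
    branches = " ".join(branch(width, height) for _ in range(num_encoders))
    return f"gst-launch-1.0 {branches} -v"


if __name__ == "__main__":
    print(build_command(14))
```

The printed string can be pasted into a shell on the Jetson; varying `num_encoders` makes it easy to see at which encoder count the reported frame rate starts to drop.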

martin.wawro · May 26, 2021, 5:06am

Hi,

Not sure that this is a realistic benchmark to be honest.

You are using a rather small test pattern of which only roughly 10% changes dynamically. The remaining 90% of the frame has almost no high-frequency detail (constant color bars), and this is then upscaled to FHD to be encoded. This leads to unrealistically small bitstreams and little effort for the encoder, neither on the transform and CABAC side (pretty much only DC coefficients in 90% of the macroblocks) nor on the motion compensation side. I think it would be more helpful to assess the performance using a standard test video stream, for example one of the streams found here: http://ultravideo.fi/

Honey_Patouceul · May 26, 2021, 6:44pm

Just checked on NX with 2 of the streams you mentioned and see no issues with 14 H265 encoders:

gst-launch-1.0 -v filesrc location=/home/nvidia/Videos/YachtRide_1920x1080_30fps_420_8bit_AVC_MP4.mp4 ! qtdemux ! video/x-h264 ! h264parse ! nvv4l2decoder ! tee name=t \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
 t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0

It gives a solid 30 fps so far. Using sync=1 is also ok.
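
With `-v`, each fpsdisplaysink periodically prints a `last-message` line, so the per-encoder frame rate can be collected from the log instead of read off the terminal. The Python sketch below assumes the commonly seen line format `...fpsdisplaysinkN: last-message = rendered: R, dropped: D, current: C, average: A`; the sample lines are made up for illustration and are not output from this thread.

```python
import re

# Assumed format of the fpsdisplaysink messages printed by gst-launch-1.0 -v.
FPS_PATTERN = re.compile(
    r"(fpsdisplaysink\d+): last-message = \"?rendered: \d+, dropped: \d+, "
    r"current: [\d.]+, average: ([\d.]+)"
)


def average_fps_per_sink(log_text):
    """Return the most recent average fps reported by each fpsdisplaysink."""
    latest = {}
    for match in FPS_PATTERN.finditer(log_text):
        latest[match.group(1)] = float(match.group(2))
    return latest


# Illustrative sample lines (not real measurements).
sample_log = """\
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 17, dropped: 0, current: 32.00, average: 32.00
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink1: last-message = rendered: 17, dropped: 0, current: 30.10, average: 29.98
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 33, dropped: 0, current: 30.00, average: 30.00
"""
print(average_fps_per_sink(sample_log))
```

Redirecting the gst-launch-1.0 output to a file and passing its contents to `average_fps_per_sink` gives one number per encoder branch.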

david139 · May 27, 2021, 5:58pm

Hi Honey_Patouceul,
I could indeed reproduce your results with the same input video.
After a careful reevaluation and some fixes to my 15_multivideo_encode program, it turned out that the encoding time simply does not scale linearly with the number of streams.
For example, it takes ~6.3ms on average to encode a frame of the Full HD yacht video if we only use one stream. Although this does not match the expected 3ms encoding time (420fps @ Full HD), the encoding time is almost the same (~6.5ms) for two streams and then only slowly increases when adding more streams.

Overall, this is a good result.
Thank you for your help!
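
The difference between per-frame latency and aggregate throughput in these numbers can be shown with a short calculation. The Python sketch below assumes that the reported per-frame times apply to each stream and that the streams run concurrently; the function name is hypothetical.

```python
def aggregate_fps(num_streams, per_frame_ms):
    """Total frames per second across all streams, assuming each stream
    delivers one frame every `per_frame_ms` milliseconds in parallel."""
    return num_streams * 1000.0 / per_frame_ms


one_stream = aggregate_fps(1, 6.3)   # single stream, ~6.3 ms per frame
two_streams = aggregate_fps(2, 6.5)  # two streams, ~6.5 ms per frame each

print(f"1 stream:  {one_stream:.0f} fps total")
print(f"2 streams: {two_streams:.0f} fps total")
print(f"speed-up:  {two_streams / one_stream:.2f}x")
```

Under these assumptions, a single stream cannot reach the aggregate figure on its own, but the total frame rate nearly doubles when a second stream is added.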
