While evaluating the samples provided with the Jetson Multimedia API, I found that the sample program “15_multivideo_encode” does not behave as expected.
I provided a file containing 10 YUV420 frames at Full HD resolution (1920x1080) and encoded them to an H.265 stream. However, when encoding two such streams with the same amount of data in parallel, the processing time doubles (I measured the time in the code, starting right before the encoder threads are created and ending after all threads are joined).
Different encoder settings did not have an impact on that behaviour. Are there any additional flags needed?
Thank you!
Hi,
Are you using JetPack 4.5.1? There is a patch for JP4.4:
Jetson/L4T/r32.4.x patches - eLinux.org
[MMAPI]Patch for multi-instance encoding
And it is implemented as an option in JP4.5.1:
-hpt <type> HW preset type (1 = ultrafast, 2 = fast, 3 = medium, 4 = slow)
By default the hardware preset level is ultrafast to give maximum encoding speed.
Hello DaneLLL,
Yes, I’m using JetPack 4.5.1, and the hardware preset level is already set to ultrafast in the code by default.
I’ve also already tried other hardware preset levels and other parameters.
Or are my expectations simply wrong? The “15_multivideo_encode” program should be able to encode two streams in the same amount of time as a single one, as long as the resolution is not too high (which it isn't here), right?
Thank you!
Hi,
15_multivideo_encode reads frames from a file, so the result may be capped by file I/O (FIO). I suggest trying:
gst-launch-1.0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
You should see NVENC and NVENC1 enabled and shown in the output of sudo tegrastats:
RAM 1738/15817MB (lfb 3190x4MB) SWAP 0/7908MB (cached 0MB) CPU [8%@2265,13%@2265,8%@2265,6%@2265,15%@2265,11%@2265,16%@2265,21%@2265] EMC_FREQ 6%@2133 GR3D_FREQ 0%@1377 NVENC 1075 NVENC1 1075 VIC_FREQ 0%@115 APE 150 MTS fg 0% bg 9% AO@51.5C GPU@53C Tdiode@55.25C PMIC@100C AUX@51.5C CPU@53C thermal@52.85C Tboard@51C GPU 1070/1070 CPU 1681/1101 SOC 4739/2876 CV 0/0 VDDRQ 458/213 SYS5V 2596/2482
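To check this without scanning the whole line by eye, the NVENC fields can be filtered out of the tegrastats output. A minimal sketch, assuming the field layout shown in the sample line above (`NVENC <value>` / `NVENC1 <value>`) and GNU grep; `nvenc_fields` is a hypothetical helper, not part of L4T.

```shell
#!/bin/sh
# Hypothetical helper (not part of L4T): prints the NVENC / NVENC1 fields of
# one tegrastats line, one per line. Assumes the "NVENC <value> NVENC1 <value>"
# layout seen in the sample output above; prints nothing if they are absent.
nvenc_fields() {
    printf '%s\n' "$1" | grep -oE 'NVENC1? [0-9]+' || true
}

# Usage sketch: watch the encoder engines while a pipeline is running.
# sudo tegrastats | while IFS= read -r line; do nvenc_fields "$line"; done
```

If nothing is printed while an encode is running, check first that tegrastats was started with sudo.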
Hello DaneLLL,
FIO capping might be the explanation for the behavior, but I could not reproduce your tegrastats output. No NVENC devices are listed in my output while the gst-launch-1.0 job is running. Any ideas on that?
Thank you!
Are you running tegrastats with sudo?
sudo tegrastats
Sorry for that, I missed the sudo part!
It turned out that both NVENC devices were used even when I created only one thread in the “15_multivideo_encode” program. That’s why my calculations did not meet my expectations.
Nevertheless, the timings do not match the official claim of “14x 1080p @ 30 (H.265/H.264)”.
I measure ~8 ms per Full HD frame (only ~120 fps) and ~55 ms for a 5164x3874 frame.
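To compare such measurements with the datasheet figure, it helps to convert between time per frame and frames per second. A minimal sketch with hypothetical helper names; note that 1000 / ms only equals throughput when frames are timed strictly one after another, so with several frames in flight the real throughput can be higher.

```shell
#!/bin/sh
# Hypothetical helpers: convert between frame rate and per-frame time.
# Both assume frames are encoded strictly one after another (no overlap).
ms_per_frame() { awk -v fps="$1" 'BEGIN { printf "%.2f\n", 1000 / fps }'; }
fps_from_ms()  { awk -v ms="$1"  'BEGIN { printf "%.0f\n", 1000 / ms }'; }

# "14x 1080p @ 30" is 14 * 30 = 420 Full HD frames per second in total.
ms_per_frame $((14 * 30))   # 2.38 ms per Full HD frame
fps_from_ms 8               # 125 fps for the measured ~8 ms
```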
Hi,
Please add the property from 01_video_encode:
--max-perf Enable maximum Performance
and enable it when profiling.
Hi again,
I enabled the max-perf setting as you recommended and also evaluated the 01_video_encode program again.
My input data is 1616x1080 (a bit smaller than Full HD) and should therefore be processed in around 3 ms per frame (which corresponds to the official claim of “14x Full HD”). Tegrastats shows that both NVENC devices are working during the encoding process.
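The per-frame budget can be derived from the “14x 1080p @ 30” figure. A minimal sketch, assuming (this is not stated anywhere in the thread) that the encoder capacity is a fixed pixel rate shared by all streams; `frame_budget_ms` is a hypothetical helper. Under that assumption 1616x1080 comes to about 2.0 ms per frame, so the ~3 ms above is, if anything, a lenient target.

```shell
#!/bin/sh
# Hypothetical helper: per-frame time budget implied by an "N x WxH @ FPS"
# spec, rescaled to another resolution. ASSUMPTION: encoder capacity is a fixed
# pixel rate (pixels per second), independent of content and bit rate.
# Args: streams spec_width spec_height spec_fps width height
frame_budget_ms() {
    awk -v n="$1" -v sw="$2" -v sh="$3" -v fps="$4" -v w="$5" -v h="$6" \
        'BEGIN { rate = n * sw * sh * fps; printf "%.2f\n", 1000 * w * h / rate }'
}

frame_budget_ms 14 1920 1080 30 1920 1080   # 2.38 (Full HD)
frame_budget_ms 14 1920 1080 30 1616 1080   # 2.00 (1616x1080)
```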
However, the encoding time is around 9 ms per frame at a bit rate that gives good quality, and even at the default bit rate it takes 7.8 ms. I even loaded all frames into RAM to make sure we are not restricted by FIO, and excluded the loading time from my calculations.
Are the official claims only valid for very specific low-bitrate settings, or are there additional flags that we need to pass to the encoder?
Thank you!
Hi,
It still looks like FIO dominates the performance in your profiling. The hardware encoder is able to achieve 14x 1080p30 HEVC:
gst-launch-1.0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=1616,height=1080,format=NV12' ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
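Since this command just repeats one branch per encoder, it can also be generated for any number of streams. A minimal sketch; `nvenc_test_pipeline` is a hypothetical helper and the branch is copied from the command above. The caps are printed without shell quotes on purpose, so the output should be passed to gst-launch-1.0 through an unquoted `$(...)` in sh/bash, where word splitting turns each token into one argument.

```shell
#!/bin/sh
# Hypothetical helper: prints a gst-launch-1.0 pipeline description with N
# independent videotestsrc -> nvvidconv -> nvv4l2h265enc branches, matching
# the hand-written command above. Args: streams width height
nvenc_test_pipeline() {
    n=$1; w=$2; h=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s ' \
            "videotestsrc num-buffers=300 ! video/x-raw,width=320,height=240 !" \
            "nvvidconv ! video/x-raw(memory:NVMM),width=$w,height=$h,format=NV12 !" \
            "nvv4l2h265enc maxperf-enable=1 !" \
            "fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0"
        i=$((i + 1))
    done
    printf '\n'
}

# Usage (the $(...) is deliberately unquoted):
# gst-launch-1.0 -v $(nvenc_test_pipeline 14 1616 1080)
```

Varying the stream count (1, 2, 4, ... 14) with the same helper makes it easy to see where the per-stream fps starts to drop.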
Hi,
Not sure this is a realistic benchmark, to be honest.
You are using a rather small test pattern of which only roughly 10% changes from frame to frame. The remaining 90% of the frame has almost no high-frequency detail (constant color bars), and this is then upscaled to Full HD to be encoded. This leads to unrealistically small bitstreams and little work for the encoder, either on the transform and CABAC side (pretty much only DC coefficients in 90% of the macroblocks) or in motion compensation. I think it would be more helpful to assess the performance using a standard test video stream, for example one of the streams found here: http://ultravideo.fi/
I just checked on NX with 2 of the streams you mentioned and see no issues with 14 H.265 encoders:
gst-launch-1.0 -v filesrc location=/home/nvidia/Videos/YachtRide_1920x1080_30fps_420_8bit_AVC_MP4.mp4 ! qtdemux ! video/x-h264 ! h264parse ! nvv4l2decoder ! tee name=t \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 \
t. ! queue ! nvv4l2h265enc maxperf-enable=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0
It gives a solid 30 fps so far. Using sync=1 is also ok.
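The same decode-once, encode-N-times layout can be generated for a different number of encoders. A minimal sketch; `nvenc_tee_pipeline` is a hypothetical helper whose elements are copied from the command above, and it assumes the file path contains no spaces (the output is word-split by an unquoted `$(...)`).

```shell
#!/bin/sh
# Hypothetical helper: prints a pipeline that decodes one H.264 MP4 file once
# and fans it out through a tee to N nvv4l2h265enc encoders, as in the command
# above. Args: mp4_file streams (the path must not contain spaces).
nvenc_tee_pipeline() {
    file=$1; n=$2
    printf '%s ' "filesrc location=$file ! qtdemux ! video/x-h264 ! h264parse !" \
        "nvv4l2decoder ! tee name=t"
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s ' "t. ! queue ! nvv4l2h265enc maxperf-enable=1 !" \
            "fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0"
        i=$((i + 1))
    done
    printf '\n'
}

# Usage (unquoted $(...) on purpose):
# gst-launch-1.0 -v $(nvenc_tee_pipeline /home/nvidia/Videos/YachtRide_1920x1080_30fps_420_8bit_AVC_MP4.mp4 14)
```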
Hi Honey_Patouceul,
I could indeed reproduce your results with the same input video.
After a careful reevaluation and some fixes to my 15_multivideo_encode program, it turned out that the encoding time simply does not scale linearly when only one stream is used: a single stream cannot reach the aggregate throughput of the official claim.
For example, it takes ~6.3 ms on average to encode a frame of the Full HD yacht video with a single stream. Although this does not match the expected ~3 ms (420 fps at Full HD), the encoding time is almost the same (~6.5 ms) with two streams and then increases only slowly as more streams are added.
Overall, this is a good result.
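The per-frame times above translate into aggregate throughput as follows. A minimal sketch, assuming each figure is the average wall-clock time per frame within one stream while all streams run concurrently; `aggregate_fps` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: total frames per second across N concurrent streams,
# given the average time per frame (ms) measured within each stream.
# Args: streams ms_per_frame
aggregate_fps() {
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", n * 1000 / ms }'
}

aggregate_fps 1 6.3   # 159 fps with one stream
aggregate_fps 2 6.5   # 308 fps with two streams (420 would match the spec)
```

On these numbers the second stream nearly doubles total throughput, which is consistent with the time per frame barely changing as streams are added.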
Thank you for your help!