Performance issue with nvvidconv and omxh265enc

Hello,
I am using a CTI Rudi, which is based on the Jetson TX2, with Linux for Tegra (L4T) R32.1.

I am using 3 USB cameras and want to generate 3 RTSP streams from them, using 3 instances of the following GStreamer pipeline (based on the “test-launch” example of the RTSP server):

gst_rtsp_media_factory_set_launch (factory_1, "( idsueyesrc config-file=../data/ueye_conf/conf-4cams-ui3881.ini camera-id=1 ! video/x-raw, format=(string)UYVY, width=(int)1080, height=(int)1920, framerate=(fraction)0/1 ! nvvidconv flip-method=1 ! video/x-raw(memory:NVMM), format=(string)NV12 ! omxh265enc bitrate=2000000 ! rtph265pay name=pay0 pt=0 )" );

This works fine at first: on the RTSP client side I get 3 steady streams at 27 FPS. The client is simply:

gst-launch-1.0 -v playbin uri=rtsp://$IP:8554/stream1 uridecodebin0::source::latency=300 video-sink=fpsdisplaysink

Tegrastats shows this at the beginning (NVENC frequency 486MHz):

RAM 1812/7861MB (lfb 680x4MB) CPU [72%@2034,76%@2034,76%@2035,76%@2034,70%@2036,72%@2033] EMC_FREQ 9%@1866 GR3D_FREQ 0%@114 NVENC 486 APE 150 MTS fg 0% bg 0% PLL@51C MCPU@51C PMIC@100C Tboard@45C GPU@48.5C BCPU@51C thermal@50.1C Tdiode@47.25C VDD_SYS_GPU 95/95 VDD_SYS_SOC 1820/1601 VDD_4V0_WIFI 76/80 VDD_IN 9467/7910 VDD_SYS_CPU 4598/3469 VDD_SYS_DDR 1735/1519

I installed the GStreamer profiling plugin GstShark and set GST_TRACERS="proctime", so that I can see how much time each GStreamer element takes to process one frame. These are the values at the beginning, when 27 FPS are reached:

The three nvvidconv instances take 6 ms, 6 ms and 39 ms to convert one frame:

0:01:15.588064669  5039   0x7f98005770 TRACE             GST_TRACER :0:: proctime, element=(string)nvvconv0, time=(string)0:00:00.005849486;
0:01:15.592437168  5039   0x7f14014450 TRACE             GST_TRACER :0:: proctime, element=(string)nvvconv2, time=(string)0:00:00.006083214;
0:01:15.623016181  5039   0x7f4c010e80 TRACE             GST_TRACER :0:: proctime, element=(string)nvvconv1, time=(string)0:00:00.039421515;

The three omxh265enc instances take 32 ms, 12 ms and 29 ms to encode one frame:

0:01:15.587293407  5039   0x7f04004de0 TRACE             GST_TRACER :0:: proctime, element=(string)omxh265enc-omxh265enc2, time=(string)0:00:00.032952638;
0:01:15.600367224  5039   0x7f740041e0 TRACE             GST_TRACER :0:: proctime, element=(string)omxh265enc-omxh265enc0, time=(string)0:00:00.012177340;
0:01:15.612943699  5039   0x7f50004630 TRACE             GST_TRACER :0:: proctime, element=(string)omxh265enc-omxh265enc1, time=(string)0:00:00.029482665;

However, the frame rate steadily decreases over time: after approximately 20 minutes it had dropped to 14 FPS on all 3 camera streams, and tegrastats shows that the NVENC frequency dropped as well:

After 20 minutes, the nvvidconv instances take 66 ms, 66 ms and 74 ms to convert one frame:

0:21:03.967665686 19061   0x7f04014590 TRACE             GST_TRACER :0:: proctime, element=(string)nvvconv2, time=(string)0:00:00.066681856;
0:21:03.990308854 19061   0x7f3800fd90 TRACE             GST_TRACER :0:: proctime, element=(string)nvvconv1, time=(string)0:00:00.065748482;
0:21:04.019222924 19061   0x7f88006d40 TRACE             GST_TRACER :0:: proctime, element=(string)nvvconv0, time=(string)0:00:00.074154037;

The omxh265enc instances now take 62 ms, 64 ms and 62 ms to encode one frame:

0:21:03.985219005 19061   0x7f28003720 TRACE             GST_TRACER :0:: proctime, element=(string)omxh265enc-omxh265enc1, time=(string)0:00:00.062565190;
0:21:04.009875706 19061   0x7f60004f70 TRACE             GST_TRACER :0:: proctime, element=(string)omxh265enc-omxh265enc0, time=(string)0:00:00.064911587;
0:21:04.030722556 19061   0x7eec003ca0 TRACE             GST_TRACER :0:: proctime, element=(string)omxh265enc-omxh265enc2, time=(string)0:00:00.062977670;

tegrastats shows a low NVENC frequency (115 MHz):

RAM 1837/7861MB (lfb 662x4MB) CPU [51%@2034,59%@2034,51%@2034,49%@2034,47%@2034,46%@2035] EMC_FREQ 6%@1866 GR3D_FREQ 0%@114 NVENC 115 APE 150 MTS fg 0% bg 0% PLL@53C MCPU@53C PMIC@100C Tboard@48C GPU@50.5C BCPU@53C thermal@52.2C Tdiode@49.75C VDD_SYS_GPU 96/95 VDD_SYS_SOC 1632/1651 VDD_4V0_WIFI 420/82 VDD_IN 8213/8184 VDD_SYS_CPU 3312/3620 VDD_SYS_DDR 1563/1581

Is this normal behavior for nvvidconv and omxh265enc? Is NVENC lowering its frequency because of temperature? How can I improve performance to get a constant 27 FPS stream?

Thanks

Hi,
Please try nvv4l2h265enc and enable its maxperf-enable property:

maxperf-enable      : Enable or Disable Max Performance mode
                    flags: readable, writable, changeable only in NULL or READY state
                    Boolean. Default: false

Hi and thanks for your reply,

Can nvv4l2h265enc be used with cameras that do not use the V4L2 framework? The cameras I am using have no V4L2 driver: the camera manufacturer provides a closed-source userspace library with APIs to use the cameras, and there is no /dev/videoX device. Because of that, as far as I understand, nvv4l2h265enc cannot be used.

I tried simply replacing “omxh265enc” with “nvv4l2h265enc” in the pipeline, but then I get no video on the RTSP client. The new pipeline is:

gst_rtsp_media_factory_set_launch (factory_3, "( idsueyesrc config-file=../data/ueye_conf/conf-4cams-ui3881.ini camera-id=3 ! video/x-raw, format=(string)UYVY, width=(int)1080, height=(int)1920, framerate=(fraction)0/1 ! nvvidconv flip-method=1 ! video/x-raw(memory:NVMM), format=(string)NV12 ! nvv4l2h265enc bitrate=2000000 ! rtph265pay name=pay0 pt=0 )" );

This is what I see on the RTSP server side:

$ ./run.sh 
Scanning dependencies of target start-rtsp-server
[ 33%] Building C object CMakeFiles/start-rtsp-server.dir/start-rtsp-server.o
[ 66%] Linking C executable start-rtsp-server
[100%] Built target start-rtsp-server
nvbuf_utils: Could not get EGL display connection
0:00:00.131930569 23461   0x55a5202270 WARN                     omx gstomx.c:2826:plugin_init: Failed to load configuration file: Valid key file could not be found in search dirs (searched in: /home/taurob/.config:/etc/xdg as per GST_OMX_CONFIG_DIR environment variable, the xdg user config directory (or XDG_CONFIG_HOME) and the system config directory (or XDG_CONFIG_DIRS)
streams ready at rtsp://127.0.0.1:8554/streamX
Failed to query video capabilities: Inappropriate ioctl for device
Opening in BLOCKING MODE 

(start-rtsp-server:23461): GStreamer-CRITICAL **: 09:26:53.990: gst_debug_log_valist: assertion 'category != NULL' failed

(start-rtsp-server:23461): GStreamer-CRITICAL **: 09:26:53.990: gst_debug_log_valist: assertion 'category != NULL' failed

(start-rtsp-server:23461): GStreamer-CRITICAL **: 09:26:53.991: gst_debug_log_valist: assertion 'category != NULL' failed
0:00:05.759424629 23461   0x7fa0005540 FIXME                default gstutils.c:3981:gst_pad_create_stream_id_internal:<idsueyesrc1:src> Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:00:05.760569490 23461   0x7fa0005540 WARN               nvvidconv gstnvvconv.c:1672:gst_nvvconv_set_caps:<nvvconv1> Cannot keep DAR
NvMMLiteOpen : Block : BlockType = 8 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8 

(start-rtsp-server:23461): GStreamer-CRITICAL **: 09:26:55.506: gst_debug_log_valist: assertion 'category != NULL' failed
0:00:05.766222564 23461   0x7fa0005540 WARN          v4l2bufferpool gstv4l2bufferpool.c:962:gst_v4l2_buffer_pool_start:<nvv4l2h265enc0:pool:src> Uncertain or not enough buffers, enabling copy threshold
NVMEDIA: H265 : Profile : 1 
0:00:05.900803942 23461   0x7f840038f0 WARN          v4l2bufferpool gstv4l2bufferpool.c:1538:gst_v4l2_buffer_pool_dqbuf:<nvv4l2h265enc0:pool:src> Driver should never set v4l2_buffer.field to ANY
0:00:05.901290341 23461   0x7f840038f0 WARN          rtpbasepayload gstrtpbasepayload.c:853:gst_rtp_base_payload_negotiate:<pay0> Can't use selected pt 0
0:00:05.902724513 23461   0x55a5250b20 FIXME              rtspmedia rtsp-media.c:3841:gst_rtsp_media_suspend: suspend for dynamic pipelines needs fixing
0:00:05.908095475 23461   0x55a5250b20 FIXME              rtspmedia rtsp-media.c:3841:gst_rtsp_media_suspend: suspend for dynamic pipelines needs fixing
0:00:05.908265555 23461   0x55a5250b20 WARN               rtspmedia rtsp-media.c:3867:gst_rtsp_media_suspend: media 0x7fa8026250 was not prepared
0:00:05.917699867 23461   0x55a5250b20 FIXME             rtspclient rtsp-client.c:1657:handle_play_request:<GstRTSPClient@0x55a525f180> Add support for seek style (null)
0:00:05.918300217 23461   0x55a5250b20 FIXME              rtspmedia rtsp-media.c:2437:gst_rtsp_media_seek_full:<GstRTSPMedia@0x7fa8026250> Handle going back to 0 for none live not seekable streams.
0:00:05.971198960 23461   0x7fa0005540 WARN          v4l2bufferpool gstv4l2bufferpool.c:1483:gst_v4l2_buffer_pool_dqbuf:<nvv4l2h265enc0:pool:sink> V4L2 provided buffer has bytesused 0 which is too small to include data_offset 0
0:00:05.971270448 23461   0x7fa0005540 WARN          v4l2bufferpool gstv4l2bufferpool.c:1483:gst_v4l2_buffer_pool_dqbuf:<nvv4l2h265enc0:pool:sink> V4L2 provided buffer has bytesused 0 which is too small to include data_offset 0
0:00:06.004407674 23461   0x7fa0005540 WARN          v4l2bufferpool gstv4l2bufferpool.c:1483:gst_v4l2_buffer_pool_dqbuf:<nvv4l2h265enc0:pool:sink> V4L2 provided buffer has bytesused 0 which is too small to include data_offset 0
0:00:06.004447545 23461   0x7fa0005540 WARN          v4l2bufferpool gstv4l2bufferpool.c:1483:gst_v4l2_buffer_pool_dqbuf:<nvv4l2h265enc0:pool:sink> V4L2 provided buffer has bytesused 0 which is too small to include data_offset 0

I am now trying to apply the patch suggested in https://devtalk.nvidia.com/default/topic/1032771/jetson-tx2/no-encoder-perfomance-improvement-before-after-jetson_clocks-sh/post/5255605/#5255605, which adds this line:

“oEncodeProp.bSetMaxEncClock = TRUE;”

To do so, I downloaded the sources of “Libgstomx for gstreamer 1-0” (version 32.1 of libgstomx.so is contained in https://developer.nvidia.com/embedded/dlc/l4t-sources-32-1-JAX-TX2 ).

However, the source code does not compile: it fails with the same build errors that were reported for 24.2.1 in https://devtalk.nvidia.com/default/topic/983587/jetson-tx1/gst-omx-plugin-build-error/post/5059343/

This is the first build error on version 32.1:

gstomxh265enc.c:725:5: error: ISO C90 forbids mixed declarations and code

Therefore, I downloaded version 24.2.1 of libgstomx for gstreamer1-0 from https://developer.nvidia.com/embedded/dlc/l4t-sample-root-filesystem-24-2-1, applied the “quickfix” patches provided in https://devtalk.nvidia.com/default/topic/983587/jetson-tx1/gst-omx-plugin-build-error/post/5059343/ and also added “oEncodeProp.bSetMaxEncClock = TRUE;” to the H.265 encoder.

However, the 24.2.1 libgstomx plugin fails to load with this error: “0:00:00.010299436 24640 0x55b68e28c0 ERROR omx gstomx.c:2883:plugin_init: Invalid type name ‘GstOMXVP9Dec’ for element ‘omxvp9dec’”, and when I then start RTSP streaming I get “*** stack smashing detected ***: terminated”. So I cannot use version 24.2.1 to test the patch that increases the NVENC clock.

Can you please provide buildable LGPL sources for libgstomx.so which is shipped with L4T R32.1?
Do you think that SetMaxEncClock can solve the performance issue?
Is there another, easier way to change the NVENC clock manually? I noticed that the debugfs node /sys/kernel/debug/clk/nvenc/clk_rate is read-only, so it cannot be used to change the clock.

Thanks!

Hi,
The patch shall help the case. Source code package is in
https://developer.nvidia.com/embedded/dlc/l4t-sources-32-1-JAX-TX2

Thank you for the link; however, you may have missed my last edit, because this link was already part of my latest post.
As I wrote in my previous post, this source package does not compile (it fails with “gstomxh265enc.c:725:5: error: ISO C90 forbids mixed declarations and code”). I could try to manually rebase onto 32.1 the changes provided in the forum that make version 24.2.1 compile, but I thought you might already have a buildable source package for libgstomx 32.1 (especially since this is a requirement of the LGPL license)?

Following the instructions in README.txt, libgstomx.so cannot be compiled (exactly as with version 24.2.1, as discussed in the forum link above):

~/gstomx1_src-32.2/gst-omx1$ make
make  all-recursive
make[1]: Entering directory '/home/taurob/gstomx1_src-32.2/gst-omx1'
Making all in common
make[2]: Entering directory '/home/taurob/gstomx1_src-32.2/gst-omx1/common'
Making all in m4
make[3]: Entering directory '/home/taurob/gstomx1_src-32.2/gst-omx1/common/m4'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/taurob/gstomx1_src-32.2/gst-omx1/common/m4'
make[3]: Entering directory '/home/taurob/gstomx1_src-32.2/gst-omx1/common'
make[3]: Nothing to be done for 'all-am'.
make[3]: Leaving directory '/home/taurob/gstomx1_src-32.2/gst-omx1/common'
make[2]: Leaving directory '/home/taurob/gstomx1_src-32.2/gst-omx1/common'
Making all in omx
make[2]: Entering directory '/home/taurob/gstomx1_src-32.2/gst-omx1/omx'
  CC       libgstomx_la-gstomx_config.lo
  CC       libgstomx_la-gstomx.lo
  CC       libgstomx_la-gstomxvideodec.lo
  CC       libgstomx_la-gstomxvideoenc.lo
  CC       libgstomx_la-gstomxaudioenc.lo
  CC       libgstomx_la-gstomxmjpegdec.lo
  CC       libgstomx_la-gstomxmpeg4videodec.lo
  CC       libgstomx_la-gstomxmpeg2videodec.lo
  CC       libgstomx_la-gstomxh265dec.lo
  CC       libgstomx_la-gstomxh264dec.lo
  CC       libgstomx_la-gstomxh263dec.lo
  CC       libgstomx_la-gstomxwmvdec.lo
  CC       libgstomx_la-gstomxvp8dec.lo
  CC       libgstomx_la-gstomxmpeg4videoenc.lo
  CC       libgstomx_la-gstomxh265enc.lo
gstomxh265enc.c: In function ‘gst_omx_h265_enc_set_params’:
gstomxh265enc.c:725:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
     NVX_VIDEO_PARAM_HEVCTYPE_EXT oH265ExtType;
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
gstomxh265enc.c:727:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
     OMX_INDEXTYPE eExtIndex;
     ^~~~~~~~~~~~~
cc1: all warnings being treated as errors
Makefile:809: recipe for target 'libgstomx_la-gstomxh265enc.lo' failed
make[2]: *** [libgstomx_la-gstomxh265enc.lo] Error 1
make[2]: Leaving directory '/home/taurob/gstomx1_src-32.2/gst-omx1/omx'
Makefile:526: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/taurob/gstomx1_src-32.2/gst-omx1'
Makefile:458: recipe for target 'all' failed
make: *** [all] Error 2

Note: even with the patches provided for version 24.1, the only way I can compile libgstomx version 24.1 is by NOT installing libegl on the target, so that the “configure” command outputs “checking for GST_EGL… no”. Otherwise, gstomxvideodec.c fails to compile: the patch adds “#undef HAVE_GST_EGL” at the beginning of the file so that egl.h is not included, but a few lines later the file includes “gstomxvideodec.h”, which includes config.h and therefore defines HAVE_GST_EGL again. The code then uses macros from egl.h without egl.h being included, and every line using “GST_EGL_IMAGE_MEMORY_TYPE” fails to compile.
Is libgstomx really intended to be used without libegl? The README file instructions say to compile and install libegl on the system.

I finally managed to build libgstomx.so using the version in meta-tegra. They fixed the compilation errors by changing the version number to “1.0.0”, which removes the -Werror flag as a side effect. See this patch: https://github.com/madisongh/meta-tegra/blob/master/recipes-multimedia/gstreamer/gstreamer1.0-omx-tegra/0001-use_lt_sysroot_when_parsing_gstconfig_header.patch

In my opinion, the three-digit version number should be part of the sources distributed in the download center; otherwise, the sources do not compile. I have not yet tested the patch that should set the NVENC clock to the maximum.

Hi,
Please run ‘sudo tegrastats’ to confirm that the NVENC clock does not vary and stays at maximum.

Hi DaneLLL,
Today I rebased the patch onto version 32.1 and exported it with quilt so that it works with meta-tegra (patch below, for reference in case someone needs it).

tegrastats shows that NVENC now runs at a constant 1.113 GHz:

RAM 1722/7861MB (lfb 1110x4MB) CPU [71%@2031,75%@2034,76%@2036,68%@2027,67%@2030,65%@2030] EMC_FREQ 9%@1866 GR3D_FREQ 0%@1300 NVENC 1113 APE 150 MTS fg 0% bg 0% PLL@51.5C MCPU@51.5C PMIC@100C Tboard@45C GPU@48C BCPU@51.5C thermal@50.3C Tdiode@47.25C VDD_SYS_GPU 143/143 VDD_SYS_SOC 1864/1767 VDD_4V0_WIFI 0/10 VDD_IN 9544/8942 VDD_SYS_CPU 4637/4196 VDD_SYS_DDR 1697/1635

The frame rate of the 3 camera streams now stays constant at 27 FPS.

Is there a negative side effect to expect from this patch? Is it safe to run NVENC at its maximum clock frequency? Why does dynamic frequency scaling reduce the clock even though NVENC is under load?

Index: gst-omx1/omx/gstomxh265enc.c
===================================================================
--- gst-omx1.orig/omx/gstomxh265enc.c
+++ gst-omx1/omx/gstomxh265enc.c
@@ -700,6 +700,8 @@ gst_omx_h265_enc_set_params (GstOMXVideo
           oEncodeProp.codecParams.hevc.nSliceHeaderSpacing = self->slice_header_spacing;
           oEncodeProp.bInsertAUD = self->insert_aud;
           oEncodeProp.bInsertVUI = self->insert_vui;
+          oEncodeProp.bSetMaxEncClock = TRUE;
+          GST_WARNING("set NVENC clock to maximum");
 
           eError =
               gst_omx_component_set_parameter (GST_OMX_VIDEO_ENC (self)->enc,