Jetson TX2 omxh264enc vs nvv4l2h264enc comparison

Hello,

With the new nvv4l2 elements I ran a test. While omxh265dec and nvv4l2decoder perform the same, nvv4l2h264enc is 50% slower compared to omxh264enc. What is the reason for that?

Is it because it is new and not yet stable?

CPU usage | omxh264enc | nvv4l2h264enc
300 MHz CPU core | 50% | 50%
2 GHz CPU core | 60% | 80%
Encoding time | 12 seconds | 21 seconds

Other cores are sleeping.

And omxh264enc takes 12 seconds while nvv4l2h264enc takes 21 seconds.
Has anyone seen similar results?

For reference, the test decodes a 4K video with the same decoder element and encodes it back.
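Roughly, the test pipeline looks like this (a hedged sketch; the filenames are placeholders, and the second run simply swaps nvv4l2h264enc for omxh264enc):

```shell
# Hypothetical 4K transcode test: decode with nvv4l2decoder, re-encode, remux.
# input.mkv / out.mkv are placeholder filenames; run this on the Jetson itself.
PIPELINE="filesrc location=input.mkv ! matroskademux ! h264parse ! nvv4l2decoder ! nvv4l2h264enc ! h264parse ! matroskamux ! filesink location=out.mkv"
# time gst-launch-1.0 $PIPELINE   # on the TX2, to measure encoding time
echo "$PIPELINE"
```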

Hi,
Please enable the property in nvv4l2h264enc and try again.

maxperf-enable      : Enable or Disable Max Performance mode
                    flags: readable, writable, changeable only in NULL or READY state
                    Boolean. Default: false
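For example, a hedged sketch of enabling the property on a test pipeline (the input filename is a placeholder):

```shell
# Sketch: enable max-performance mode on the encoder (input.mkv is a placeholder).
PIPELINE="filesrc location=input.mkv ! matroskademux ! h264parse ! nvv4l2decoder ! nvv4l2h264enc maxperf-enable=1 ! fakesink sync=false"
# gst-launch-1.0 $PIPELINE   # run on the Jetson
echo "$PIPELINE"
```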

Isn’t this feature for nvv4l2decoder? I use the same decoder in both tests but change the encoder, so the comparison is about the encoders.

I couldn’t find a max-performance example for the encoder in the accelerated GStreamer guide.

And I have another question: do you plan to release a guide for r32.2?

Hi,
The property is in nvv4l2h264enc. Please check the GStreamer user guide:
https://developer.nvidia.com/embedded/dlc/l4t-accelerated-gstreamer-guide-32-2

Sorry about that; I suppose Ctrl+F was not matching inside the pipelines in the document.

Yes, I used that feature; here are the results.

I still decode a 4K video with nvv4l2decoder and then encode it with each of these elements in my test.

CPU usage | omxh264enc | nvv4l2h264enc
300 MHz CPU core | 50% | 85%
2 GHz CPU core | 50% | 85%

Encoding time | 12 seconds | 16 seconds

Decoder frequency | 1200 MHz | 600 MHz
Encoder frequency | 1200 MHz | 1200 MHz

It seems nvv4l2h264enc is inefficient compared to omxh264enc.

Btw, thank you for sharing the r32.2 guide. It was not showing up on Google.

And I have another question: omxh264enc does not need h264parse, but nvv4l2h264enc does. Could the performance difference be related to this?
And why does nvv4l2h264enc need it while omxh264enc does not?
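To illustrate, hedged sketches of the two variants (filenames are placeholders; my understanding, not confirmed here, is that h264parse fixes up stream-format and alignment caps that some downstream elements require):

```shell
# Hypothetical comparison; input.mkv / out.mkv are placeholders.
# omxh264enc can feed the muxer directly:
OMX="filesrc location=input.mkv ! matroskademux ! h264parse ! nvv4l2decoder ! omxh264enc ! matroskamux ! filesink location=out.mkv"
# nvv4l2h264enc typically needs h264parse between encoder and muxer:
V4L2="filesrc location=input.mkv ! matroskademux ! h264parse ! nvv4l2decoder ! nvv4l2h264enc ! h264parse ! matroskamux ! filesink location=out.mkv"
echo "$V4L2"
```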

And lastly, is this performance expected because the element is newly developed? It looks like omxh264enc is about twice as efficient as nvv4l2h264enc.

Hi,
We have run the following pipelines on TX2 with ‘sudo jetson_clocks’ executed:

$ gst-launch-1.0 filesrc location=jellyfish-120-mbps-4k-uhd-h264.mkv ! matroskademux ! h264parse ! nvv4l2decoder ! omxh264enc ! fakesink sync=false
$ gst-launch-1.0 filesrc location=jellyfish-120-mbps-4k-uhd-h264.mkv ! matroskademux ! h264parse ! nvv4l2decoder ! nvv4l2h264enc maxperf-enable=1 ! fakesink sync=false

The video file is http://jell.yfish.us/media/jellyfish-120-mbps-4k-uhd-h264.mkv

For omxh264enc, the execution time (on average) is 9.16 seconds. For nvv4l2h264enc, it is 9.02 seconds. We don’t see a difference in encoding time. Please compare your case to this and check where the deviation comes from.


I made the comparison with nvpmodel 5 and jetson_clocks disabled, since it is easier to see the difference between the elements that way.

What about your cpu usage while using those elements?

Does this mean nvv4l2h264enc works better with nvpmodel 0 or jetson_clocks enabled?

Or does it use more CPU to close the gap with omxh264enc?

I shall do the same test then give my feedback, thanks.

Sorry for the late response; I had problems with flashing.

Decoding and encoding the jellyfish video gave me an error, so I used this video instead. I decoded it, encoded it, and wrote the result to a Matroska file.

This is the video I used

The test was done with nvpmodel 0 and jetson_clocks enabled.

But I think it is more meaningful to do the test with nvpmodel 5 and jetson_clocks disabled.

Hi,
For performance profiling, we suggest enabling jetson_clocks because it runs all CPUs at fixed clocks. Without it, dynamic frequency scaling is enabled and you have to account for the varying frequency when calculating the load.

RAM 1851/7859MB (lfb 991x4MB) SWAP 0/3929MB (cached 0MB) CPU [10%@1114,off,off,9%@1113,12%@1113,10%@1113] EMC_FREQ 8%@665 GR3D_FREQ 0%@114 NVENC 115 APE 150 PLL@38.5C MCPU@38.5C PMIC@100C Tboard@33C GPU@36C BCPU@38.5C thermal@37.7C Tdiode@34.75C VDD_SYS_GPU 153/153 VDD_SYS_SOC 689/689 VDD_4V0_WIFI 0/0 VDD_IN 2988/2994 VDD_SYS_CPU 229/241 VDD_SYS_DDR 558/561
RAM 1851/7859MB (lfb 991x4MB) SWAP 0/3929MB (cached 0MB) CPU [12%@1036,off,off,9%@1113,10%@1113,11%@1113] EMC_FREQ 8%@665 GR3D_FREQ 0%@114 NVENC 115 APE 150 PLL@38.5C MCPU@38.5C PMIC@100C Tboard@33C GPU@36.5C BCPU@38.5C thermal@37.7C Tdiode@34.75C VDD_SYS_GPU 153/153 VDD_SYS_SOC 689/689 VDD_4V0_WIFI 0/0 VDD_IN 2988/2993 VDD_SYS_CPU 229/240 VDD_SYS_DDR 558/560
RAM 1851/7859MB (lfb 991x4MB) SWAP 0/3929MB (cached 0MB) CPU [14%@998,off,off,7%@999,10%@998,10%@998] EMC_FREQ 8%@665 GR3D_FREQ 0%@114 NVENC 115 APE 150 PLL@38.5C MCPU@38.5C PMIC@100C Tboard@33C GPU@36C BCPU@38.5C thermal@37.7C Tdiode@34.75C VDD_SYS_GPU 153/153 VDD_SYS_SOC 689/689 VDD_4V0_WIFI 0/0 VDD_IN 2988/2992 VDD_SYS_CPU 229/238 VDD_SYS_DDR 558/560

With jetson_clocks enabled, you can simply check the loading percentage in tegrastats.
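A small hedged sketch of pulling the per-core percentages out of one tegrastats line (the sample line is taken from the output quoted above; the exact format may differ across L4T releases):

```shell
# Extract per-core CPU load from a tegrastats line like the ones above.
LINE="CPU [10%@1114,off,off,9%@1113,12%@1113,10%@1113]"
# Keep the bracketed list, then strip the @frequency suffixes:
CORES=$(echo "$LINE" | sed -e 's/.*\[//' -e 's/\].*//' -e 's/@[0-9]*//g')
echo "$CORES"   # 10%,off,off,9%,12%,10%
```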

Hi, this post seems to be useful for us … just wondering, how do you measure each plugin’s runtime and CPU usage?

Hi,

You may check the tool from RidgeRun:

FYR.
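As an alternative sketch (my addition, not the RidgeRun tool): GStreamer 1.8+ ships a built-in latency tracer that reports per-element latency; CPU usage per plugin would still need an external profiler.

```shell
# Enable GStreamer's latency tracer; trace output goes to the debug log.
export GST_TRACERS="latency"
export GST_DEBUG="GST_TRACER:7"
# gst-launch-1.0 videotestsrc num-buffers=100 ! x264enc ! fakesink
echo "$GST_TRACERS"
```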

Hi Thanks for the reply!

Just want to confirm the use of nvv4l2h264enc. I’ve enabled jetson_clocks, set nvpmodel to 0, and enabled maximum performance, but when using udpsrc + nv3dsink to visualize the stream, it drops a lot of buffers, making the video very choppy. Once I replace it with omxh264enc, it works more smoothly… I’m on JetPack 4.4. I’ve confirmed nvv4l2decoder never worked for me, but from this post it seems there must be a way to get the v4l2 encoder and decoder working?

Hello, I’m not sure what you mean by v4l2 not working for you; in the same post you say that the video is very choppy with nvv4l2 and smooth with omxh264enc?

To get nvv4l2 to stream for me, I had to enable insert-sps-pps, i.e.: ! nvv4l2h264enc insert-sps-pps=1 !
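For example, a hypothetical sender pipeline with that option set (host, port, and the surrounding elements are placeholders, not from the original post):

```shell
# Hypothetical UDP H.264 sender; 127.0.0.1:5000 is a placeholder target.
SEND="videotestsrc ! nvvidconv ! nvv4l2h264enc insert-sps-pps=1 ! h264parse ! rtph264pay ! udpsink host=127.0.0.1 port=5000"
# gst-launch-1.0 $SEND   # run on the Jetson
echo "$SEND"
```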

As for the choppy video problem… I can’t help you with that because I am experiencing the same problem!

I have exactly the same problem. The video is very choppy using nvv4l2 on the Jetson Nano, while using x264enc the video is OK. I still haven’t found the root cause:

Choppy video

gst-launch-1.0 videotestsrc is-live=true pattern=snow ! \
                 nvvidconv ! queue  min-threshold-time=100000 ! \
                 nvv4l2h264enc profile=Main MeasureEncoderLatency=true iframeinterval=30 maxperf-enable=true ! \
                 nvv4l2decoder enable-max-performance=true enable-frame-type-reporting=true ! \
                 queue  min-threshold-time=100000 ! \
                 queue ! nvvidconv ! nvoverlaysink sync=0 display_id=0  ts-offset=1000000

OK, good video

gst-launch-1.0 videotestsrc is-live=true pattern=snow ! \
                 nvvidconv ! queue  min-threshold-time=100000 ! \
                 x264enc key-int-max=30 bitrate=10000 byte-stream=false ! \
                 nvv4l2decoder enable-max-performance=true enable-frame-type-reporting=true ! \
                 queue  min-threshold-time=100000 ! \
                 queue ! nvvidconv ! nvoverlaysink sync=0 display_id=0  ts-offset=1000000

Only the encoder is replaced between the two pipelines. The other items have no bearing; the queues and params can be removed and the result is still the same.
I cannot understand what is wrong with the nvv4l2h264enc element.

Just found that the is-live=true option somehow affects the nvv4l2h264enc plugin. Removing is-live=true fixes the video. But it is unclear why this option is so important only for nvv4l2h264enc.
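A hedged sketch of the fixed pipeline, i.e. the choppy example above with is-live=true dropped from videotestsrc (other properties kept as in the original; the queues are omitted since they were said to make no difference):

```shell
# Same pipeline as the choppy example, but without is-live=true on videotestsrc.
FIXED="videotestsrc pattern=snow ! nvvidconv ! nvv4l2h264enc profile=Main iframeinterval=30 maxperf-enable=true ! nvv4l2decoder enable-max-performance=true ! nvvidconv ! nvoverlaysink sync=0"
# gst-launch-1.0 $FIXED   # run on the Jetson
echo "$FIXED"
```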