Jetson 4k Encoding -> Decoding Pipeline and latency

Dear Community,

I am trying to set up a low-latency 4k encoding -> decoding pipeline.
The system looks like this:
4k camera -> HDMI -> USB -> Jetson Nano -> UDP -> Jetson Nano -> HDMI 4k TFT

On the encoder side I started with first tests and got local 4k playback running smoothly, but with a significant latency of about 500 ms.
The pipeline used is:

gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! autovideosink
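
One variant I plan to try next is to keep the frames in NVMM memory and disable clock synchronization on the sink (nv3dsink and its sync property are assumed to be available on this L4T release, so please correct me if this is the wrong direction):

gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nv3dsink sync=false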

My questions here are:

  • What is the preferred way to display with minimum latency?
  • Is there a way to reduce buffering to lower the latency?
  • What is the preferred way to measure the latency of each GStreamer element?
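
So far the only profiling approach I found is the GStreamer latency tracer, roughly like below (the per-element flags reportedly need a newer GStreamer than the one shipped with L4T, so this is untested on the Nano):

GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=pipeline+element)" gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! autovideosink 2> latency_trace.log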

For the final streaming solution via Ethernet (WiFi), the pipelines below basically work but need further optimization:

Encoder:

gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! nvv4l2h264enc insert-sps-pps=1 maxperf-enable=1 ! h264parse ! rtph264pay pt=96 mtu=1500 ! udpsink host=192.168.2.65 port=5000 sync=false async=false

Decoder:

gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,encoding-name=H264,payload=96 ! rtph264depay ! h264parse ! queue ! omxh264dec ! autovideosink

Looking forward to your input and suggestions on where to start optimizing and how to set up GStreamer pipeline profiling.

Many thanks for your help,
Dieter

I was talking to an NVIDIA engineer, and he suggested using the accelerated plugin.
With the pipeline below the latency seems to be lower, but it does not support the needed NV12 format for some reason:

gst-launch-1.0 nvv4l2camerasrc num-buffers=300 device=/dev/video0 ! 'video/x-raw(memory:NVMM), format=(string)UYVY, width=(int)3840, height=(int)2160, interlace-mode=progressive, framerate=(fraction)30/1' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nv3dsink -e

Once I change the format string to format=(string)NV12, the pipeline breaks and I get the error below:

dieter@dieter-desktop:~$ gst-launch-1.0 nvv4l2camerasrc num-buffers=300 device=/dev/video0 ! 'video/x-raw(memory:NVMM), format=(string)NV12, width=(int)3840, height=(int)2160, interlace-mode=progressive, framerate=(fraction)30/1' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nv3dsink -e
WARNING: erroneous pipeline: could not link nvv4l2camerasrc0 to nvvconv0, nvv4l2camerasrc0 can't handle caps video/x-raw(memory:NVMM), format=(string)NV12, width=(int)3840, height=(int)2160, interlace-mode=(string)progressive, framerate=(fraction)30/1
dieter@dieter-desktop:~$
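
For completeness, this is how I would check which formats the capture device actually reports (assuming v4l2-utils is installed):

v4l2-ctl -d /dev/video0 --list-formats-ext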

Hi,
nvv4l2camerasrc supports UYVY. If your source does not provide that format, you need to use v4l2src. In the encoding pipeline, we suggest setting the I-frame interval to a smaller value such as 15 or 10: since decoding can only start from an I-frame, any bitstream received before the first I-frame cannot be decoded and is dropped. In the decoding pipeline, you may try nvv4l2decoder with disable-dpb=1 and enable-max-performance=1.
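
For reference, applied to your pipelines this could look like below (iframeinterval is the encoder property name in the L4T plugins; host and port are from your setup and may need adjusting):

Encoder:

gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! nvv4l2h264enc iframeinterval=15 insert-sps-pps=1 maxperf-enable=1 ! h264parse ! rtph264pay pt=96 mtu=1500 ! udpsink host=192.168.2.65 port=5000 sync=false async=false

Decoder:

gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,encoding-name=H264,payload=96 ! rtph264depay ! h264parse ! nvv4l2decoder disable-dpb=1 enable-max-performance=1 ! nv3dsink sync=false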

Hi DaneLL,
are you aware of the exact differences between nvv4l2camerasrc and v4l2src with regard to latency, system load and performance?
That would be good to understand before moving into the next stage.
Many thanks,
Dieter

Hi,
The v4l2src plugin is a native GStreamer plugin and captures frames into CPU buffers. To use the hardware engines on Jetson platforms, there is a memory copy through the nvvidconv plugin:

v4l2src ! video/x-raw ! nvvidconv ! video/x-raw(memory:NVMM) ! ...

The nvv4l2camerasrc plugin is implemented to eliminate this memory copy, so it can run:

nvv4l2camerasrc ! video/x-raw(memory:NVMM) ! ...

By default it supports UYVY, and it is open source. So for other YUV422 formats such as YUYV (YUY2), you can download the source code and do the customization. But your source format is NV12 (YUV420); not sure if it works, since there is generally pitch alignment for YUV420.
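
For example, a capture-to-encode pipeline without the extra copy could look like below (UYVY capture at 3840x2160@30 is assumed; host and port are placeholders):

gst-launch-1.0 nvv4l2camerasrc device=/dev/video0 ! 'video/x-raw(memory:NVMM), format=(string)UYVY, width=(int)3840, height=(int)2160, framerate=(fraction)30/1' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvv4l2h264enc insert-sps-pps=1 maxperf-enable=1 ! h264parse ! rtph264pay pt=96 mtu=1500 ! udpsink host=192.168.2.65 port=5000 sync=false async=false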

You can download the source of nvv4l2camerasrc from:
https://developer.nvidia.com/embedded/L4T/r32_Release_v4.4/r32_Release_v4.4-GMC3/Sources/T210/public_sources.tbz2

Hi DaneLL,

thanks - that helps to understand and I will work on that.
Next thing I need to understand is the general limitations of the encoder and decoder.
Displaying the picture locally from the camera on the TFT works well enough for now, even if there is still some latency. But if I try to encode the stream and send it over UDP to another Jetson, I only get a few frames per second, far from 30 fps at 4k. Are there any pipelines I can use to test the encoder -> streaming -> decoder chain and evaluate the Nano's performance? Ideally based on videotestsrc, to eliminate hardware influences.

gst-launch-1.0 videotestsrc ! video/x-raw,width=640,height=480 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=30/1' ! nvv4l2h264enc bitrate=800000 insert-sps-pps=1 maxperf-enable=1 ! h264parse ! rtph264pay pt=96 mtu=1500 ! udpsink host=192.168.2.65 port=5000 sync=false async=false

gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,encoding-name=H264,payload=96 ! rtph264depay ! h264parse ! queue ! omxh264dec ! autovideosink

My main issue at the moment is:
With the pipeline that runs smoothly when displaying the camera signal locally, I only achieve very low fps and heavy delay via UDP (1 Gbit/s Ethernet link). And I see almost the same performance with videotestsrc, so something must be happening in the background here.
Anything that helps me identify the bottleneck is highly welcome!
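
For example, would a local loop like the one below be a fair way to benchmark the encoder and decoder without the network? (fpsdisplaysink comes from gst-plugins-bad; the other properties are taken from the posts above, so please correct me if this is not representative.)

gst-launch-1.0 videotestsrc num-buffers=300 ! video/x-raw,width=3840,height=2160,framerate=30/1 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nvv4l2h264enc maxperf-enable=1 ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink video-sink=fakesink text-overlay=false sync=false -v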

Many thanks,
Dieter