Jetson 4k Encoding -> Decoding Pipeline and latency

Dear Community,

I am trying to set up a low-latency 4k encoding -> decoding pipeline.
The system looks like this:
4k camera -> HDMI -> USB -> Jetson Nano -> UDP -> Jetson Nano -> HDMI 4k TFT

On the encoder side I started with first tests and got local 4k playback running smoothly, but with a significant latency of about 500 ms.
The pipeline used is:

gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! autovideosink
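
One variant I plan to try next is to keep the frames in NVMM memory and disable clock synchronization on the sink (nv3dsink and its sync property are assumed to be available on this L4T release, so please correct me if this is the wrong direction):

gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nv3dsink sync=false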

My questions here are:

  • What is the preferred way to display with minimum latency?
  • Is there a way to reduce buffering to lower the latency?
  • What is the preferred way to measure the latency of each GStreamer element?
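
So far the only profiling approach I found is the GStreamer latency tracer, roughly like below (the per-element flags reportedly need a newer GStreamer than the one shipped with L4T, so this is untested on the Nano):

GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=pipeline+element)" gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! autovideosink 2> latency_trace.log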

For the final streaming solution via Ethernet (WiFi), the pipelines below basically work but need further optimization:

Encoder:

gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! nvv4l2h264enc insert-sps-pps=1 maxperf-enable=1 ! h264parse ! rtph264pay pt=96 mtu=1500 ! udpsink host=192.168.2.65 port=5000 sync=false async=false

Decoder:

gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,encoding-name=H264,payload=96 ! rtph264depay ! h264parse ! queue ! omxh264dec ! autovideosink

Looking forward to your input and suggestions on where to start optimizing and how to set up GStreamer pipeline profiling.

Many thanks for your help,
Dieter

I was talking to an NVIDIA engineer, and he suggested using the accelerated plugin.
With the pipeline below the latency seems to be lower, but it does not support the needed NV12 format for some reason:

gst-launch-1.0 nvv4l2camerasrc num-buffers=300 device=/dev/video0 ! 'video/x-raw(memory:NVMM), format=(string)UYVY, width=(int)3840, height=(int)2160, interlace-mode=progressive, framerate=(fraction)30/1' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nv3dsink -e

Once I change the format string to format=(string)NV12, the pipeline breaks and I get the error below:

dieter@dieter-desktop:~$ gst-launch-1.0 nvv4l2camerasrc num-buffers=300 device=/dev/video0 ! 'video/x-raw(memory:NVMM), format=(string)NV12, width=(int)3840, height=(int)2160, interlace-mode=progressive, framerate=(fraction)30/1' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nv3dsink -e
WARNING: erroneous pipeline: could not link nvv4l2camerasrc0 to nvvconv0, nvv4l2camerasrc0 can't handle caps video/x-raw(memory:NVMM), format=(string)NV12, width=(int)3840, height=(int)2160, interlace-mode=(string)progressive, framerate=(fraction)30/1
dieter@dieter-desktop:~$
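
For completeness, this is how I would check which formats the capture device actually reports (assuming v4l2-utils is installed):

v4l2-ctl -d /dev/video0 --list-formats-ext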

Hi,
nvv4l2camerasrc supports UYVY. If your source does not provide that format, you need to use v4l2src. In the encoding pipeline, we suggest setting the I-frame interval to a smaller value such as 15 or 10: since decoding can only start from an I-frame, any bitstream received before the first I-frame cannot be decoded and is dropped. In the decoding pipeline, you may try nvv4l2decoder with disable-dpb=1 and enable-max-performance=1.
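
For reference, applied to your pipelines this could look like below (iframeinterval is the encoder property name in the L4T plugins; host and port are from your setup and may need adjusting):

Encoder:

gst-launch-1.0 -v v4l2src ! video/x-raw,format=NV12 ! nvvidconv ! nvv4l2h264enc iframeinterval=15 insert-sps-pps=1 maxperf-enable=1 ! h264parse ! rtph264pay pt=96 mtu=1500 ! udpsink host=192.168.2.65 port=5000 sync=false async=false

Decoder:

gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,encoding-name=H264,payload=96 ! rtph264depay ! h264parse ! nvv4l2decoder disable-dpb=1 enable-max-performance=1 ! nv3dsink sync=false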

Hi DaneLL,
are you aware of the exact differences between nvv4l2camerasrc and v4l2src with regard to latency, system load and performance?
That would be good to understand before moving into the next stage.
Many thanks,
Dieter

Hi,
The v4l2src plugin is a native GStreamer plugin and captures frames into CPU buffers. To use the hardware engines on Jetson platforms, there is a memory copy through the nvvidconv plugin:

v4l2src ! video/x-raw ! nvvidconv ! video/x-raw(memory:NVMM) ! ...

The nvv4l2camerasrc plugin is implemented to eliminate this memory copy, so it can run:

nvv4l2camerasrc ! video/x-raw(memory:NVMM) ! ...

By default it supports UYVY, and it is open source. So for other YUV422 formats such as YUYV (YUY2), you can download the source code and do the customization. But your source format is NV12 (YUV420); not sure if it works, since there is generally pitch alignment for YUV420.
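
For example, a capture-to-encode pipeline without the extra copy could look like below (UYVY capture at 3840x2160@30 is assumed; host and port are placeholders):

gst-launch-1.0 nvv4l2camerasrc device=/dev/video0 ! 'video/x-raw(memory:NVMM), format=(string)UYVY, width=(int)3840, height=(int)2160, framerate=(fraction)30/1' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvv4l2h264enc insert-sps-pps=1 maxperf-enable=1 ! h264parse ! rtph264pay pt=96 mtu=1500 ! udpsink host=192.168.2.65 port=5000 sync=false async=false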

You can download the source of nvv4l2camerasrc from:
https://developer.nvidia.com/embedded/L4T/r32_Release_v4.4/r32_Release_v4.4-GMC3/Sources/T210/public_sources.tbz2

Hi DaneLL,

thanks - that helps to understand and I will work on that.
Next thing I need to understand is the general limitations of the encoder and decoder.
Displaying the picture locally from the camera on the TFT works well enough for now, even if there is still some latency. But if I try to encode the stream and send it over UDP to another Jetson, I only get a few frames per second, far from 30 fps at 4k. Are there any pipelines I can use to test the encoder -> streaming -> decoder chain and evaluate the Nano's performance? Ideally based on videotestsrc, to eliminate hardware influences.

gst-launch-1.0 videotestsrc ! video/x-raw,width=640,height=480 ! nvvidconv ! 'video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=30/1' ! nvv4l2h264enc bitrate=800000 insert-sps-pps=1 maxperf-enable=1 ! h264parse ! rtph264pay pt=96 mtu=1500 ! udpsink host=192.168.2.65 port=5000 sync=false async=false

gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,encoding-name=H264,payload=96 ! rtph264depay ! h264parse ! queue ! omxh264dec ! autovideosink

My main issue at the moment is:
With the pipeline that runs smoothly when displaying the camera signal locally, I only achieve very low fps and heavy delay via UDP (1 Gbit/s Ethernet link). And I see almost the same performance with videotestsrc, so something must be happening in the background here.
Anything that helps me identify the bottleneck is highly welcome!
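
For example, would a local loop like the one below be a fair way to benchmark the encoder and decoder without the network? (fpsdisplaysink comes from gst-plugins-bad; the other properties are taken from the posts above, so please correct me if this is not representative.)

gst-launch-1.0 videotestsrc num-buffers=300 ! video/x-raw,width=3840,height=2160,framerate=30/1 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nvv4l2h264enc maxperf-enable=1 ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink video-sink=fakesink text-overlay=false sync=false -v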

Many thanks,
Dieter