High decoding latency for streams produced by nvv4l2h264enc compared to omxh264enc

Hello,
While an h264 stream created by omxh264enc can be decoded with low latency, the stream produced by nvv4l2h264enc cannot.
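For reference, an elementary stream from each encoder can be dumped for comparison with pipelines along these lines (videotestsrc and the caps are only placeholders here, not my actual source):

gst-launch-1.0 videotestsrc num-buffers=300 ! nvvidconv ! 'video/x-raw(memory:NVMM), width=1280, height=720' ! omxh264enc ! h264parse ! filesink location=omx.h264 -e
gst-launch-1.0 videotestsrc num-buffers=300 ! nvvidconv ! 'video/x-raw(memory:NVMM), width=1280, height=720' ! nvv4l2h264enc ! h264parse ! filesink location=v4l2.h264 -e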

By analyzing the SPS NALUs from both encoders I found that the cause is the pic_order_cnt_type field.

With default settings omxh264enc uses pic_order_cnt_type=2, which disables picture re-ordering and allows the decoder to run with low latency.

In contrast, nvv4l2h264enc uses pic_order_cnt_type=0, which forces the decoder to hold onto decoded frames unnecessarily.
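To check this yourself, any h264 bitstream analyzer that dumps the SPS will show the field; for example, assuming the h264_analyze tool from the h264bitstream project is installed (any SPS parser will do):

h264_analyze v4l2.h264

and look for pic_order_cnt_type in the SPS dump.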

Is there an option to use pic_order_cnt_type=2 with nvv4l2h264enc?

Since the omxh264enc is deprecated, it would be great to have the same low-latency functionality from the nvv4l2h264enc.

You may try setting the vbv-size parameter of nvv4l2h264enc to a lower value (try a value of about 40-50 first).
For more details, see:

gst-inspect-1.0 nvv4l2h264enc
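For example (the rest of the pipeline is only illustrative):

gst-launch-1.0 videotestsrc ! nvvidconv ! 'video/x-raw(memory:NVMM)' ! nvv4l2h264enc vbv-size=50 ! h264parse ! filesink location=out.h264 -e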

vbv-size does not affect the pic_order_cnt_type. For clarification, the issue is that nvv4l2h264enc produces an h264 stream that cannot be decoded without buffering frames (which introduces latency), whereas the stream produced by omxh264enc can be.

The documentation for nvidia tegra
https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/accelerated_gstreamer.html
mentions a parameter called poc-type (picture order count type).
This is exactly the parameter I was looking for! However, on the jetson nano, running gst-inspect-1.0 shows that this parameter is not implemented.
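According to that page the property would simply be set on the element, i.e. something like:

gst-launch-1.0 videotestsrc ! nvvidconv ! 'video/x-raw(memory:NVMM)' ! nvv4l2h264enc poc-type=2 ! h264parse ! filesink location=out.h264 -e

but on the nano this fails because the property does not exist.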

Can you please add this parameter to the jetson nano?
Since omxh264enc uses pic_order_cnt_type=2, the hardware is definitely capable of it.

Hi,
Does it help if you set this property:

  maxperf-enable      : Enable or Disable Max Performance mode
                        flags: readable, writable, changeable only in NULL or READY state
                        Boolean. Default: false
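
For example (pipeline skeleton only for illustration):

gst-launch-1.0 videotestsrc ! nvvidconv ! 'video/x-raw(memory:NVMM)' ! nvv4l2h264enc maxperf-enable=true ! h264parse ! fakesink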

Hello,
Using maxperf-enable doesn’t change any parameters in the pipeline as far as I know. It only increases the clock speed of the encoder, which doesn’t help, since the problem is at the non-nvidia decoder. The latency comes from the decoder, and virtually all smartphone decoders buffer frames when pic_order_cnt_type=0. The issue is that nvv4l2h264enc produces an h264 stream that, according to the h264 spec, cannot be decoded with low latency.

Note also that it may seem slightly contradictory that pic_order_cnt_type=2 is the value that disables re-ordering of frames, but that is how it is written in the h264 spec.
When encoding a live video stream, picture re-ordering is impossible anyway, yet nvv4l2h264enc creates a stream in which, according to the spec, picture re-ordering is enabled.

It seems a solution will be possible in the next firmware update.