Question: changes in buf.pts when nvv4l2decoder skip-frames=decode_key

• Hardware Platform: GPU
• DeepStream Version: 6.3
• TensorRT Version: 8.5.3
• NVIDIA GPU Driver Version: 550.54.15
• Issue Type: Questions/Bugs

Hello, I have noticed an interesting behaviour of nvv4l2decoder when using different values for the skip-frames property. It concerns the pts values of the GStreamer buffers produced by the decoder element.

We read a sample video with a GStreamer pipeline and are interested only in the key-frames. The video should have a key-frame every 30 frames (that is ~1.2 seconds), but with skip-frames=decode_key the pts values of the decoded buffers do not follow that spacing.

Let me illustrate this with code:

from datetime import datetime, timezone

import gi

gi.require_version('Gst', '1.0')
from gi.repository import Gst


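def pts_print_probe(pad: Gst.Pad, info: Gst.PadProbeInfo, udata) -> Gst.PadProbeReturn:
    """Print the PTS of every buffer that passes through the pad."""
    buf: Gst.Buffer = info.get_buffer()
    # buf.pts is in nanoseconds; dividing by Gst.SECOND gives seconds since the epoch
    dt_buf = datetime.fromtimestamp(buf.pts / Gst.SECOND, tz=timezone.utc)

    print(f'pts: {buf.pts}, {dt_buf}')

    return Gst.PadProbeReturn.OK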


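def drop_delta_probe(pad: Gst.Pad, info: Gst.PadProbeInfo, udata) -> Gst.PadProbeReturn:
    """Drop every non-key frame so that only key-frames reach the decoder."""
    buf: Gst.Buffer = info.get_buffer()
    # Buffers flagged DELTA_UNIT cannot be decoded on their own, i.e. they are not key-frames
    if buf.has_flags(Gst.BufferFlags.DELTA_UNIT):
        return Gst.PadProbeReturn.DROP

    return Gst.PadProbeReturn.OK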


def main():
    Gst.init(None)

    pipeline: Gst.Pipeline = Gst.parse_launch(
        'filesrc location=fragment_1736513537115810865.mkv ! '
        'parsebin name=parse ! '
        'nvv4l2decoder name=decode skip-frames=decode_key ! '
        'nvvideoconvert name=cvt ! '
        'nveglglessink sync=1'
    )

    pipeline.get_by_name('cvt').get_static_pad('sink').add_probe(Gst.PadProbeType.BUFFER, pts_print_probe, None)
    # pipeline.get_by_name('decode').get_static_pad('sink').add_probe(Gst.PadProbeType.BUFFER, drop_delta_probe, None)

    pipeline.set_state(Gst.State.PLAYING)

    bus = pipeline.get_bus()
    while True:
        message = bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR)
        if message:
            if message.type == Gst.MessageType.ERROR:
                print('Error:', message.parse_error())
            break

    pipeline.set_state(Gst.State.NULL)


if __name__ == "__main__":
    main()

These are the resulting PTS values:

pts: 1736513537116000000, 2025-01-10 12:52:17.116000+00:00
pts: 1736513546108000000, 2025-01-10 12:52:26.108000+00:00 << Huge gap here
pts: 1736513547066000000, 2025-01-10 12:52:27.066000+00:00
pts: 1736513548265000000, 2025-01-10 12:52:28.265000+00:00
pts: 1736513549386000000, 2025-01-10 12:52:29.386000+00:00
pts: 1736513550666000000, 2025-01-10 12:52:30.666000+00:00 << Almost without time delta here
pts: 1736513550706000000, 2025-01-10 12:52:30.706000+00:00
pts: 1736513550746000000, 2025-01-10 12:52:30.746000+00:00
pts: 1736513550786000000, 2025-01-10 12:52:30.786000+00:00
pts: 1736513550826000000, 2025-01-10 12:52:30.826000+00:00
pts: 1736513550866000000, 2025-01-10 12:52:30.866000+00:00
pts: 1736513550906000000, 2025-01-10 12:52:30.906000+00:00
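
To make the irregular spacing easier to see, here is a small sketch that computes the gap between consecutive PTS values from the output above. The helper name pts_deltas_ms is only for illustration; the values are copied from the log.

```python
# PTS values (nanoseconds) copied from the skip-frames=decode_key output above.
DECODE_KEY_PTS = [
    1736513537116000000,
    1736513546108000000,
    1736513547066000000,
    1736513548265000000,
    1736513549386000000,
    1736513550666000000,
    1736513550706000000,
    1736513550746000000,
    1736513550786000000,
    1736513550826000000,
    1736513550866000000,
    1736513550906000000,
]


def pts_deltas_ms(pts_values):
    """Return the gaps between consecutive PTS values in whole milliseconds."""
    return [(b - a) // 1_000_000 for a, b in zip(pts_values, pts_values[1:])]


deltas = pts_deltas_ms(DECODE_KEY_PTS)
print(deltas)
# [8992, 958, 1199, 1121, 1280, 40, 40, 40, 40, 40, 40]
```

The first gap is almost 9 seconds, and the last six gaps are 40 ms each. A 40 ms gap matches the spacing of single consecutive frames if the video runs at 25 fps (which is what 30 frames in ~1.2 seconds implies), not the spacing of key-frames.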

However, when we filter out non-key frames before the decoder, the data look as I would expect. This can be reproduced with a small change to the script:

# Use decode_all here
'nvv4l2decoder name=decode skip-frames=decode_all ! '
...
# Uncomment this line
pipeline.get_by_name('decode').get_static_pad('sink').add_probe(Gst.PadProbeType.BUFFER, drop_delta_probe, None)

The output is now correct (I have also verified the values using a third-party library, PyAV):

pts: 1736513537116000000, 2025-01-10 12:52:17.116000+00:00
pts: 1736513538315000000, 2025-01-10 12:52:18.315000+00:00
pts: 1736513539515000000, 2025-01-10 12:52:19.515000+00:00
pts: 1736513540715000000, 2025-01-10 12:52:20.715000+00:00
pts: 1736513541915000000, 2025-01-10 12:52:21.915000+00:00
pts: 1736513543113000000, 2025-01-10 12:52:23.113000+00:00
pts: 1736513543753000000, 2025-01-10 12:52:23.753000+00:00
pts: 1736513544951000000, 2025-01-10 12:52:24.951000+00:00
pts: 1736513546148000000, 2025-01-10 12:52:26.148000+00:00
pts: 1736513547346000000, 2025-01-10 12:52:27.346000+00:00
pts: 1736513548546000000, 2025-01-10 12:52:28.546000+00:00
pts: 1736513549746000000, 2025-01-10 12:52:29.746000+00:00
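
For reference, a cross-check with PyAV could look roughly like the sketch below. This is not the exact script used for the verification above, only a minimal illustration; the helper names pts_to_ns and keyframe_pts_ns are made up for this example, and depending on the container the PTS values may be relative to the start of the file instead of absolute.

```python
from fractions import Fraction


def pts_to_ns(pts, time_base):
    """Convert a container PTS (expressed in time_base units) to integer nanoseconds."""
    return int(pts * Fraction(time_base) * 1_000_000_000)


def keyframe_pts_ns(path):
    """Yield the PTS in nanoseconds of every key-frame packet of the first video stream."""
    import av  # PyAV; imported here so pts_to_ns can be used without it installed

    with av.open(path) as container:
        stream = container.streams.video[0]
        # Demuxing only reads packets, nothing is decoded
        for packet in container.demux(stream):
            # The flush packet at the end of the stream has no PTS
            if packet.pts is not None and packet.is_keyframe:
                yield pts_to_ns(packet.pts, stream.time_base)
```

Calling list(keyframe_pts_ns('fragment_1736513537115810865.mkv')) then gives a list of key-frame timestamps that can be compared against the buffer PTS values printed by the probe.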

How is this possible? Is this behaviour correct and expected, or is it possibly a bug in the timestamps of the frames?

Here is the video file used in the example: fragment_1736513537115810865.mp4 - Google Drive

Thank you for your time reading my question.

It may be a bug in the frame timestamps. We’ll investigate this ASAP.