Non-deterministic Green Frames with nvjpegenc on Jetson

• Hardware Platform Jetson Xavier
• DeepStream Version 6.3
• JetPack Version (valid for Jetson only) L4T 35.5.0
• TensorRT Version: 8.5.2-1+cuda11.4
• Issue Type questions / bugs

Hello, I have a DeepStream-based application that takes videos as input and saves specific frames, identified by their PTS, from the video as JPEG images.

I have noticed strange behaviour from time to time: the pipeline takes extremely long to finish (e.g. 30 seconds instead of ~1 second) while saving completely green frames to disk.

I have worked to isolate the issue and produce a minimal reproducible example, which I will post below:

import logging
import os.path
import shutil
import sys
from typing import Set, List

import gi

gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

basic_format = '%(asctime)s.%(msecs)03dZ [%(levelname)-7s] [%(filename)s|%(lineno)4d] - %(message)s'
basic_date_fmt = '%Y-%m-%dT%H:%M:%S'

logging.basicConfig(
    format=basic_format,
    datefmt=basic_date_fmt,
    stream=sys.stdout
)

logger = logging.getLogger('')
logger.setLevel(logging.DEBUG)


def deco(f):
    def applicator(*args, **kwargs):
        return f(*args, **kwargs)

    return applicator


@deco
def screenshot_filter_probe(pad: Gst.Pad, info: Gst.PadProbeInfo, ts_req: Set[int]) -> Gst.PadProbeReturn:
    buf: Gst.Buffer = info.get_buffer()

    if buf.pts in ts_req:
        logger.info(f'Found frame at {buf.pts}')
        return Gst.PadProbeReturn.OK

    return Gst.PadProbeReturn.DROP


p_screenshots = 'screenshots'
if os.path.exists(p_screenshots):
    shutil.rmtree(p_screenshots)
os.mkdir(p_screenshots)

p_video: str = 'fragment_1737645044369416246.mkv'
# Timestamps of frames which we want to save to disk
requested_frames = {1737645047135000000, 1737645050035000000, 1737645049035000000, 1737645052002000000,
                    1737645051569000000, 1737645053469000000, 1737645055135000000, 1737645056168000000}

pipeline: Gst.Pipeline = Gst.parse_launch(
    f'filesrc location={p_video} ! parsebin ! nvv4l2decoder name=decoder skip-frames=decode_all ! '
    f'video/x-raw(memory:NVMM) ! '
    f'nvjpegenc ! '
    f'multifilesink location={p_screenshots}/im_%d.jpg sync=0'
)

decoder: Gst.Element = pipeline.get_by_name('decoder')
decoder_src: Gst.Pad = decoder.get_static_pad('src')
decoder_src.add_probe(Gst.PadProbeType.BUFFER, screenshot_filter_probe, set(requested_frames))

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
message: Gst.Message = bus.timed_pop_filtered(30 * Gst.SECOND, Gst.MessageType.EOS | Gst.MessageType.ERROR)
Gst.debug_bin_to_dot_file(pipeline, Gst.DebugGraphDetails.ALL, 'pipeline')

if message is not None:
    print(message.type)
else:
    print('no message (timeout)')
pipeline.set_state(Gst.State.NULL)
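As an aside on the frame-selection logic above: the probe matches PTS values exactly, which assumes the requested timestamps were taken verbatim from a previous run over the same file. A small tolerance-based matcher (a hypothetical helper, not part of the reproduction) would make the selection less brittle if the timestamps ever come from another source:

```python
from typing import Optional, Set


def match_pts(pts: int, requested: Set[int], tolerance_ns: int = 1_000_000) -> Optional[int]:
    """Return the requested timestamp closest to pts within tolerance_ns, or None.

    Hypothetical helper: buffer PTS values are in nanoseconds, so a 1 ms
    tolerance absorbs small re-timestamping differences without spanning
    adjacent frames (which are ~33 ms apart at 30 fps).
    """
    candidates = [ts for ts in requested if abs(pts - ts) <= tolerance_ns]
    return min(candidates, key=lambda ts: abs(pts - ts)) if candidates else None
```

In the probe, `if buf.pts in ts_req:` would then become `if match_pts(buf.pts, ts_req) is not None:`.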

I was able to reproduce this behaviour inside the following docker container on Jetson AGX Xavier (dev-kit):

docker run -it --rm --entrypoint=bash -v $(pwd):/x --runtime=nvidia nvcr.io/nvidia/deepstream:6.3-triton-multiarch

This is the output of the program: the pipeline processes a few of the requested frames and then times out. None of the saved frames are correct; each contains only a green picture.

root@5adcf3d062e4:/x# python3 jpegenc.py 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 279 
NvMMLiteBlockCreate : Block : BlockType = 279 
2025-01-24T10:07:47.627Z [INFO   ] [jpegenc.py|  39] - Found frame at 1737645047135000000
NvMMLiteBlockCreate : Block : BlockType = 1 
2025-01-24T10:07:57.802Z [INFO   ] [jpegenc.py|  39] - Found frame at 1737645049035000000
2025-01-24T10:08:07.999Z [INFO   ] [jpegenc.py|  39] - Found frame at 1737645050035000000
no message (timeout)

I have meditated on this topic for some time, but could not pin the problem down in a convincing manner.

Some of my personal observations (hypotheses) :

  1. Making trivial changes to the code can make the problem disappear or reappear. For example, removing the @deco decorator makes the issue go away; commenting out the line logger.info(f'Found frame at {buf.pts}') does the same. This leads me to believe that the behaviour is not deterministic and could be related to uninitialized memory or thread synchronization.
  2. I have run the program with GST_DEBUG=X, but did not see any logs pointing me towards a solution.
  3. I have noticed that the program behaves differently in different environments (e.g. host vs. docker); this again suggests that the behaviour is not entirely deterministic.
  4. I tried the same code on dGPU (changing the format from NV12 to I420, because NV12 is not supported by nvjpegenc on desktops). No issue was observed there.
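For completeness, one variation I still want to try (untested; element names taken from the standard GStreamer and DeepStream plugin sets) is to copy the decoded frames back to system memory and use the software JPEG encoder, which would take nvjpegenc out of the picture entirely:

```python
# Hypothetical workaround pipeline (untested): nvvideoconvert copies the
# decoded NVMM buffers into system memory, and the software jpegenc
# replaces nvjpegenc. Slower, but it bypasses the hardware JPEG path.
pipeline_str = (
    'filesrc location=fragment_1737645044369416246.mkv ! parsebin ! '
    'nvv4l2decoder name=decoder ! nvvideoconvert ! '
    'video/x-raw,format=I420 ! jpegenc ! '
    'multifilesink location=screenshots/im_%d.jpg sync=0'
)
```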

Did I miss something crucial during my investigation? Is this a known issue in L4T 35.5? Are there workarounds for deterministic frame extraction on Jetson architectures?

Thank you for your time and help in solving the issue,
Simon

I have included a copy of the incorrect (green) images I was talking about. There is also a graph of the pipeline (but it is the same as in the code provided).
Here is a google drive link to the video I was observing this on: fragment_1737645044369416246.mkv - Google Drive


Could you update your JetPack version to 6.1 and DeepStream version to 7.1 and try again? We have fixed many similar issues in the latest version, such as 277689, 280216, and 279778.

Hello,
I have read the posts you have linked, and they do seem similar to what we are dealing with.

However, according to this archive: JetPack Archive | NVIDIA Developer, my understanding is that JetPack 6.1 is compatible only with Orin devices. In our case, we need to be able to run the application on both Orin and Xavier platforms.

Is my understanding correct that JetPack 6.1 does not run on Xavier architectures?

Hi,
You are right. The Xavier series is not supported on JetPack 6. r35.5 is JetPack 5.1.3; you may try the latest 5.1.4 (r35.6).