DeepStream Detections Out Of Sync With Encoded JPEG Frames For 15+ Video Sources

• Hardware Platform: GPU
• DeepStream Version: 7.1
• TensorRT Version: 10.3.0.26-1+cuda12.5
• NVIDIA GPU Driver Version: 560.35.05-0ubuntu1
• Issue Type: questions
• How to reproduce the issue? Source code is provided

I’m experiencing severe issues when scaling my DeepStream pipeline from 5 video sources to 15+. The pipeline works flawlessly with fewer sources, but with more sources it shows this erratic behavior:

  1. Inconsistent Detection Results: when running the same video on all sources, the same tracked objects (cars) produce different detection outputs
  2. Metadata Synchronization Issues: detection bounding boxes don’t always align with the JPEG frame output

Simplified Pipeline Architecture

Sources → nvstreammux → PGIE (car) → Tracker → SGIE1 (plate) → SGIE2 (OCR) → [sgie2_detection_probe] → Tee → [Branch 1] → nvdslogger → fakesink
                                                                                                        ↓
                                                                                                    [Branch 2] → nvvideoconvert → capsfilter (NVMM, format=I420) → nvjpegenc → [jpeg_encoder_probe] → fakesink

Probe functions:

@staticmethod
def sgie2_detection_probe(
    pad, info: Gst.PadProbeInfo, detection_storage: dict, dict_lock: Lock
):
    """
    Probe to build dictionary out of the detections, and save it to a temporary buffer
    """
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    gst_buffer_hash = hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(gst_buffer_hash)
    if not batch_meta:
        return Gst.PadProbeReturn.OK

    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        if frame_meta.num_obj_meta:
            frame_data = DeepStream.build_object_hierarchy(frame_meta)
            key = f"{frame_meta.source_id}_{frame_meta.frame_num}"
            with dict_lock:
                detection_storage[key] = frame_data

        try:
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK
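One pitfall with the keyed buffer above: if the encoder branch ever drops a frame, the matching entry under `source_id_frame_num` is never popped and the dict grows without bound (the length check in the second probe hints this is already happening). A minimal sketch of a bounded store with oldest-first eviction, usable in place of the plain dict; `BoundedDetectionStore` and its size limit are my own names and assumptions, not DeepStream API:

```python
from collections import OrderedDict
from threading import Lock


class BoundedDetectionStore:
    """Thread-safe detection store that evicts the oldest entries.

    Prevents unbounded growth when the JPEG branch never consumes a key
    (dropped frame, stalled branch, etc.).
    """

    def __init__(self, max_entries: int = 256):
        self._data: "OrderedDict[str, dict]" = OrderedDict()
        self._lock = Lock()
        self._max = max_entries

    def put(self, key: str, frame_data: dict) -> None:
        with self._lock:
            self._data[key] = frame_data
            while len(self._data) > self._max:
                # Drop the oldest unmatched entry instead of leaking it.
                self._data.popitem(last=False)

    def pop(self, key: str):
        with self._lock:
            return self._data.pop(key, None)

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)


store = BoundedDetectionStore(max_entries=3)
for frame_num in range(5):
    store.put(f"0_{frame_num}", {"frame_num": frame_num})

print(len(store))        # → 3 (the two oldest entries were evicted)
print(store.pop("0_0"))  # → None (already evicted)
print(store.pop("0_4"))  # → {'frame_num': 4}
```

The eviction limit should be sized to the worst-case latency (in frames) between the SGIE2 probe and the encoder probe across all sources.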

@staticmethod
def jpeg_encoder_probe(
    pad,
    info: Gst.PadProbeInfo,
    frame_data_queue: multiprocessing.Queue,
    detection_storage: dict,
    dict_lock: Lock,
):
    """
    Probe after JPEG encoder that retrieves detection data and combines with encoded image.
    """
    # Warn when detections pile up, i.e. the encoder branch is falling behind
    with dict_lock:
        pending = len(detection_storage)
    if pending > 10:
        print(pending, end="\r")

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    # Extract JPEG binary data
    success, map_info = gst_buffer.map(Gst.MapFlags.READ)
    if not success:
        return Gst.PadProbeReturn.OK

    jpeg_data = bytes(map_info.data)
    gst_buffer.unmap(map_info)

    # Get batch meta to access frame meta
    gst_buffer_hash = hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(gst_buffer_hash)

    if not batch_meta:
        return Gst.PadProbeReturn.OK

    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        try:
            # Get frame_data from storage using unique key
            if frame_meta.num_obj_meta:
                key = f"{frame_meta.source_id}_{frame_meta.frame_num}"
                with dict_lock:
                    frame_data = detection_storage.pop(key, None)
                if frame_data:
                    frame_data["frame_info"]["frame"] = jpeg_data
                    frame_data_queue.put(frame_data)

        except Exception as e:
            print(f"Error processing frame_meta in encoder probe: {e}")

        try:
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK

Questions

Is there a way to access SGIE1 and SGIE2 detection results directly in the JPEG encoder probe? Currently, the metadata seems stripped or unavailable at that stage. If not, would using NvDsUserMeta or NvDsFrameMeta’s user_meta_list be more reliable for passing detection data through the pipeline?

Any insights on proper metadata synchronization in multi-source DeepStream pipelines would be greatly appreciated. The core question is: How can I reliably associate SGIE detection results with their corresponding encoded frames when scaling to 15+ sources?

Source code: app.py.txt (23.2 KB)

Thank you for your help!

I added nvstreamdemux after the tee and the bounding-box alignment is now fixed. However, when the number of video sources is increased to about 11, it prints the following logs and the pipeline runs at 0 fps.

Warning: gst-stream-error-quark: Could not demultiplex stream. (9): ../gst/avi/gstavidemux.c(4088): gst_avi_demux_stream_header_pull (): /GstPipeline:pipeline0/GstBin:source-bin-03/GstDsNvUriSrcBin:uri-decode-bin/GstURIDecodeBin:nvurisrc_bin_src_elem/GstDecodeBin:decodebin3/GstAviDemux:avidemux2:
failed to parse stream, ignoring
Warning: gst-stream-error-quark: Could not demultiplex stream. (9): ../gst/avi/gstavidemux.c(4088): gst_avi_demux_stream_header_pull (): /GstPipeline:pipeline0/GstBin:source-bin-04/GstDsNvUriSrcBin:uri-decode-bin/GstURIDecodeBin:nvurisrc_bin_src_elem/GstDecodeBin:decodebin4/GstAviDemux:avidemux1:
failed to parse stream, ignoring
...
(python:465547): GStreamer-Base-CRITICAL **: 15:26:27.555: gst_flow_combiner_update_pad_flow: assertion 'pad != NULL' failed

(python:465547): GStreamer-Base-CRITICAL **: 15:26:27.555: gst_flow_combiner_update_pad_flow: assertion 'pad != NULL' failed

(python:465547): GStreamer-Base-CRITICAL **: 15:26:27.556: gst_flow_combiner_update_pad_flow: assertion 'pad != NULL' failed

(python:465547): GStreamer-Base-CRITICAL **: 15:26:27.557: gst_flow_combiner_update_pad_flow: assertion 'pad != NULL' failed
Failed to query video capabilities: Invalid argument
Failed to query video capabilities: Invalid argument
Failed to query video capabilities: Invalid argument
Failed to query video capabilities: Invalid argument
...
Decodebin child added: nvurisrc_bin_queue 

Decodebin child added: nvurisrc_bin_nvvidconv_elem 

Decodebin child added: nvurisrc_bin_src_cap_filter_nvvidconv 

In cb_newpad

gstname= video/x-raw
features= <Gst.CapsFeatures object at 0x74894b901de0 (GstCapsFeatures at 0x74879002fc20)>
Decodebin child added: nvurisrc_bin_queue 

Decodebin child added: nvurisrc_bin_nvvidconv_elem 

Decodebin child added: nvurisrc_bin_src_cap_filter_nvvidconv 

In cb_newpad

gstname= video/x-raw
features= <Gst.CapsFeatures object at 0x74894b901900 (GstCapsFeatures at 0x7487600042c0)>
Decodebin child added: nvurisrc_bin_queue
...

**PERF : FPS_1 (0.00)   FPS_9 (0.00)    FPS_11 (0.00)
**PERF : FPS_1 (0.00)   FPS_9 (0.00)    FPS_11 (0.00)

Source code: app.py.txt (24.1 KB)

Neither nvmultistreamtiler nor nvstreamdemux is used. The pipeline is wrong.

Please do not use tee before nvstreamdemux.

Please refer to DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums to set up the correct nvstreammux parameters.

The log shows there is avi media, is the avi media a local media file or a live source? Please check your code for parsing the avi media.

They’re local AVI video files. I’m using nvurisrcbin from deepstream-test3. These logs aren’t generated with fewer sources. Could you check out the provided source code, please?

Can you please provide the corrected pipeline?

Please provide your configuration and commandline to run deepstream-test3.

For the pipeline with nvstreamdemux, please refer to the deepstream-app sample.

I meant I am using the same approach as deepstream-test3 for adding multiple sources (local file or RTSP).

I have actually referred to the deepstream-demux-multi-in-multi-out sample. Unfortunately there’s no sample with tee + nvstreamdemux elements together.

If you could check out the code, do I actually need the tee? Can’t I directly get the encoded frames in a straight-forward pipeline? Essentially the goal is to show the output and get the JPEG encoded frames via probe functions simultaneously.

deepstream-test3 supports multiple input sources; you can use that sample to verify the AVI files.

These are all supported by deepstream-app, it is open source.

Why do you need the encoded JPEG frames? What will you do with them? Do you need every frame of the video to be encoded into JPEG format?

No, actually I don’t need every single frame. I’m running DeepStream via this module in a separate process and putting the detections, along with the encoded frame, into a shared multiprocessing.Queue that my main logic process consumes. I only need the best frame per tracked object, and I decide which frame is best in my main logic process.

It would actually be a lot more efficient if I could selectively encode the best frames, but I couldn’t figure out a way to communicate the encode signal between these two processes in real-time.
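For the real-time encode signal between the two processes, one possible pattern (a sketch under my own assumptions, not DeepStream API: `make_encode_flags`, `request_encode`, and `should_encode` are hypothetical names) is a multiprocessing.Manager dict keyed by source_id. The main logic process sets a flag when it wants the next frame of a source encoded; the probe atomically consumes it:

```python
import multiprocessing


def make_encode_flags(manager):
    """Shared per-source flags: the main process sets, the probe clears."""
    return manager.dict()


def request_encode(flags, source_id: int) -> None:
    # Called from the main logic process when the current frame looks "best".
    flags[source_id] = True


def should_encode(flags, source_id: int) -> bool:
    # Called inside the probe. pop() is a single proxy call executed in the
    # manager process, so each request triggers at most one JPEG encode.
    return bool(flags.pop(source_id, False))


if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        flags = make_encode_flags(manager)
        request_encode(flags, source_id=3)
        print(should_encode(flags, 3))  # → True: encode this frame
        print(should_encode(flags, 3))  # → False: the flag was consumed
```

A Manager dict adds an IPC round-trip per probe call; for a fixed source count, an array of `multiprocessing.Value('b')` flags would avoid the proxy overhead at the cost of a little setup code.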

Your main logic process needs every frame and its corresponding detections to be available to decide which frames are the best, right?

No, it only needs the detections to decide whether the current frame is the best or not.

So your case is similar to Gst-nvstreamdemux — DeepStream documentation and you want to encode some frames after nvstreamdemux, right?

There is already a sample for getting video buffer from NvBufSurface and encode the video buffer into JPEG file. deepstream_python_apps/apps/deepstream-imagedata-multistream/deepstream_imagedata-multistream.py at master · NVIDIA-AI-IOT/deepstream_python_apps

You can refer to the save-image part and insert a pad probe function on the nvstreamdemux output. Please make sure to add nvvideoconvert after nvstreamdemux to convert the video format to RGBA for JPEG encoding. Your app can design any API to communicate between your main logic process and the probe function.
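To visualize this suggestion in the same style as the original diagram, the demux-based layout would look roughly like this (my own sketch of the advice, not an official pipeline; per-source branches shown for two sources):

```
Sources → nvstreammux → PGIE (car) → Tracker → SGIE1 (plate) → SGIE2 (OCR) → nvstreamdemux
                                                   ├─ src_0 → queue → nvvideoconvert → capsfilter (RGBA) → [probe: encode selected frames] → sink
                                                   ├─ src_1 → queue → nvvideoconvert → capsfilter (RGBA) → [probe: encode selected frames] → sink
                                                   └─ ...
```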

I have already tried get_nvds_buf_surface to get the raw frame data and convert it to a NumPy array, but it had a significant performance impact.

I also tried encoding the frame using nvds_obj_enc_process within a probe function and then retrieving the encoded frame further along the pipeline via another probe function. However, due to its asynchronous nature, the second probe function did not always return the encoded frame.

So far nvjpegenc has performed the best for my case, and I haven’t seen any significant performance degradation. I’d appreciate it if you could help me set up the current approach correctly.