DeepStream/GStreamer: One RTSP source’s latency drifts up (others stable) and pipeline crashes after ~7 hours. Decoder set to disable-dpb=true

renatomaynardetchepare · November 9, 2025, 9:40pm

**TL;DR

Hardware Platform (Jetson / GPU)**

Jetson AGX Orin Dev Kit

DeepStream Version
7.1

JetPack Version (valid for Jetson only)
6.0 (L4T 36.3.0)

TensorRT Version
10.7.0.23

NVIDIA GPU Driver Version (valid for GPU only)
N/A (Jetson)

Issue Type (questions, new requirements, bugs)
Bug / stability — long-run crash after ~7 hours: nvbufsurface: Failed to create EGLImage (GPU OSD/compositor path)

How to reproduce the issue?
Three RTSP (H.264, 1080p30) → nvstreammux → nvinfer → nvstreamdemux → per-stream nvdsosd(process-mode=0) → nvcompositor → nveglglessink.
Decoder tuned via num-extra-surfaces=2, enable-max-performance=1, low-latency=1, tested disable-dpb=true/false. RTSP tuned with latency=60, drop-on-latency=true, UDP.
I also map the compositor RGBA surface briefly every ~3 s in a pad-probe for a small ArUco check, then unmap immediately. I am using Python**
**

Hey everyone!:) Hope you’re doing well — I’m chasing a long-run DeepStream stability issue and would really appreciate your guidance :(. Jetson app with 3 RTSP H.264 cameras batched through nvstreammux → nvinfer → nvstreamdemux → per-branch nvdsosd → nvcompositor → sink. Two sources remain ~25–35 ms end-to-end, but sid=0 slowly drifts to ~60–70 ms while inference stays ~30 FPS on all. Additionally, after ~7 hours the process crashes (no clean EOS). Looking for guidance on per-source latency drift and the long-run crash. I’m setting nvv4l2decoder.disable-dpb=true.

Symptom & measurements (latency drift on sid=0)

All streams decode/infer ~30 FPS. Display-side latency for sid=0 grows over time; others stay flat:

[PERF 14:23:05] window=60s
  sid=0: disp_fps=0.00  infer_fps=29.25  lat_ms(avg/p50/p95)=29.0/28.4/41.9  n=108
  sid=1: disp_fps=0.00  infer_fps=29.89  lat_ms(avg/p50/p95)=35.5/29.5/51.7  n=124
  sid=2: disp_fps=0.00  infer_fps=30.07  lat_ms(avg/p50/p95)=28.6/27.6/41.6  n=103
...
[PERF 14:31:10] window=60s
  sid=0: disp_fps=0.00  infer_fps=30.51  lat_ms(avg/p50/p95)=87.6/105.3/126.6  n=1437
  sid=1: disp_fps=0.00  infer_fps=30.22  lat_ms(avg/p50/p95)=27.1/26.4/39.0   n=1753
  sid=2: disp_fps=0.00  infer_fps=29.46  lat_ms(avg/p50/p95)=26.5/25.6/39.0   n=1756

New issue: long-run crash after ~7 hours

After running stably for several hours, the process exits/crashes around the 7-hour mark. There’s no deliberate shutdown and no EOS; sometimes no helpful bus error precedes it. I added a tiny uptime log and bus handler—when it happens again I can attach exact stderr/bus messages, but presently it looks like an abrupt abort/segfault rather than a handled error.

Full crash log (as captured)

nvbufsurface: Failed to create EGLImage. nvbufsurface: Failed to create EGLImage. nvbufsurface: Failed to create EGLImage. nvbufsurface: Failed to create EGLImage. 
ERROR GStreamer: gst-resource-error-quark: Unable to draw shapes onto video frame by GPU (1) /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvdsosd/gstnvdsosd.c(645): gst_nvds_osd_transform_ip (): /GstPipeline:secview-batched/GstNvDsOsd:osd_2 [Main] Limpiando recursos... libnvosd (1375):(ERROR) : Unable to map EGL Imagenvbufsurface: Failed to create EGLImage. 7:19:23.123390839 31570 0xaaaae7430a40 WARN nvinfer gstnvinfer.cpp:2423:gst_nvinfer_output_loop:<pgie> error: Internal data stream error. 7:19:23.123481178 31570 0xaaaae7430a40 WARN nvinfer gstnvinfer.cpp:2423:gst_nvinfer_output_loop:<pgie> error: streaming stopped, reason error (-5) (FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.358: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed libnvosd (1375):(ERROR) : Unable to map EGL Imagenvbufsurface: Failed to create EGLImage. (FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.391: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed (FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.401: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed nvbufsurface: Failed to create EGLImage. (FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.415: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed (FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.416: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed [Main] exit complete [UPTIME] 2025-11-08T15:11:28.202001 -> 2025-11-08T22:30:51.836782 uptime=07:19:23 (26363.635s) reason=atexit

Suspicions

Surface mapping inside a pad probe (RGBA readback): I briefly map/unmap surfaces in a compositor-src probe to drive an ArUco-based layout switch. If the nvbufsurface_map/unmap usage is off for a given DeepStream/pyds build, it could be a slow-burn until a rare code path segfaults.
Is mapping via nvbufsurface_map(batch_meta.batch_meta, …) safe here, or should I avoid explicit map/unmap and rely only on pyds.get_nvds_buf_surface() (or move this to a tee→appsink CPU branch)?
nvv4l2decoder.disable-dpb=true with B-frames: If sid=0 camera uses B-frames or longer reference chains, DPB disabling might cause jitter/instability and eventually trigger a downstream edge case.
Small buffer pools under long-run pressure: nvcompositor and decoders have small pools (output-buffers ~4, num-extra-surfaces=2). Rare bursts or resizes could exhaust a pool and trip an internal assert.
Display meta pool usage: I now attach one DisplayMeta per frame and avoid persistent refs; still, if the pool runs dry and add/remove paths disagree, it could crash (though I try to “safe-add” and release on error).

Key code parts

1) RTSP + Decoder low-latency setup (child-added)

def decodebin_child_added(child_proxy, Object, name, user_data):
    if "decodebin" in name:
        Object.connect("child-added", decodebin_child_added, user_data)

    # RTSP: low-latency, UDP, drop late; NTP off (cams unsynced)
    if "rtspsrc" in name or "source" in name:
        Object.set_property("latency", 60)
        Object.set_property("drop-on-latency", True)
        Object.set_property("do-rtsp-keep-alive", True)
        Object.set_property("protocols", 0x1)  # UDP
        Object.set_property("tcp-timeout",  2_000_000_000)
        Object.set_property("timeout",      4_000_000_000)
        try: Object.set_property("ntp-sync", False)
        except Exception: pass

    # Decoder NV: low-latency profile
    if "nvv4l2decoder" in name:
        Object.set_property("num-extra-surfaces", 2)
        Object.set_property("enable-max-performance", True)
        try: Object.set_property("low-latency", True)
        except Exception: pass
        try: Object.set_property("disable-dpb", True)  # <-- questionably safe if B-frames
        except Exception: pass

Full pipeline build (mux → infer → demux → per-branch → compositor)

mux.set_property("batch-size", len(uris))
mux.set_property("width", 1920); mux.set_property("height", 1080)
mux.set_property("live-source", 1)
mux.set_property("sync-inputs", 0)
try: mux.set_property("frame-num-latest", 1)
except Exception: pass
mux.set_property("batched-push-timeout", 5000)  # 5 ms

# after-PGIE queue is leaky to avoid backpressure deadlocks
q_after_pgie.set_property("leaky", 2)
q_after_pgie.set_property("max-size-buffers", 1)

# demux to branches → RGBA (NVMM) → OSD (GPU) → queue → nvcompositor
comp_sinkpad = compositor.request_pad_simple("sink_%u")
q.get_static_pad("src").link(comp_sinkpad)

# instrumentation
src = pgie.get_static_pad("src")
src.add_probe(Gst.PadProbeType.BUFFER, pgie_src_probe, None)          # infer FPS
comp_sinkpad.add_probe(Gst.PadProbeType.BUFFER, fps_probe_for(idx))   # display FPS

Potentially related: reading RGBA for ArUco inside a probe

def probe_rgba_canvas(pad, info, _u):
    CHECK_EVERY_SEC = 3
    buf = info.get_buffer()
    if not buf: return Gst.PadProbeReturn.OK

    # ... omitted: cadence + layout checks ...

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buf))
    if not batch_meta: return Gst.PadProbeReturn.OK

    try:
        mapped = False
        try:
            # WARNING: Is this safe for all DS/pyds builds?
            pyds.nvbufsurface_map(batch_meta.batch_meta, frame_meta.batch_id, 0,
                                  pyds.NvBufSurfaceMemType_NVBUF_MEM_DEFAULT)
            mapped = True
        except Exception:
            mapped = False

        rgba_flat = pyds.get_nvds_buf_surface(hash(buf), frame_meta.batch_id)
        if rgba_flat is None or int(getattr(rgba_flat, "size", 0)) == 0:
            if mapped:
                pyds.nvbufsurface_unmap(batch_meta.batch_meta, frame_meta.batch_id, 0)
            return Gst.PadProbeReturn.OK

        # reshape, downscale, run ArUco, then toggle layout accordingly
        # ...
    finally:
        if 'mapped' in locals() and mapped:
            try: pyds.nvbufsurface_unmap(batch_meta.batch_meta, frame_meta.batch_id, 0)
            except Exception: pass

    return Gst.PadProbeReturn.OK

Pipelines

Full (conceptual) multi-source pipeline

(uri0) uridecodebin ! nvv4l2decoder(low-latency=1, disable-dpb=1, num-extra-surfaces=2) ! queue leaky=upstream ! nvstreammux.sink_0
(uri1) uridecodebin ! nvv4l2decoder(low-latency=1, disable-dpb=1, num-extra-surfaces=2) ! queue leaky=upstream ! nvstreammux.sink_1
(uri2) uridecodebin ! nvv4l2decoder(low-latency=1, disable-dpb=1, num-extra-surfaces=2) ! queue leaky=upstream ! nvstreammux.sink_2

nvstreammux (live-source=1, sync-inputs=0, frame-num-latest=1, batched-push-timeout=5000)
  → queue leaky=downstream
  → nvinfer (PGIE)
  → queue leaky=downstream
  → nvstreamdemux

[per sid branch]
nvstreamdemux.src_%u → queue leaky=downstream → nvvideoconvert → caps RGBA(NVMM) → nvdsosd (GPU) → queue leaky=downstream → nvcompositor.sink_%u

nvcompositor
  → capsfilter (RGBA, 1920x1080)
  → queue leaky=downstream
  → nveglglessink (sync=false)

Minimal single-source

uridecodebin uri=rtsp://... (UDP, latency=60, drop-on-latency=true)
! nvv4l2decoder enable-max-performance=1 low-latency=1 disable-dpb=1 num-extra-surfaces=2
! nvstreammux batch-size=1 live-source=1 sync-inputs=0 frame-num-latest=1 batched-push-timeout=5000 width=1920 height=1080
! nvinfer config-file-path=dstest1_pgie_config.txt
! nvdsosd process-mode=0 display-text=1
! nveglglessink sync=false

GUI with Bottom Camera with Latency

Some Questions

EGL crash after ~7 hours
I hit nvbufsurface: Failed to create EGLImage and nvdsosd: Unable to draw shapes by GPU (process-mode=0).
• Is this a known issue for certain JetPack/DeepStream combos? Any patches/workarounds?
• Best practices to avoid EGLImage exhaustion (pool sizes, caps, element ordering, nvegltransform before sink)?
• Recommended defaults for nvv4l2decoder num-extra-surfaces and nvcompositor output-buffers/max-buffers for 3×1080p@30?
CPU readback from compositor
Every ~3s I map the compositor’s RGBA surface to run an ArUco check (CPU).
• Is periodic nvbufsurface_map → get_nvds_buf_surface → unmap safe long-term, or should I tee → appsink for CPU work instead?
• Preferred memory types to avoid EGL conflicts on Jetson?
Low-latency decoder settings
Using low-latency=true and disable-dpb=true.
• Is disable-dpb safe with possible B-frames for multi-hour runs? Any recommended decoder settings for “low latency but stable”?
Mux/demux & queues (live RTSP)
Running nvstreammux live-source=1 sync-inputs=0 frame-num-latest=1, leaky queues (1 buffer).
• Any better-known settings for batched-push-timeout/queues to prevent backpressure/resource creep over hours?
Display FPS reads
disp_fps near the sink shows 0.00 while nvinfer stays ~30.
• Where should I probe to measure actual displayed FPS? Any interactions with nveglglessink sync=false and rtspsrc drop-on-latency that explain this?
Isolation steps
• If switching nvdsosd to process-mode=1 (CPU) for a few hours prevents the crash, is it fair to suspect the GPU/EGL path?
• Would keeping NV12 until OSD (instead of RGBA earlier) reduce EGL pressure? Is inserting nvegltransform advisable?

Thank you so much for your time! I really appreacite your help! :)

Fiona.Chen · November 10, 2025, 7:59am

renatomaynardetchepare:

uridecodebin uri=rtsp://... (UDP, latency=60, drop-on-latency=true)
! nvv4l2decoder enable-max-performance=1 low-latency=1 disable-dpb=1 num-extra-surfaces=2
! nvstreammux batch-size=1 live-source=1 sync-inputs=0 frame-num-latest=1 batched-push-timeout=5000 width=1920 height=1080
! nvinfer config-file-path=dstest1_pgie_config.txt
! nvdsosd process-mode=0 display-text=1
! nveglglessink sync=false

Do you mean you can reproduce the crash with this pipeline?

renatomaynardetchepare · November 10, 2025, 1:53pm

Hi Fiona,

Thank you for your answer, Yes! :), this was the setting I am using:). This crash happens after some hours/ but the increase in latency happens after 5-10 min, really weird. Thank you so much for you time!

Fiona.Chen · November 11, 2025, 1:54am

Can you refer to the DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums to check whether the nvstreammux parameters and properties are set correctly according to your streams before we start the discussion of your issue?

Fiona.Chen · November 12, 2025, 8:45am

What do you mean by this?

Fiona.Chen · November 12, 2025, 8:47am

“disable-dpb” does not work with B frames, the B frames needs a large buffering queue to guarantee the reference frames are available in the buffering queue.

renatomaynardetchepare · November 12, 2025, 6:26pm

Thanks Fiona, I will do it and let you know how it goes! :)

renatomaynardetchepare · November 12, 2025, 6:27pm

Hi Fiona,

My understanding is that “surface mapping inside a pad probe” refers to calling NvBufSurfaceMap/UnMap (plus NvBufSurfaceSyncForCpu) on an NvBufSurface inside the probe callback so the CPU can read pixels. That can work, but it’s easy to get subtle issues if the map flags are wrong, if UnMap is missed on an uncommon path, or if the probe touches a surface whose lifetime/format isn’t what the code assumes. Those tend to show up as rare stalls or crashes under load.

Thanks! :)

(I am doing this because I need to read some Aruco Markers).

Fiona.Chen · November 17, 2025, 6:40am

The probe function is a blocking function, if you do the NvBufSurfaceMap/UnMap correctly inside the probe function, there is no problem.

renatomaynardetchepare · November 20, 2025, 11:50pm

Thank you for your comment, Fiona. After reviewing my code again and again, I realized I was having a problem with the probe function, and after some time running the model I was accumulating memory and eventually crashing the RAM. Additionally, I changed my pipeline to use independent cameras, each with its own engine, and now I haven’t had any problems anymore. Thank you so much for your help! 😊

Topic		Replies	Views
Deepstream-app crash with nvbufsurface: NvBufSurfaceSysToHWCopy error DeepStream SDK nvbugs	27	4889	September 1, 2021
Pipeline Crash DeepStream SDK deepstream	11	259	December 30, 2025
Jetson NX DeepStream problem DeepStream SDK	11	492	February 1, 2024
Jetson AGX Orin (DeepStream 7.1) – Smart Record / custom recording triggers NvBufSurface CUDA faults and NVENC crashes with 13 RTSP sources DeepStream SDK tensorrt , cuda , ubuntu , gstreamer , deepstream , jetson-orin	22	508	December 17, 2025
[deepstream-test.py] Error: Could not map EglImage from NvBufSurface for nvinfer on headless orin DeepStream SDK jetson-inference	6	1210	December 27, 2023
DeepStream. Jittered output DeepStream SDK camera , deepstream	11	1228	May 2, 2023
An understanding of the delay result produced by latency_measurement_buf_probe DeepStream SDK camera , cudnn , deepstream	39	609	December 24, 2024
Pipeline freeze? DeepStream SDK	14	992	May 27, 2024
Pixelated Frames in Deepstream Pipeline DeepStream SDK	19	554	September 30, 2024
Deepstream 8 Failure DeepStream SDK deepstream	3	77	January 30, 2026