**TL;DR**

Hardware Platform (Jetson / GPU)
Jetson AGX Orin Dev Kit
DeepStream Version
7.1
JetPack Version (valid for Jetson only)
6.0 (L4T 36.3.0)
TensorRT Version
10.7.0.23
NVIDIA GPU Driver Version (valid for GPU only)
N/A (Jetson)
Issue Type (questions, new requirements, bugs)
Bug / stability — long-run crash after ~7 hours: nvbufsurface: Failed to create EGLImage (GPU OSD/compositor path)
How to reproduce the issue?
Three RTSP (H.264, 1080p30) → nvstreammux → nvinfer → nvstreamdemux → per-stream nvdsosd(process-mode=0) → nvcompositor → nveglglessink.
Decoder tuned via num-extra-surfaces=2, enable-max-performance=1, low-latency=1, tested disable-dpb=true/false. RTSP tuned with latency=60, drop-on-latency=true, UDP.
I also map the compositor RGBA surface briefly every ~3 s in a pad probe for a small ArUco check, then unmap immediately. The app is written in Python.
Hey everyone! :) Hope you're doing well. I'm chasing a long-run DeepStream stability issue and would really appreciate your guidance. It's a Jetson app with 3 RTSP H.264 cameras batched through nvstreammux → nvinfer → nvstreamdemux → per-branch nvdsosd → nvcompositor → sink. Two sources stay at ~25–35 ms end-to-end, but sid=0 slowly drifts to ~60–70 ms while inference holds ~30 FPS on all three. Additionally, after ~7 hours the process crashes (no clean EOS). I'm looking for guidance on both the per-source latency drift and the long-run crash. Note that I'm setting nvv4l2decoder disable-dpb=true.
Symptom & measurements (latency drift on sid=0)
All streams decode/infer ~30 FPS. Display-side latency for sid=0 grows over time; others stay flat:
[PERF 14:23:05] window=60s
sid=0: disp_fps=0.00 infer_fps=29.25 lat_ms(avg/p50/p95)=29.0/28.4/41.9 n=108
sid=1: disp_fps=0.00 infer_fps=29.89 lat_ms(avg/p50/p95)=35.5/29.5/51.7 n=124
sid=2: disp_fps=0.00 infer_fps=30.07 lat_ms(avg/p50/p95)=28.6/27.6/41.6 n=103
...
[PERF 14:31:10] window=60s
sid=0: disp_fps=0.00 infer_fps=30.51 lat_ms(avg/p50/p95)=87.6/105.3/126.6 n=1437
sid=1: disp_fps=0.00 infer_fps=30.22 lat_ms(avg/p50/p95)=27.1/26.4/39.0 n=1753
sid=2: disp_fps=0.00 infer_fps=29.46 lat_ms(avg/p50/p95)=26.5/25.6/39.0 n=1756
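For reference, the lat_ms(avg/p50/p95) columns above come from a small windowed aggregator. This is a standalone sketch of it (class and method names simplified for the post; it uses nearest-rank percentiles, which may differ slightly from other conventions):

```python
class LatencyWindow:
    """Collects per-frame latencies (ms) over one window and reports
    avg/p50/p95/n, as printed in the [PERF] lines."""

    def __init__(self):
        self.samples = []

    def add(self, lat_ms):
        self.samples.append(lat_ms)

    def stats(self):
        n = len(self.samples)
        if n == 0:
            return None
        s = sorted(self.samples)
        avg = sum(s) / n
        # nearest-rank percentiles; clamp the index for small n
        p50 = s[min(n - 1, int(0.50 * n))]
        p95 = s[min(n - 1, int(0.95 * n))]
        return avg, p50, p95, n
```

One window instance per sid; the stats tuple is reset by creating a fresh window each 60 s period.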
New issue: long-run crash after ~7 hours
After running stably for several hours, the process exits/crashes around the 7-hour mark. There is no deliberate shutdown and no EOS, and sometimes no helpful bus error precedes it. I added a small uptime log and a bus handler; when it happens again I can attach the exact stderr/bus messages, but at present it looks like an abrupt abort/segfault rather than a handled error.
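The uptime log is just an atexit hook. For reference, this minimal standalone version reproduces the `[UPTIME]` line format seen in the crash log (helper names are mine, simplified for the post):

```python
import atexit
import time
from datetime import datetime

START_MONO = time.monotonic()   # monotonic clock: immune to NTP jumps
START_WALL = datetime.now()

def format_uptime(seconds: float) -> str:
    """HH:MM:SS, matching the uptime=07:19:23 style in the crash log."""
    s = int(seconds)
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}"

def _log_uptime(reason: str = "atexit"):
    elapsed = time.monotonic() - START_MONO
    print(f"[UPTIME] {START_WALL.isoformat()} -> {datetime.now().isoformat()} "
          f"uptime={format_uptime(elapsed)} ({elapsed:.3f}s) reason={reason}")

atexit.register(_log_uptime)
```

Because it fires via atexit, it still runs on unhandled Python exceptions but not on a hard segfault in native code, which is consistent with the log ending at `reason=atexit` only when the interpreter gets to tear down.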
Full crash log (as captured)
nvbufsurface: Failed to create EGLImage.
nvbufsurface: Failed to create EGLImage.
nvbufsurface: Failed to create EGLImage.
nvbufsurface: Failed to create EGLImage.
ERROR GStreamer: gst-resource-error-quark: Unable to draw shapes onto video frame by GPU (1)
/dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvdsosd/gstnvdsosd.c(645): gst_nvds_osd_transform_ip (): /GstPipeline:secview-batched/GstNvDsOsd:osd_2
[Main] Limpiando recursos...
libnvosd (1375):(ERROR) : Unable to map EGL Image
nvbufsurface: Failed to create EGLImage.
7:19:23.123390839 31570 0xaaaae7430a40 WARN nvinfer gstnvinfer.cpp:2423:gst_nvinfer_output_loop:<pgie> error: Internal data stream error.
7:19:23.123481178 31570 0xaaaae7430a40 WARN nvinfer gstnvinfer.cpp:2423:gst_nvinfer_output_loop:<pgie> error: streaming stopped, reason error (-5)
(FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.358: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed
libnvosd (1375):(ERROR) : Unable to map EGL Image
nvbufsurface: Failed to create EGLImage.
(FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.391: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed
(FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.401: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed
nvbufsurface: Failed to create EGLImage.
(FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.415: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed
(FinalVersionBatch3v3.0.py:31570): GStreamer-CRITICAL **: 22:30:51.416: gst_mini_object_unref: assertion '(g_atomic_int_get (&mini_object->lockstate) & LOCK_MASK) < 4' failed
[Main] exit complete
[UPTIME] 2025-11-08T15:11:28.202001 -> 2025-11-08T22:30:51.836782 uptime=07:19:23 (26363.635s) reason=atexit
Suspicions
- Surface mapping inside a pad probe (RGBA readback): I briefly map/unmap surfaces in a compositor-src probe to drive an ArUco-based layout switch. If the nvbufsurface_map/unmap usage is off for a given DeepStream/pyds build, it could be a slow burn until a rare code path segfaults. Is mapping via nvbufsurface_map(batch_meta.batch_meta, …) safe here, or should I avoid explicit map/unmap and rely only on pyds.get_nvds_buf_surface() (or move this to a tee → appsink CPU branch)?
- nvv4l2decoder disable-dpb=true with B-frames: If the sid=0 camera uses B-frames or longer reference chains, disabling the DPB might cause jitter/instability and eventually trigger a downstream edge case.
- Small buffer pools under long-run pressure: nvcompositor and the decoders have small pools (output-buffers≈4, num-extra-surfaces=2). Rare bursts or resizes could exhaust a pool and trip an internal assert.
- Display meta pool usage: I now attach one DisplayMeta per frame and avoid persistent refs; still, if the pool runs dry and the add/remove paths disagree, it could crash (though I try to "safe-add" and release on error).
Key code parts
1) RTSP + Decoder low-latency setup (child-added)
def decodebin_child_added(child_proxy, Object, name, user_data):
if "decodebin" in name:
Object.connect("child-added", decodebin_child_added, user_data)
# RTSP: low-latency, UDP, drop late; NTP off (cams unsynced)
if "rtspsrc" in name or "source" in name:
Object.set_property("latency", 60)
Object.set_property("drop-on-latency", True)
Object.set_property("do-rtsp-keep-alive", True)
Object.set_property("protocols", 0x1) # UDP
Object.set_property("tcp-timeout", 2_000_000_000)
Object.set_property("timeout", 4_000_000_000)
try: Object.set_property("ntp-sync", False)
except Exception: pass
# Decoder NV: low-latency profile
if "nvv4l2decoder" in name:
Object.set_property("num-extra-surfaces", 2)
Object.set_property("enable-max-performance", True)
try: Object.set_property("low-latency", True)
except Exception: pass
try: Object.set_property("disable-dpb", True) # <-- questionably safe if B-frames
except Exception: pass
2) Full pipeline build (mux → infer → demux → per-branch → compositor)
mux.set_property("batch-size", len(uris))
mux.set_property("width", 1920); mux.set_property("height", 1080)
mux.set_property("live-source", 1)
mux.set_property("sync-inputs", 0)
try: mux.set_property("frame-num-latest", 1)
except Exception: pass
mux.set_property("batched-push-timeout", 5000) # 5 ms
# after-PGIE queue is leaky to avoid backpressure deadlocks
q_after_pgie.set_property("leaky", 2)
q_after_pgie.set_property("max-size-buffers", 1)
# demux to branches → RGBA (NVMM) → OSD (GPU) → queue → nvcompositor
comp_sinkpad = compositor.request_pad_simple("sink_%u")
q.get_static_pad("src").link(comp_sinkpad)
# instrumentation
src = pgie.get_static_pad("src")
src.add_probe(Gst.PadProbeType.BUFFER, pgie_src_probe, None) # infer FPS
comp_sinkpad.add_probe(Gst.PadProbeType.BUFFER, fps_probe_for(idx)) # display FPS
3) Potentially related: reading RGBA for ArUco inside a probe
def probe_rgba_canvas(pad, info, _u):
    buf = info.get_buffer()
    if not buf:
        return Gst.PadProbeReturn.OK
    # ... omitted: ~3 s cadence check + layout checks ...
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buf))
    if not batch_meta or not batch_meta.frame_meta_list:
        return Gst.PadProbeReturn.OK
    # compositor output carries one frame; take the first frame meta
    frame_meta = pyds.NvDsFrameMeta.cast(batch_meta.frame_meta_list.data)
    mapped = False
    try:
        try:
            # WARNING: Is this safe for all DS/pyds builds?
            pyds.nvbufsurface_map(batch_meta.batch_meta, frame_meta.batch_id, 0,
                                  pyds.NvBufSurfaceMemType_NVBUF_MEM_DEFAULT)
            mapped = True
        except Exception:
            mapped = False
        rgba_flat = pyds.get_nvds_buf_surface(hash(buf), frame_meta.batch_id)
        if rgba_flat is None or int(getattr(rgba_flat, "size", 0)) == 0:
            # finally-block below handles the unmap; no second unmap here
            return Gst.PadProbeReturn.OK
        # reshape, downscale, run ArUco, then toggle layout accordingly
        # ...
    finally:
        if mapped:
            try:
                pyds.nvbufsurface_unmap(batch_meta.batch_meta, frame_meta.batch_id, 0)
            except Exception:
                pass
    return Gst.PadProbeReturn.OK
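The "omitted: cadence" part is just a monotonic-clock throttle so the map/ArUco path runs at most every 3 s. A minimal standalone version (the helper name Every is mine, not from the app):

```python
import time

class Every:
    """Returns True at most once per interval_s, using a monotonic clock
    so wall-clock (NTP) jumps cannot change the cadence."""

    def __init__(self, interval_s: float, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self._next = clock()  # fires on the first call

    def __call__(self) -> bool:
        now = self.clock()
        if now >= self._next:
            self._next = now + self.interval_s
            return True
        return False
```

In the probe it would be used as a module-level `check_now = Every(3.0)`, with `if not check_now(): return Gst.PadProbeReturn.OK` before any map call, so the expensive readback never runs more than once per 3 s window.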
Pipelines
Full (conceptual) multi-source pipeline
(uri0) uridecodebin ! nvv4l2decoder(low-latency=1, disable-dpb=1, num-extra-surfaces=2) ! queue leaky=upstream ! nvstreammux.sink_0
(uri1) uridecodebin ! nvv4l2decoder(low-latency=1, disable-dpb=1, num-extra-surfaces=2) ! queue leaky=upstream ! nvstreammux.sink_1
(uri2) uridecodebin ! nvv4l2decoder(low-latency=1, disable-dpb=1, num-extra-surfaces=2) ! queue leaky=upstream ! nvstreammux.sink_2
nvstreammux (live-source=1, sync-inputs=0, frame-num-latest=1, batched-push-timeout=5000)
→ queue leaky=downstream
→ nvinfer (PGIE)
→ queue leaky=downstream
→ nvstreamdemux
[per sid branch]
nvstreamdemux.src_%u → queue leaky=downstream → nvvideoconvert → caps RGBA(NVMM) → nvdsosd (GPU) → queue leaky=downstream → nvcompositor.sink_%u
nvcompositor
→ capsfilter (RGBA, 1920x1080)
→ queue leaky=downstream
→ nveglglessink (sync=false)
Minimal single-source
uridecodebin uri=rtsp://... (UDP, latency=60, drop-on-latency=true)
! nvv4l2decoder enable-max-performance=1 low-latency=1 disable-dpb=1 num-extra-surfaces=2
! nvstreammux batch-size=1 live-source=1 sync-inputs=0 frame-num-latest=1 batched-push-timeout=5000 width=1920 height=1080
! nvinfer config-file-path=dstest1_pgie_config.txt
! nvdsosd process-mode=0 display-text=1
! nveglglessink sync=false
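If the probe-side map/unmap turns out to be the culprit, the tee → appsink variant I'm considering would branch each demuxed stream roughly like this (conceptual only, untested; the videorate throttle at 1 frame per 3 s is my guess at keeping CPU copies cheap):

```
nvstreamdemux.src_0 ! tee name=t0
t0. ! queue leaky=downstream ! nvvideoconvert ! "video/x-raw(memory:NVMM),format=RGBA"
    ! nvdsosd process-mode=0 ! queue leaky=downstream ! nvcompositor.sink_0
t0. ! queue leaky=downstream max-size-buffers=1 ! nvvideoconvert
    ! video/x-raw,format=RGBA ! videorate drop-only=true ! video/x-raw,framerate=1/3
    ! appsink name=aruco_sink_0 drop=true max-buffers=1 sync=false emit-signals=true
```

The point is that the ArUco branch gets a system-memory RGBA copy from nvvideoconvert, so no NVMM surface ever has to be mapped inside a probe on the display path.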
(Screenshot: GUI with the bottom camera showing the latency drift)
Some Questions
- EGL crash after ~7 hours
  I hit nvbufsurface: Failed to create EGLImage and nvdsosd: Unable to draw shapes onto video frame by GPU (process-mode=0).
  • Is this a known issue for certain JetPack/DeepStream combos? Any patches/workarounds?
  • Best practices to avoid EGLImage exhaustion (pool sizes, caps, element ordering, nvegltransform before the sink)?
  • Recommended defaults for nvv4l2decoder num-extra-surfaces and nvcompositor output-buffers/max-buffers for 3×1080p@30?
- CPU readback from compositor
  Every ~3 s I map the compositor's RGBA surface to run an ArUco check (CPU).
  • Is periodic nvbufsurface_map → get_nvds_buf_surface → unmap safe long-term, or should I tee → appsink for CPU work instead?
  • Preferred memory types to avoid EGL conflicts on Jetson?
- Low-latency decoder settings
  Using low-latency=true and disable-dpb=true.
  • Is disable-dpb safe with possible B-frames for multi-hour runs? Any recommended decoder settings for "low latency but stable"?
- Mux/demux & queues (live RTSP)
  Running nvstreammux live-source=1 sync-inputs=0 frame-num-latest=1 with leaky queues (1 buffer).
  • Any better-known settings for batched-push-timeout/queues to prevent backpressure/resource creep over hours?
- Display FPS reads
  disp_fps near the sink shows 0.00 while nvinfer stays ~30.
  • Where should I probe to measure the actual displayed FPS? Any interactions with nveglglessink sync=false and rtspsrc drop-on-latency that would explain this?
- Isolation steps
  • If switching nvdsosd to process-mode=1 (CPU) for a few hours prevents the crash, is it fair to suspect the GPU/EGL path?
  • Would keeping NV12 until the OSD (instead of converting to RGBA earlier) reduce EGL pressure? Is inserting nvegltransform advisable?
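In case it matters for the disp_fps question: the counter behind disp_fps is essentially this windowed tick counter (simplified sketch, names are mine). If the compositor-sink-pad probe never fires, it reports 0.00, which may itself be the clue about where I attached it:

```python
import time

class FpsCounter:
    """Windowed FPS: count buffer callbacks, divide by window length,
    then reset the window on each report()."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.count = 0
        self.window_start = clock()

    def tick(self):
        # called once per buffer from the pad probe
        self.count += 1

    def report(self) -> float:
        now = self.clock()
        elapsed = now - self.window_start
        fps = self.count / elapsed if elapsed > 0 else 0.0
        self.count = 0
        self.window_start = now
        return fps
```

One instance per sink pad; `report()` is called from the periodic [PERF] timer, so a probe that never ticks shows exactly 0.00.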
Thank you so much for your time! I really appreciate your help! :)
