Does DeepStream support reliable pre-decode recording at scale (60+ RTSP sources) inside nvurisourcebin?

• Hardware Platform GPU
• DeepStream Version 8.0
• NVIDIA GPU : RTX 5090

Pipeline description (DeepStream)

I am running a DeepStream multi-camera pipeline with pre-decode recording inside nvurisourcebin.

Main inference pipeline

nvurisourcebin (x N sources)
    → nvstreammux
        → nvstreamdemux
            → nvinfer
                → nvmetamux

  • nvstreammux batches all sources

  • nvstreamdemux splits per-stream

  • Inference and metadata aggregation work correctly even with 60 cameras

Recording branch (inside nvurisourcebin)

Inside each nvurisourcebin, I added a pre-decode tee to record the original RTSP stream:

rtspsrc
    → tee_rtsp_pre_decode
        ├── queue
        │     → h264parse
        │          → splitmuxsink   (record original stream)
        │
        └── decode
              → nvstreammux (main pipeline)
  1. Recording is done before decode
  2. splitmuxsink is used to record video
  3. The same recording pipeline works perfectly when run as a standalone GStreamer pipeline (60 cameras OK)
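For illustration, the recording branch above is roughly equivalent to the following standalone gst-launch sketch. The camera URL, output path, and segment duration are placeholders, and nvv4l2decoder + fakesink stand in for the decode/mux leg; this is not my exact application code:

```shell
# Sketch of the pre-decode tee topology above (placeholders throughout).
gst-launch-1.0 -e \
  rtspsrc location="rtsp://<camera>" ! rtph264depay ! h264parse ! tee name=t \
  t. ! queue ! splitmuxsink location=/data/cam01_%05d.mp4 max-size-time=60000000000 \
  t. ! queue ! nvv4l2decoder ! fakesink
```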

I also set the queue element's size limits to unlimited:

  • queue.set_property("max-size-buffers", 0)
  • queue.set_property("max-size-time", 0)
  • queue.set_property("max-size-bytes", 0)

Observed issue

  • With 30 cameras:

    • Recording works correctly

    • No frame drop or freeze

  • With 60 cameras:

    • Recording branch stalls

    • Output video files stop growing

    • Missing segments / frozen recordings

    • Inference pipeline continues to run normally

Question

  1. Is this a known limitation when adding a pre-decode recording branch inside nvurisourcebin at high source counts?

  2. Can backpressure from nvstreammux / global DeepStream clock propagate upstream and affect the pre-decode tee branch?

  3. Is there a recommended architecture for large-scale pre-decode recording (e.g. using appsink, external recorder pipeline, or a separate process)?

  4. Are there specific properties in nvurisourcebin / rtspsrc that should be set to avoid this behavior at scale?

The pipeline is wrong. We have already provided a sample with nvmetamux in deepstream_reference_apps/deepstream_parallel_inference_app at master · NVIDIA-AI-IOT/deepstream_reference_apps.

The nvurisrcbin has a smart record function; it will record the pre-decode stream automatically.

What do you mean by “The same recording pipeline”? What do you mean by “run as a standalone GStreamer pipeline”?

Thank you for your response.

This is my pipeline. For readability, I am showing a single RTSP input (in reality, I am running 60 RTSP streams). My pipeline is based on the DeepStream parallel inference pipeline code that you shared.

The main change I made is adding a splitmuxsink element to record video. Could you please take a look at my pipeline and see if there are any issues with it?

The nvurisrcbin has a smart record function; it will record the pre-decode stream automatically.

I’m aware that nvurisrcbin has the Smart Record feature, which can automatically record the pre-decode stream. However, instead of keeping the data in memory or recording selectively, I want to persist all video streams to storage (i.e., continuously save all video data).

What do you mean by “The same recording pipeline”? What do you mean by “run as a standalone GStreamer pipeline”?

By “the same recording pipeline”, I mean using exactly the same GStreamer elements and structure for recording, but outside of DeepStream.

By “run as a standalone GStreamer pipeline”, I mean that I create an independent GStreamer pipeline whose only purpose is to record video, without any DeepStream elements involved.

In my case, this standalone pipeline works perfectly with no issues. The pipeline is simple: I create 60 bins for 60 cameras, and each bin contains the following elements:

rtspsrc
 → rtph264depay
 → h264parse
 → queue
 → splitmuxsink

This pipeline records video correctly and runs stably.
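Each such bin corresponds roughly to the following gst-launch command. The camera URL, output path, and the 60-second segment duration are placeholders for illustration:

```shell
# One camera's standalone recording bin as a gst-launch sketch.
gst-launch-1.0 -e \
  rtspsrc location="rtsp://<camera-url>" ! rtph264depay ! h264parse ! queue ! \
  splitmuxsink location=/data/cam01_%05d.mp4 max-size-time=60000000000
```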

The problem only appears when I try to integrate a similar recording logic into my DeepStream-based pipeline.

Can you use the “nvidia-smi dmon” command to measure the GPU/CPU load when running the case with 60 sources?

Smart recording does store the videos to storage.

Do you mean just running an rtspsrc + splitmuxsink pipeline?

My test results with different pipelines

I have tested several pipelines, and the results are as follows:

1. Simple RTSP recording pipeline

Pipeline: rtspsrc + splitmuxsink

  • Video recording works correctly

  • No freezing, no frame drops, no corrupted output


2. Pipeline without using nvurisrcbin

In this case, I implemented the RTSP source part manually. The pipeline structure is shown below:

rtspsrc → depay → parser → tee
                     ├── queue → decode → nvstreammux → ...
                     └── queue → splitmuxsink

  • The recorded video is correct

  • No freezing or missing frames

  • Recording the original stream before decode works well
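For a single source, pipeline 2 can be sketched as follows. All URLs, paths, and the nvstreammux width/height/batch-size values are illustrative placeholders, and the trailing fakesink stands in for the rest of the inference path:

```shell
# Sketch of pipeline 2 for one source; fakesink stands in for inference.
gst-launch-1.0 -e \
  rtspsrc location="rtsp://<camera>" ! rtph264depay ! h264parse ! tee name=t \
  t. ! queue ! nvv4l2decoder ! m.sink_0 \
  t. ! queue ! splitmuxsink location=/data/cam01_%05d.mp4 max-size-time=60000000000 \
  nvstreammux name=m batch-size=1 width=1920 height=1080 ! fakesink
```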


3. Pipeline using nvurisrcbin with an additional branch at tee_rtsp_pre_decode

nvurisrcbin
    → tee_rtsp_pre_decode
        ├── queue
        │     → h264parse
        │          → splitmuxsink   (record original stream)
        │
        └── decode
              → nvstreammux (main pipeline)

  • In this case, the recorded video frequently freezes

  • Some parts of the stream are missing or corrupted

  • Even when no inference model is used (the pipeline only passes through nvstreammux), the recorded video still has the same issue.


Question / Discussion

I would like to continue using nvurisrcbin and avoid re-implementing my own RTSP source bin, because nvurisrcbin provides useful features such as:

  • RTSP timeout handling

  • Automatic reconnect

However, when I add a recording branch at tee_rtsp_pre_decode, the recorded video becomes unstable.

I am not sure whether there is any internal limitation, bottleneck, or buffer constraint inside nvurisrcbin that could cause this behavior.

Could you please suggest:

  • Possible reasons why recording from tee_rtsp_pre_decode causes freezing?

  • Any known limitations of nvurisrcbin in this use case?

  • Recommended ways to debug or instrument nvurisrcbin (e.g., latency, buffering, queue configuration, or internal flow control)?

Any guidance would be greatly appreciated.

Thank you.

My Pipeline without using nvurisrcbin

The nvurisrcbin is open source; you can check the source code at /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvurisrcbin. There are many elements in this bin, and their property settings may not match the settings in your separate pipeline.
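A generic way to compare the property settings of the two topologies is GStreamer's graph-dump facility: it writes the fully expanded pipeline, including every element inside nvurisrcbin, as Graphviz .dot files, and the queue_dataflow debug category logs queue fill levels to show which queue is blocking. Note that gst-launch dumps graphs automatically on state changes, while an application must call GST_DEBUG_BIN_TO_DOT_FILE (Gst.debug_bin_to_dot_file in Python) itself; the application name below is a placeholder:

```shell
# Dump expanded pipeline graphs and log queue fill levels.
mkdir -p /tmp/pipeline-dots
export GST_DEBUG_DUMP_DOT_DIR=/tmp/pipeline-dots
export GST_DEBUG=queue_dataflow:5
python3 my_deepstream_app.py             # placeholder for the actual application
dot -Tpng /tmp/pipeline-dots/*.dot -O    # render the graphs with Graphviz
```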

Thank you. I fixed my issue.