Efficiently Sharing RTSP Streams Across Multiple DeepStream Pipelines for Different Analytics Use Cases

Hi team,

I’m currently working on an NVIDIA L40 server with the following specs:

  • GPU: NVIDIA L40 with 32 GB VRAM
  • CPU: 32-core processor
  • Software: **DeepStream SDK 6.4** (Python-based pipelines) on Ubuntu

✅ Use Case

We are building real-time video analytics pipelines using NVIDIA DeepStream for different use cases like:

  • People detection
  • Vehicle detection
  • Helmet compliance
  • Face monitoring
    …all from the same camera feed.

So far, things are working well. But now I’ve hit a major architectural constraint:

⚠️ Challenge

Previously, we were using the same RTSP camera stream multiple times across different pipelines (e.g., once in a vehicle pipeline, once in a helmet pipeline).

However, I am now required to:

  • Access each camera stream only once (due to network policies and camera load)
  • Run multiple types of inference from the same stream
  • Keep each use case modular and separate, ideally in different DeepStream pipelines or servers

🔍 What I’m Exploring

To solve this, I am thinking of implementing:

  1. A streaming server that pulls each RTSP stream only once (see the relay sketch after this list)
  2. It then distributes the decoded frames to multiple DeepStream pipelines (for people, vehicle, helmet, etc.)
  3. Each pipeline performs its own inference independently
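
As a concrete starting point for step 1, here is a minimal relay sketch (all URLs and ports are placeholders, and it assumes an H.264 camera): a local shared RTSP server pulls the camera exactly once and re-serves it, so every pipeline connects to localhost instead of the camera. Note this only removes the duplicate camera connections; each pipeline would still decode the stream itself, so it does not yet share decoded frames.

```python
# Sketch: local RTSP relay with GstRtspServer, so the camera is pulled once.
# Assumptions: H.264 camera, placeholder URL and port. Each DeepStream
# pipeline would then connect to rtsp://127.0.0.1:8554/cam0.
import gi
gi.require_version("Gst", "1.0")
gi.require_version("GstRtspServer", "1.0")
from gi.repository import Gst, GstRtspServer, GLib

Gst.init(None)

server = GstRtspServer.RTSPServer()
server.set_service("8554")

factory = GstRtspServer.RTSPMediaFactory()
# Pull the camera once and re-pay the H.264 stream without transcoding.
factory.set_launch(
    "( rtspsrc location=rtsp://camera.local/stream latency=100 "
    "! rtph264depay ! h264parse ! rtph264pay name=pay0 pt=96 )"
)
factory.set_shared(True)  # one upstream pull, shared by all local clients

server.get_mount_points().add_factory("/cam0", factory)
server.attach(None)
GLib.MainLoop().run()
```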

I would love some guidance on:

  • Best practices to share decoded frames across pipelines
  • Whether using appsrc + nvstreammux in receiving pipelines is the right approach (see the sketch after this list)
  • Efficient inter-process transport (e.g., ZeroMQ, shared memory, or nvbuf sharing?)
  • Any NVIDIA-recommended architecture for this use case
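
For the appsrc + nvstreammux question above, this is the shape of receiving pipeline I have in mind (a sketch only: the transport feeding push_frame(), the I420 caps, and the pgie config path are all placeholder assumptions on my side):

```python
# Sketch of a receiving pipeline: raw frames arrive over some transport
# (ZeroMQ, shared memory, ...) and are pushed through appsrc into nvstreammux.
# WIDTH/HEIGHT/FPS, the I420 format, pgie.txt, and push_frame() are all
# placeholder assumptions.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

WIDTH, HEIGHT, FPS = 1920, 1080, 30  # must match what the sender produces

pipeline = Gst.parse_launch(
    "nvstreammux name=mux batch-size=1 width=1920 height=1080 live-source=1 "
    "! nvinfer config-file-path=pgie.txt ! fakesink "
    "appsrc name=src is-live=true format=time "
    f'caps="video/x-raw,format=I420,width={WIDTH},height={HEIGHT},framerate={FPS}/1" '
    "! nvvideoconvert ! video/x-raw(memory:NVMM) ! mux.sink_0"
)

appsrc = pipeline.get_by_name("src")

def push_frame(i420_bytes: bytes, pts_ns: int) -> bool:
    """Hypothetical hook: call this from the transport's receive thread."""
    buf = Gst.Buffer.new_wrapped(i420_bytes)
    buf.pts = pts_ns
    buf.duration = Gst.SECOND // FPS
    return appsrc.emit("push-buffer", buf) == Gst.FlowReturn.OK

pipeline.set_state(Gst.State.PLAYING)
try:
    GLib.MainLoop().run()
finally:
    pipeline.set_state(Gst.State.NULL)
```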

Can someone from the NVIDIA team or community guide me on:

  • How to architect this correctly?
  • Whether NVIDIA provides utilities or SDK features to distribute decoded frames to multiple DeepStream pipelines efficiently?
  • How to maintain batching and inference acceleration while avoiding redundant decoding?

Thanks in advance! Any sample references or architecture diagrams would be highly appreciated.

If everything runs in a single process, please refer to the NVIDIA ready-made sample deepstream_parallel_inference_app. As the graph in its README shows, every source is decoded only once; the decoded frames can then be sent to many inference branches using the nvstreammux, tee, and nvstreamdemux plugins. A rough illustration follows.
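
This is only an illustration of the pattern, not the sample's actual code (the RTSP URL and pgie config paths are placeholders, and the real app's graph is more elaborate): one decode feeds a tee, and each analytics branch gets its own nvstreammux + nvinfer while sharing the same NVMM buffers.

```python
# Rough single-process illustration (NOT the sample's code): one decode,
# tee fan-out, one nvstreammux + nvinfer per analytics branch.
# The RTSP URL and the pgie config paths are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

pipeline = Gst.parse_launch(
    # Branch 1: people detection (placeholder config)
    "nvstreammux name=mux_people batch-size=1 width=1920 height=1080 live-source=1 "
    "! nvinfer config-file-path=people_pgie.txt ! fakesink "
    # Branch 2: helmet compliance (placeholder config)
    "nvstreammux name=mux_helmet batch-size=1 width=1920 height=1080 live-source=1 "
    "! nvinfer config-file-path=helmet_pgie.txt ! fakesink "
    # Decode once, then tee the NVMM frames to both branches (zero-copy)
    "uridecodebin uri=rtsp://camera.local/stream ! nvvideoconvert ! tee name=t "
    "t. ! queue ! mux_people.sink_0 "
    "t. ! queue ! mux_helmet.sink_0"
)

pipeline.set_state(Gst.State.PLAYING)
try:
    GLib.MainLoop().run()
finally:
    pipeline.set_state(Gst.State.NULL)
```
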
If there are many inference pipelines running as separate processes, please refer to the NVIDIA ready-made sample /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-ipc-test, which currently supports Jetson only. This sample shares the decoded buffers across processes over IPC.
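
Since that sample is Jetson-only, a generic fallback on x86 is GStreamer's stock shmsink/shmsrc elements, sketched below with placeholder paths, caps, and config files. Note the trade-offs: frames are copied into system memory (GPU zero-copy is lost), and shmsrc carries no caps, so the receiver must restate caps that exactly match the sender.

```python
# Generic x86 fallback (NOT the Jetson-only sample): publish decoded frames
# over shared memory with stock GStreamer shmsink/shmsrc. Frames are copied
# to system memory, so GPU zero-copy is lost. Paths, caps, and config files
# below are placeholders.
import sys
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)
SOCKET = "/tmp/cam0.sock"  # placeholder socket path

SENDER = (
    "uridecodebin uri=rtsp://camera.local/stream "
    "! nvvideoconvert ! video/x-raw,format=I420 "          # NVMM -> system memory
    f"! shmsink socket-path={SOCKET} shm-size=50000000 "   # room for several 1080p frames
    "wait-for-connection=false sync=false"
)

# Receiver process (run one per analytics pipeline). shm carries no caps:
# the receiver must restate them, and they must match the sender exactly.
RECEIVER = (
    "nvstreammux name=mux batch-size=1 width=1920 height=1080 live-source=1 "
    "! nvinfer config-file-path=helmet_pgie.txt ! fakesink "
    f"shmsrc socket-path={SOCKET} is-live=true do-timestamp=true "
    "! video/x-raw,format=I420,width=1920,height=1080,framerate=30/1 "
    "! nvvideoconvert ! video/x-raw(memory:NVMM) ! mux.sink_0"
)

pipeline = Gst.parse_launch(SENDER if sys.argv[1:] == ["send"] else RECEIVER)
pipeline.set_state(Gst.State.PLAYING)
try:
    GLib.MainLoop().run()
finally:
    pipeline.set_state(Gst.State.NULL)
```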

Could you share a diagram of what you want? For example: how many devices, how many processes, and what the complete pipeline looks like?