Hi team,
I’m currently working on an NVIDIA L40 server with the following specs:
- GPU: NVIDIA L40 with 32 GB VRAM
- CPU: 32-core processor
- Software: **DeepStream SDK 6.4** (Python-based pipelines) on Ubuntu
✅ Use Case
We are building real-time video analytics pipelines using NVIDIA DeepStream for different use cases like:
- People detection
- Vehicle detection
- Helmet compliance
- Face monitoring
…all from the same camera feed.
So far, things are working well. But now I’ve hit a major architectural constraint:
⚠️ Challenge
Previously, we were opening the same RTSP camera stream multiple times across different pipelines (e.g., once in a vehicle pipeline, once in a helmet pipeline).
However, I am now required to:
- Access each camera stream only once (due to network policies and camera load)
- Run multiple types of inference from the same stream
- Keep each use case modular and separate, ideally in different DeepStream pipelines or servers
🔍 What I’m Exploring
To solve this, I am thinking of implementing:
- A streaming server that pulls each RTSP stream only once
- It then distributes the decoded frames to multiple DeepStream pipelines (for people, vehicle, helmet, etc.)
- Each pipeline performs its own inference independently
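Concretely, the distributor process I am picturing looks roughly like the sketch below. This is untested and only illustrates the idea: the RTSP URI and shm socket paths are placeholders, I am assuming an H.264 camera, and I know shmsink pushes the decoded frames into system memory (which is part of what I would like feedback on).

```python
# Rough sketch of the "decode once, fan out" distributor I have in mind.
# Placeholders/assumptions: RTSP URI, H.264 stream, shm socket paths;
# shm-size may also need tuning for full raw frames.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

RTSP_URI = "rtsp://camera-ip/stream"  # placeholder

pipeline = Gst.parse_launch(
    # Decode once on the GPU (nvv4l2decoder), convert to system-memory RGBA,
    # then tee the decoded frames to one shmsink per consumer pipeline.
    f"rtspsrc location={RTSP_URI} ! rtph264depay ! h264parse ! nvv4l2decoder ! "
    "nvvideoconvert ! video/x-raw,format=RGBA ! tee name=t "
    "t. ! queue ! shmsink socket-path=/tmp/cam0_people wait-for-connection=false sync=false "
    "t. ! queue ! shmsink socket-path=/tmp/cam0_vehicle wait-for-connection=false sync=false"
)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()
```

My main worry with this is the NVMM-to-system-memory copy at the tee and the matching re-upload in every consumer, which is why I am asking about nvbuf/NVMM sharing below.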
I would love some guidance on:
- Best practices to share decoded frames across pipelines
- Whether using appsrc + nvstreammux in receiving pipelines is the right approach (rough sketch after this list)
- Efficient inter-process transport (e.g., ZeroMQ, shared memory, or nvbuf sharing?)
- Any NVIDIA-recommended architecture for this use case
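To make the appsrc + nvstreammux question concrete, this is the kind of receiving pipeline I am asking about: frames arrive from the distributor over some transport (omitted here), get pushed through appsrc, re-uploaded to NVMM, batched by nvstreammux, and run through nvinfer. The resolution, framerate, and nvinfer config path are placeholders, and I do not know whether this is the recommended pattern.

```python
# Sketch of a per-use-case receiver: appsrc -> nvvideoconvert (to NVMM) ->
# nvstreammux -> nvinfer. The transport delivering raw frames is omitted;
# resolution, framerate, and the nvinfer config file are placeholder assumptions.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

pipeline = Gst.parse_launch(
    "nvstreammux name=mux batch-size=1 width=1920 height=1080 ! "
    "nvinfer config-file-path=people_detector_config.txt ! "
    "nvvideoconvert ! nvdsosd ! fakesink "
    "appsrc name=src is-live=true format=time ! "
    "nvvideoconvert ! video/x-raw(memory:NVMM),format=RGBA ! mux.sink_0"
)

appsrc = pipeline.get_by_name("src")
appsrc.set_property(
    "caps",
    Gst.Caps.from_string("video/x-raw,format=RGBA,width=1920,height=1080,framerate=25/1"),
)

def push_frame(rgba_bytes: bytes, pts_ns: int, duration_ns: int):
    """Push one decoded RGBA frame received from the distributor."""
    buf = Gst.Buffer.new_wrapped(rgba_bytes)
    buf.pts = pts_ns
    buf.duration = duration_ns
    return appsrc.emit("push-buffer", buf)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()  # frames would be pushed from the transport's receive thread
```

If there is a cleaner way to get frames back into NVMM without a CPU copy per consumer (or a way to share NvBufSurface buffers across processes), that is exactly the guidance I am after.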
Can someone from the NVIDIA team or the community advise on:
- How to architect this correctly?
- Whether NVIDIA provides utilities or SDK features to distribute decoded frames to multiple DeepStream pipelines efficiently?
- How to maintain batching and inference acceleration while avoiding redundant decoding?
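For reference, the single-process baseline I am comparing against (and would fall back to if cross-process frame sharing is impractical) is to decode once, batch, and tee the batched NVMM buffers into parallel nvinfer branches, roughly as sketched below. The config file names and RTSP URI are placeholders, and I am not sure how safe it is for parallel branches to share the same batch metadata.

```python
# Single-process baseline for comparison: one decode per camera, one nvstreammux,
# then tee the batched NVMM buffers into parallel nvinfer branches.
# Config file names and the RTSP URI are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

pipeline = Gst.parse_launch(
    "nvstreammux name=mux batch-size=1 width=1920 height=1080 ! tee name=t "
    "t. ! queue ! nvinfer config-file-path=people_config.txt ! fakesink "
    "t. ! queue ! nvinfer config-file-path=vehicle_config.txt ! fakesink "
    "uridecodebin uri=rtsp://camera-ip/stream ! nvvideoconvert ! "
    "video/x-raw(memory:NVMM) ! mux.sink_0"
)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()
```

This keeps batching and avoids redundant decoding, but it couples every use case into one process, which is what I am trying to avoid.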
Thanks in advance! Any sample references or architecture diagrams would be highly appreciated.