Please provide complete information as applicable to your setup.
• Hardware Platform: GPU
• DeepStream Version: 8.0
• JetPack Version (valid for Jetson only): Not applicable (x86 server)
• TensorRT Version: Latest
• NVIDIA GPU Driver Version (valid for GPU only): Latest
• Issue Type( questions, new requirements, bugs): Question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
🟢 [Question] How to reduce CPU bottleneck when handling RTSP streams before DeepStream decoding?
Hi everyone,
I’m currently working on an AI surveillance project.
My current hardware setup includes:
- 1× rack server
- CPU: 2 × Intel® Xeon® Silver 4510 (12C/24T, 2.4 GHz base / 4.1 GHz boost)
- RAM: 128 GB DDR5 (2 × 64 GB)
- GPU: NVIDIA RTX A6000 (48 GB)
I’m receiving RTSP streams from 7.4MP IP cameras and plan to run video analytics (YOLO, etc.) with NVIDIA DeepStream, doing both decoding and inference on the GPU.
However, my engineer says the CPU becomes a bottleneck at around 6 RTSP streams during ingestion, before the streams are even handed to DeepStream for GPU decoding.
This seems like an unusual bottleneck. Since decoding is handled on the GPU via NVDEC, I expected CPU load to be minimal and the setup to scale to dozens of streams.
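One thing worth ruling out first is whether decode is actually landing on NVDEC at all: if the pipeline silently falls back to a software decoder such as avdec_h264, each stream costs a full CPU core. Below is a minimal sketch (the camera URL is a placeholder, and it assumes an H.264 stream) that builds a gst-launch-1.0 command pinning decode to the hardware nvv4l2decoder, so a single camera's CPU cost can be measured in isolation:

```python
def decode_check_cmd(url: str) -> str:
    """Build a gst-launch-1.0 command that tests one RTSP camera's decode path.

    rtspsrc / rtph264depay / h264parse run on the CPU but are lightweight;
    nvv4l2decoder forces decode onto NVDEC. If this pipeline fails, the
    stream may not be H.264 (try rtph265depay/h265parse for H.265).
    """
    return (
        "gst-launch-1.0 "
        f"rtspsrc location={url} protocols=tcp ! "
        "rtph264depay ! h264parse ! "
        "nvv4l2decoder ! fakesink sync=false"
    )

print(decode_check_cmd("rtsp://camera-1/stream"))  # placeholder URL
```

While the pipeline runs, `nvidia-smi dmon -s u` should show activity in the `dec` column and per-stream CPU usage should stay low; if it doesn't, the bottleneck is likely software decode rather than RTSP ingestion itself.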
❓ My Questions:
- Has anyone experienced a similar CPU bottleneck during RTSP ingestion, before the streams reach DeepStream/NVDEC?
- Are there recommended ways to offload RTSP parsing / buffering / demuxing to the GPU or to another optimized path?
- Is there a better way to architect the pipeline (RTSP input → decode → inference) so the GPU handles more of the preprocessing without overloading the CPU?
- If we want to scale up to 100 camera streams, do we really need ~15+ servers, or is there a more scalable architecture?
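On the architecture question, the usual DeepStream pattern is to batch many sources into one pipeline rather than running one pipeline per camera. A hedged sketch (not a tested config) that builds such a gst-launch-1.0 command is below; nvurisrcbin, nvstreammux, and nvinfer are standard DeepStream plugins, while the URLs, the `yolo_config.txt` path, and the resolutions are placeholder assumptions, and the width/height/live-source properties assume the legacy nvstreammux:

```python
def build_batched_cmd(urls, gpu_id=0, width=1920, height=1080):
    """Build a gst-launch-1.0 command batching N RTSP sources on one GPU."""
    # One nvurisrcbin per camera: it depays, parses, and decodes on NVDEC,
    # keeping the per-stream CPU cost to RTP handling only.
    sources = " ".join(
        f"nvurisrcbin uri={u} gpu-id={gpu_id} ! mux.sink_{i}"
        for i, u in enumerate(urls)
    )
    # nvstreammux batches all decoded frames into a single GPU buffer, so a
    # single nvinfer instance serves every stream in one inference batch.
    head = (
        f"nvstreammux name=mux gpu-id={gpu_id} batch-size={len(urls)} "
        f"width={width} height={height} live-source=1 ! "
        "nvinfer config-file-path=yolo_config.txt ! fakesink sync=false"
    )
    return f"gst-launch-1.0 {head} {sources}"

# Placeholder camera URLs for illustration only.
print(build_batched_cmd([f"rtsp://cam-{i}/stream" for i in range(4)]))
```

With this shape, scaling is mostly bounded by NVDEC throughput and inference batch size on the A6000, not by per-stream CPU processes, which is why 30–50 streams per server is a more typical target than 6.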
🎯 Goal:
I want to maximize GPU utilization and minimize CPU dependency, especially during the RTSP ingestion and pre-decode steps, so that each rack server can handle 30–50 camera streams instead of just 6, while sustaining at least 25 FPS on every stream.
Any insights or best practices would be highly appreciated. Thank you!