When processing multiple RTSP stream sources with DeepStream, are there performance recommendations for using a single GStreamer/DeepStream pipeline to process multiple streams (assume the same inference model and no tracking for all streams), versus multiple GStreamer pipelines, one per stream?
In this tech blog it sounds like there is a separate thread per camera, each with its own GStreamer pipeline. For example, one camera gets one GStreamer pipeline that may contain an nvstreammux, but otherwise the pipeline only processes that camera's stream.
Fyma’s current implementation runs a master process for each GPU instance. This master process in turn runs a GStreamer pipeline for each video stream added to the platform. Memory overhead for each camera is low since everything runs in a single process.
Each plug-in can have one or more source and sink pads. In this case, when the streams are added, a Gst-Uridecodebin plug-in gets added to the pipeline, one for each stream. The source pad of each Gst-Uridecodebin plug-in is connected to a sink pad on the single Gst-nvstreammux plug-in. Nvstreammux creates batches from the frames coming from all upstream plug-ins and pushes them to the next plug-in in the pipeline.
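To make that fan-in concrete, here is a minimal Python sketch that builds a gst-launch-1.0 style description wiring N Gst-Uridecodebin sources into one nvstreammux. The URIs, resolution, timeout, and the `config.txt` nvinfer config path are illustrative placeholders, not values from this thread:

```python
# Sketch: build a gst-launch-1.0 description that fans N RTSP sources
# into a single nvstreammux. URIs, width/height, and batch size are
# illustrative placeholders.
def build_pipeline_description(uris, width=1920, height=1080):
    n = len(uris)
    # One muxer, with batch-size equal to the number of streams.
    parts = [f"nvstreammux name=mux batch-size={n} "
             f"width={width} height={height} batched-push-timeout=40000"]
    # One uridecodebin per stream, each linked to its own mux sink pad.
    for i, uri in enumerate(uris):
        parts.append(f'uridecodebin uri="{uri}" ! mux.sink_{i}')
    # Downstream: a single inference element sees the whole batch.
    parts.append("mux. ! nvinfer config-file-path=config.txt ! fakesink")
    return " ".join(parts)

desc = build_pipeline_description(
    [f"rtsp://camera{i}/stream" for i in range(4)])
print(desc)
```

The point of the single-pipeline design is visible in the string: four sources, but only one nvstreammux and one nvinfer instance serving all of them.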
In what cases would the pipeline per stream be a better or more performant option rather than one pipeline handling multiple streams?
The decision depends on your requirements and the platform's capability.
In your case, there is a large number of live streams (RTSP streams from cameras) and you want them all to be inferenced by one model.
But what is your platform? Do you have multiple GPUs or only one GPU? What is the model's load and performance? We recommend using one pipeline for multiple streams rather than one pipeline per stream, as long as a single GPU can handle the whole pipeline. If the load exceeds a single GPU's capacity, we suggest splitting the streams across separate pipelines running on different GPUs.
But what is your platform? Do you have multiple GPUs or only one GPU?
Multiple GPUs with the workload across 1 VM per GPU.
What is the model's load and performance?
A PyTorch YOLOv5s model. It's not run under nvinfer but inside a probe, and it uses the CPU. Performance is acceptable and we're not seeing any GPU or CPU utilization issues; we're decoder bound at this point. We plan to move this inside nvinfer with a TensorRT-exported model.
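For reference, a Gst-nvinfer configuration for a TensorRT-exported YOLOv5s might look roughly like the fragment below. All file paths, the batch size, and the custom parser entries are placeholder assumptions; YOLO-family models typically need a custom bounding-box parser library rather than the default parser:

```ini
# Hypothetical Gst-nvinfer config for a TensorRT-exported YOLOv5s.
# Paths, batch size, and parser entries are placeholders.
[property]
gpu-id=0
# Pre-built engine; nvinfer can also rebuild one from a model file.
model-engine-file=yolov5s.engine
batch-size=10
# 0=FP32, 1=INT8, 2=FP16
network-mode=2
num-detected-classes=80
gie-unique-id=1
# YOLO outputs need a custom bounding-box parsing function/library.
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=libnvdsinfer_custom_impl_Yolo.so
```

With this in place, detections arrive as batch metadata on the buffer, so the probe only reads results instead of running CPU inference per frame.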
We recommend using one pipeline for multiple streams rather than one pipeline per stream, as long as a single GPU can handle the whole pipeline. If the load exceeds a single GPU's capacity, we suggest splitting the streams across separate pipelines running on different GPUs.
One approach I'm looking at is to distribute "batches" of streams inside containers across multiple GPU-accelerated VMs. As an example, we could have 1000 streams across 10 VMs, with 10 container instances per VM, each processing 10 streams. In this example, I wondered whether, from a performance perspective, each container should run one GStreamer pipeline per stream or a single GStreamer pipeline with all 10 streams hooked up.
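The fan-out described above can be sketched as a simple partitioning step. The counts here are just the example's (1000 streams, 10 VMs, 10 containers per VM), not a recommendation:

```python
# Sketch: partition a flat stream list into per-VM, per-container
# batches, mirroring the 1000-stream example above. The helper name
# and the URI pattern are illustrative, not from any real deployment.
def partition(streams, num_vms, containers_per_vm):
    per_container = len(streams) // (num_vms * containers_per_vm)
    it = iter(streams)
    layout = []
    for _vm in range(num_vms):
        vm_batches = []
        for _c in range(containers_per_vm):
            # Each inner list would feed one pipeline's nvstreammux.
            vm_batches.append([next(it) for _ in range(per_container)])
        layout.append(vm_batches)
    return layout

streams = [f"rtsp://cam{i}/live" for i in range(1000)]
layout = partition(streams, num_vms=10, containers_per_vm=10)
print(len(layout), len(layout[0]), len(layout[0][0]))  # 10 10 10
```

Each innermost list is then the batch one container's single pipeline would mux together, per the recommendation above.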
Late follow-up, but is there general guidance on the number of streams on a single nvstreammux? For example, with 1000 cameras per GPU, I'm assuming 10 threads or 10 containers of 100 streams each would still work on a single GStreamer pipeline, but maybe not 1000 cameras on one pipeline?
There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
The number of cameras supported on one GPU depends on the application load: the resolution of the videos (4K video puts much more load on the decoder than 1080p), the size and complexity of the models (if the model runs on the GPU), the format conversions needed in the pipeline, etc.