Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) - RTX A100 80GB
• DeepStream Version 9.0
• TensorRT Version 10.16.1
• NVIDIA GPU Driver Version (valid for GPU only) 580.126.20
• Issue Type (questions, new requirements, bugs) - Question: clarification on batch size selection.
Pipeline is roughly:
uridecodebin/rtspsrc → nvstreammux → nvinfer → nvtracker → nvvideoconvert → fakesink
Model: custom YOLO exported to TensorRT FP16
Input: 640x640
Output: [batch, 300, 6]
My goal is to keep GPU utilization stable and below ~90%, not just to maximize average throughput.
I tested two approaches:
- nvstreammux batch-size=80, nvinfer batch-size=80, FP16 TensorRT engine with max batch 80
- nvstreammux batch-size=32, nvinfer batch-size=32, FP16 TensorRT engine with max batch 32, with all 80 RTSP sources still connected
With batch 80, average throughput is good, but I see sudden spikes in GPU SM utilization. With batch 32 and interval=2, the runtime is much more stable in my tests.
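To make the two setups concrete, here is a small Python sketch of the chunking arithmetic I have in mind. It is only an illustration: `chunk_sizes` is a hypothetical helper, not a DeepStream API, and it assumes that one frame from each of the 80 sources is available at the same time and that frames are grouped into consecutive chunks of at most the configured batch size. Whether DeepStream actually behaves this way is part of what I am asking.

```python
def chunk_sizes(num_frames: int, max_batch: int) -> list[int]:
    """Split num_frames into consecutive chunks of at most max_batch frames.

    Hypothetical helper for illustration only; it does not call DeepStream.
    """
    if num_frames < 0 or max_batch <= 0:
        raise ValueError("num_frames must be >= 0 and max_batch must be > 0")
    full_chunks, remainder = divmod(num_frames, max_batch)
    return [max_batch] * full_chunks + ([remainder] if remainder else [])


# One frame from each of the 80 sources:
print(chunk_sizes(80, 80))  # [80]
print(chunk_sizes(80, 32))  # [32, 32, 16]
```

Under these assumptions the batch-32 setup would need three inference calls per round of frames, and the last one would run only half full.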
My question:
For 80 live RTSP sources, is it generally better to build a batch-80 engine and let nvinfer process one large batch, or to use a smaller batch-32 engine and let DeepStream/nvinfer process the 80 sources in smaller chunks? Is there a risk that inference falls behind and cannot keep up with the decoded frames?
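The "fall behind" concern can be framed as a simple rate comparison. The sketch below is a back-of-the-envelope check in Python, not a measurement: the function names are hypothetical, and the 25 fps per camera and 40 ms per batch-32 inference are placeholder numbers, not values from my tests. It assumes that nvinfer's interval=N means N consecutive batches are skipped between inferences, so one batch in every N+1 is inferred, and it ignores the cost of partially filled batches.

```python
def required_infer_fps(num_sources: int, source_fps: float, interval: int) -> float:
    """Frames per second that must actually be inferred.

    Assumes interval=N skips N consecutive batches, so 1 in (N + 1) is inferred.
    """
    return num_sources * source_fps / (interval + 1)


def engine_capacity_fps(batch_size: int, batch_latency_ms: float) -> float:
    """Frames per second the engine can sustain if full batches run back to back."""
    return batch_size * 1000.0 / batch_latency_ms


def keeps_up(num_sources: int, source_fps: float, interval: int,
             batch_size: int, batch_latency_ms: float) -> bool:
    """True if sustained capacity is at least the required inference rate."""
    required = required_infer_fps(num_sources, source_fps, interval)
    capacity = engine_capacity_fps(batch_size, batch_latency_ms)
    return capacity >= required


# Placeholder numbers (assumptions, not measurements):
# 80 cameras at 25 fps, interval=2, batch 32 taking 40 ms per batch.
print(round(required_infer_fps(80, 25.0, 2), 1))  # 666.7
print(engine_capacity_fps(32, 40.0))              # 800.0
print(keeps_up(80, 25.0, 2, 32, 40.0))            # True
```

If the required rate exceeded the capacity, I would expect frames to queue up or be dropped somewhere upstream, and that is the behaviour I would like to understand.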
What are the practical advantages/disadvantages of each approach in DeepStream?
Specifically, I want to understand:
- Does nvinfer internally split larger nvstreammux batches into smaller inference chunks when nvinfer batch-size is smaller than the number of sources?
- Is using nvstreammux batch-size=32 with 80 live sources a recommended/valid approach?
- Can smaller nvinfer batches reduce GPU utilization spikes even if average utilization is similar?
- Are there latency or frame-dropping side effects when nvstreammux batch-size is smaller than the number of live sources?
- For live RTSP surveillance, should batch size be optimized for throughput, latency, or GPU utilization stability?
Any guidance or best practices for 80-camera DeepStream deployments would be appreciated.