Understanding Deepstream Performance (Same Pipeline, two different machines)

Please provide complete information as applicable to your setup.

Machine 1
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.0
• TensorRT Version: 8.2.3.0
• NVIDIA GPU Driver Version (valid for GPU only): 470
• GPU: RTX 3070, PCIe v4 x8
• Processor: Intel® Core™ i7-11700F (Alienware Aurora R12)
• Total number of cameras running the pipeline: 60 cameras @ 15 FPS
• Max GPU utilization observed: 90%+

Machine 2
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.0
• TensorRT Version: 8.2.3.0
• NVIDIA GPU Driver Version (valid for GPU only): 470
• GPU: RTX 3070, PCIe v3 x16
• Processor: Intel® Xeon® Gold 6248R
• Total number of cameras running the pipeline: 40 cameras @ 15 FPS
• Max GPU utilization observed: ~60%

• Issue Type (questions, new requirements, bugs): question

Simply put, I have the exact same custom DeepStream analytics pipeline running on both machines described above. Machine 1 handles significantly more cameras than Machine 2 (60 vs. 40 @ 15 FPS). How can we figure out the cause of this difference?

Best regards,
Eslam

You may monitor the GPU load and CPU load while running the case to find the performance bottleneck.

Machine 1 used up to 4 vCores, whereas Machine 2 used up to ~3.5 vCores. I understand that both machines have comparable PCIe bandwidth (Gen4 x8 vs. Gen3 x16), but could the PCIe version be the cause of this?
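To move from the nominal PCIe version to measured data, a quick check like the following can help. This is only a sketch: it relies on standard `nvidia-smi` query fields (see `nvidia-smi --help-query-gpu`) and on `nvidia-smi dmon`; the 10-sample window is arbitrary.

```shell
# Sketch: verify the negotiated PCIe link and watch live PCIe throughput.
if command -v nvidia-smi >/dev/null 2>&1; then
    # Negotiated vs. maximum PCIe generation and link width. Note that
    # Gen4 x8 and Gen3 x16 have roughly the same nominal bandwidth (~16 GB/s),
    # so the link would only matter if it trained down below its maximum.
    nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max --format=csv

    # Live PCIe RX/TX throughput alongside utilization, 1 s interval, 10 rows.
    nvidia-smi dmon -s ut -d 1 -c 10
else
    echo "nvidia-smi not found; run this on the DeepStream machines"
fi
```

If the reported throughput stays far below the link's capacity on both machines, the PCIe version is unlikely to be the bottleneck.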

We are not sure. Can you also provide the DDR (system memory) information and your detailed pipeline?

Have you measured the GPU load and CPU load while running the case on the two machines? You can use `nvidia-smi dmon` to monitor the GPU status.
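A minimal way to capture both measurements side by side might look like this. It is a sketch, not a prescribed procedure: it assumes `nvidia-smi` (NVIDIA driver) and `mpstat` (sysstat package) are available, and the 60-second window and log file names are arbitrary choices.

```shell
# Sketch: record GPU and CPU load in parallel while the pipeline runs.
SAMPLES=60   # one-second samples; an arbitrary observation window

if command -v nvidia-smi >/dev/null 2>&1; then
    # GPU: sm/mem/enc/dec utilization plus PCIe throughput.
    nvidia-smi dmon -s ut -d 1 -c "$SAMPLES" > gpu_load.log &
fi

if command -v mpstat >/dev/null 2>&1; then
    # CPU: per-core utilization, to spot a single saturated core
    # (one pipeline thread pinned at 100% can cap overall throughput
    # even when average CPU load looks low).
    mpstat -P ALL 1 "$SAMPLES" > cpu_load.log
fi

wait
echo "Done; compare gpu_load.log and cpu_load.log between the two machines."
```

Comparing the two logs between machines should show whether Machine 2 is limited by GPU utilization, a saturated CPU core, or neither (which would point elsewhere, e.g. memory or I/O).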