How to increase GPU utilization?

• Hardware Platform (Jetson / GPU): RTX 3090 24GB
• DeepStream Version: 5.1
• TensorRT Version: 7.2.3-1+cuda11.1
• NVIDIA GPU Driver Version (valid for GPU only): 460.32.03
• Issue Type (questions, new requirements, bugs): questions

yolov4-tiny

I tested the performance of the yolov4-tiny model with deepstream-app on my RTX 3090. With an engine built for batch-size=4 and 4 input streams, each stream runs at around 210 FPS and GPU utilization is around 30%, so the GPU should have plenty of headroom left.

If I build a new yolov4-tiny engine with batch-size=8 and test it with 8 input streams, each stream runs at around 105 FPS (half of the previous case) and GPU utilization is still around 30%.

Next, instead of a larger batch, I ran two deepstream-app instances, each using the batch-size=4 engine from the first case with 4 input streams. Again, each stream ran at around 105 FPS and GPU utilization stayed around 30%.

yolov4-mish

I repeated all of this for yolov4-mish (which is larger than yolov4-tiny).

In short:
For batch-size=4, input-streams=4: FPS=~28 for each stream, GPU-util=~45%
For batch-size=8, input-streams=8: FPS=~14 for each stream, GPU-util=~45%
For two deepstream-app instances, batch-size=4, input-streams=4 each: FPS=~28 for each stream in both apps (!), GPU-util=~98%

How can I force the GPU to use all of its available performance? In the yolov4-mish case, why can two deepstream-app instances with the batch-size=4 engine use ~98% of the GPU, while one deepstream-app with the batch-size=8 engine cannot?

(Both engines were created with fp16 precision)
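To make the comparison easier to read, the per-stream figures above can be converted into total frames per second across all streams and apps. This is only a small arithmetic sketch in Python; the numbers are the approximate values quoted in this post, and the names `runs` and `total_fps` are made up for illustration.

```python
# Approximate figures quoted in the post:
# (model, apps, streams per app, FPS per stream, GPU utilization in %)
runs = [
    ("yolov4-tiny", 1, 4, 210, 30),
    ("yolov4-tiny", 1, 8, 105, 30),
    ("yolov4-tiny", 2, 4, 105, 30),
    ("yolov4-mish", 1, 4, 28, 45),
    ("yolov4-mish", 1, 8, 14, 45),
    ("yolov4-mish", 2, 4, 28, 98),
]


def total_fps(apps, streams_per_app, fps_per_stream):
    """Aggregate frames per second over every stream of every app."""
    return apps * streams_per_app * fps_per_stream


totals = [total_fps(apps, streams, fps) for _, apps, streams, fps, _ in runs]

for (model, apps, streams, fps, util), total in zip(runs, totals):
    print(f"{model}: {apps} app(s) x {streams} streams x {fps} FPS "
          f"= {total} FPS total at ~{util}% GPU")
```

Read this way, yolov4-tiny delivers the same total of roughly 840 frames per second in all three configurations, while yolov4-mish stays at roughly 112 frames per second with a single app and doubles to roughly 224 only when a second app is started.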

I suspect the bottleneck is the decoder. Please run the command below and share the output:

$ nvidia-smi dmon
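If you would rather check the decoder column programmatically than read it by eye, a rough Python sketch follows. It assumes the `dmon` output has a `#`-prefixed header row naming the columns (including `dec`) followed by whitespace-separated sample rows. `SAMPLE` is made-up text for illustration, not real output from this system, and the function names and the 95% threshold are arbitrary choices.

```python
import subprocess

# Hypothetical dmon-style text, for illustration only (not measured data).
SAMPLE = """\
# gpu    sm   mem   enc   dec
# Idx     %     %     %     %
    0    31    12     0   100
    0    29    11     0    99
"""


def parse_dmon(text):
    """Return one {column: value} dict per sample row of dmon-style text."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            names = line.lstrip("#").split()
            if "dec" in names:  # the header row that names the columns
                columns = names
            continue  # skip the units row and repeated headers
        if columns is None:
            continue
        values = line.split()
        if len(values) == len(columns):
            rows.append(dict(zip(columns, values)))
    return rows


def decoder_saturated(rows, threshold=95):
    """True if every sample shows decoder utilization >= threshold percent."""
    dec = [int(r["dec"]) for r in rows if r["dec"].isdigit()]
    return bool(dec) and min(dec) >= threshold


def read_dmon(samples=5):
    """Run nvidia-smi dmon for a few samples (utilization metrics only)."""
    cmd = ["nvidia-smi", "dmon", "-s", "u", "-c", str(samples)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


print(decoder_saturated(parse_dmon(SAMPLE)))
```

On a machine with the driver installed, `decoder_saturated(parse_dmon(read_dmon()))` would apply the same check to live data while the pipeline is running.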


Yes, you are right, thank you!
For the yolov4-tiny model with batch-size=4 and 4 input streams, where each stream runs at around 210 FPS, the decoder is at 100% utilization:
(Screenshot from 2021-05-10 12-37-12: nvidia-smi dmon output)

Is there any way to boost the decoder, or do I have to buy a GPU with a more powerful decoder?

You can refer to the "NVDEC - Hardware-Accelerated Video Decoding" section of the NVIDIA VIDEO CODEC SDK | NVIDIA Developer page.