How to increase GPU utilization?

fre_deric · May 7, 2021, 11:21am

• Hardware Platform (Jetson / GPU): RTX 3090 24GB
• DeepStream Version: 5.1
• TensorRT Version: 7.2.3-1+cuda11.1
• NVIDIA GPU Driver Version (valid for GPU only): 460.32.03
• Issue Type (questions, new requirements, bugs): questions

yolov4-tiny

I tested the performance of the yolov4-tiny model with deepstream-app on my RTX 3090. With an engine built for batch-size=4 and 4 input streams, each stream runs at around 210 FPS and GPU utilization is around 30%, which suggests the GPU has a lot of headroom left.

If I build a new yolov4-tiny engine with batch-size=8 and test it with 8 input streams, each stream runs at around 105 FPS (half of the previous case) and GPU utilization is still around 30%.

Next, instead of a larger batch, I ran two deepstream-app instances, each with the batch-size=4 engine from the first case and 4 input streams. Again, each stream ran at around 105 FPS and GPU utilization stayed around 30%.

yolov4-mish

I repeated all of this for yolov4-mish (which is larger than yolov4-tiny).

In short:
One deepstream-app, batch-size=4, input-streams=4: FPS=~28 for each stream, GPU-util=~45%
One deepstream-app, batch-size=8, input-streams=8: FPS=~14 for each stream, GPU-util=~45%
Two deepstream-apps, batch-size=4 and input-streams=4 in each: FPS=~28 for each stream in both apps, GPU-util=~98%

How can I make the GPU use all of its available performance? For yolov4-mish, why can two deepstream-app instances with the batch-size=4 engine reach ~98% GPU utilization, while a single deepstream-app with the batch-size=8 engine cannot?

(Both engines were created with fp16 precision)
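
For anyone comparing these runs, it may help to look at total throughput rather than per-stream FPS. The sketch below only multiplies out the figures quoted in this post; the dictionary layout and the `aggregate_fps` helper are illustrative names, not anything from DeepStream.

```python
# Per-stream FPS figures as quoted in the post above.
# Tuple layout (illustrative): (label, number of apps, streams per app, FPS per stream)
runs = {
    "yolov4-tiny": [
        ("1 app, batch 4", 1, 4, 210),
        ("1 app, batch 8", 1, 8, 105),
        ("2 apps, batch 4", 2, 4, 105),
    ],
    "yolov4-mish": [
        ("1 app, batch 4", 1, 4, 28),
        ("1 app, batch 8", 1, 8, 14),
        ("2 apps, batch 4", 2, 4, 28),
    ],
}


def aggregate_fps(apps, streams_per_app, fps_per_stream):
    """Total frames per second processed across all apps and streams."""
    return apps * streams_per_app * fps_per_stream


# Total throughput for every configuration, keyed by model and label.
totals = {
    model: {label: aggregate_fps(a, s, f) for label, a, s, f in rows}
    for model, rows in runs.items()
}

for model, by_label in totals.items():
    for label, total in by_label.items():
        print(f"{model:12s} {label:16s} total ~{total} FPS")
```

With these numbers, yolov4-tiny lands at roughly 840 FPS in all three configurations, which would be consistent with a shared limit somewhere outside inference. yolov4-mish stays at roughly 112 FPS in a single app but reaches roughly 224 FPS across two apps. The arithmetic alone does not say which pipeline stage is the limit in either case.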

mchi · May 10, 2021, 2:51am

I suspect the bottleneck is the decoder. Please run the command below and share the output:

$ nvidia-smi dmon
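
If reading the scrolling output by eye is awkward, a small script can average the columns. This is a minimal sketch that assumes the usual layout of `nvidia-smi dmon` text (a `#` header line with column names, a `#` line with units, then one row per sample); the column set varies between driver versions, so the parser looks columns up by name. `parse_dmon`, `mean_of` and the sample text are made up for illustration and are not the values from this thread.

```python
def parse_dmon(text):
    """Parse `nvidia-smi dmon` text into a list of {column: value} dicts."""
    columns, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first header line carries column names; later "#" lines
            # (units, or the header repeated further down) are skipped.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue
        row = {}
        for name, value in zip(columns, line.split()):
            try:
                row[name] = float(value)
            except ValueError:
                # "-" (metric not reported) or any other non-numeric field.
                row[name] = None
        rows.append(row)
    return rows


def mean_of(rows, column):
    """Average of one column, ignoring samples where it is missing."""
    vals = [r[column] for r in rows if r.get(column) is not None]
    return sum(vals) / len(vals) if vals else None


# Made-up sample in the general shape of dmon utilization output.
SAMPLE = """\
# gpu     sm    mem    enc    dec
# Idx      %      %      %      %
    0     30     12      0    100
    0     31     12      0    100
    0     29     11      0     99
"""

rows = parse_dmon(SAMPLE)
print("mean sm :", mean_of(rows, "sm"))
print("mean dec:", mean_of(rows, "dec"))
```

To feed it live data, the text could come from something like `subprocess.run(["nvidia-smi", "dmon", "-s", "u", "-c", "5"], capture_output=True, text=True).stdout`, where `-s u` selects the utilization columns and `-c 5` stops after five samples. A `dec` average close to 100 while `sm` stays low would point at the hardware decoder rather than inference.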

fre_deric · May 10, 2021, 11:38am

Yes, you are right, thank you!
For the yolov4-tiny model with batch-size=4 and 4 input streams, where each stream runs at around 210 FPS, the decoder is at 100% utilization:
[Screenshot from 2021-05-10 12-37-12]

Is there any way to boost the decoder, or do I have to buy a GPU with a more powerful decoder?

mchi · May 12, 2021, 2:06am

You can refer to the "NVDEC - Hardware-Accelerated Video Decoding" section of the NVIDIA Video Codec SDK page on NVIDIA Developer.
