Training a GAN on the RTX 3070 with only ~5% utilization

tobiasmueller14 · February 11, 2021, 7:33am

Hello, I am training a GAN on 15k binary images (256 × 256).

I am using PyTorch 1.7.1, CUDA 11, and AMP in the code. The estimated training time is ~9 h on the RTX 3070.

But why is the GPU utilization below 10% while my CPU is at 70%?

The dedicated GPU memory usage of 5.9/8 GB looks plausible.

On an older PC with a GTX 1080 Ti, the training takes 18 h, but with lower CPU utilization.

Robert_Crovella · February 11, 2021, 2:50pm

It stands to reason that all other things being equal, if you do the same amount of work in less time on the GPU, then the corresponding CPU utilization will go up.
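To put rough numbers on that reasoning, here is a small sketch using the figures from the original post (~70% CPU over ~9 h). It assumes the host-side work is the same in both runs and that both machines have comparable CPUs, which may well not hold for the older PC, so treat the result as illustrative only. The function name is made up for this example.

```python
def cpu_utilization_at(new_hours, ref_hours, ref_utilization):
    """Scale CPU utilization, assuming total CPU-busy time stays constant."""
    cpu_busy_hours = ref_hours * ref_utilization  # fixed amount of host work
    return cpu_busy_hours / new_hours

# Figures from the original post: ~70% CPU during the ~9 h run on the RTX 3070.
busy_hours = 9 * 0.70
slow_run = cpu_utilization_at(new_hours=18, ref_hours=9, ref_utilization=0.70)

print(f"CPU-busy time: {busy_hours:.1f} h")
print(f"Implied CPU utilization over an 18 h run: {slow_run:.0%}")
```

Under those assumptions, the same host work spread over twice the wall time gives half the utilization, which matches the direction of what was observed on the GTX 1080 Ti machine.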

HfCloud · February 11, 2021, 3:15pm

Use nvidia-smi instead. (Execute nvidia-smi in a command prompt.)
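For logging the numbers over time rather than reading them off once, a minimal sketch that polls nvidia-smi from Python might look like this. It relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi; the function names and the sample line `"5, 6041, 8192"` are made up for illustration and are not measurements from this thread.

```python
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_gpu_line(line):
    """Parse one CSV line such as '5, 6041, 8192' into a dict (memory in MiB)."""
    util, used, total = (float(field) for field in line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}

def sample_gpus():
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.strip().splitlines()]

def monitor(samples=10, interval_s=1.0):
    """Print a few consecutive samples while training runs in another process."""
    for _ in range(samples):
        print(sample_gpus())
        time.sleep(interval_s)

# Hypothetical sample line, only to show the parsed structure.
example = parse_gpu_line("5, 6041, 8192")
print(example)
```

Calling `monitor()` in a second terminal while training runs prints one reading per second. Running `nvidia-smi -l 1` directly in the command prompt gives a similar repeating report without any Python.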

njuffa · February 11, 2021, 6:04pm

While nvidia-smi is OK for utilization monitoring, for continuous monitoring on Windows I would recommend TechPowerUp’s GPU-Z, which is a free download.

Robert Crovella’s hypothesis seems plausible to me. You might also want to check with the makers of PyTorch or inspect the source code, and run with the CUDA profiler. I suspect a contributing factor may be the small size of the individual images, which could lead to extremely short kernel run times and therefore increased exposure to kernel launch overhead.
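Before reaching for the CUDA profiler, a coarse timing split of the training loop can show whether the loop mostly waits for the CPU to deliver batches. The sketch below is framework-agnostic; `time_training_loop`, `slow_loader`, and `cheap_step` are hypothetical names, and the stand-in loader and step only simulate a slow input pipeline. It cannot separate kernel launch overhead from kernel run time, since both end up in the step time.

```python
import time

def time_training_loop(batches, step, sync=None, max_batches=50):
    """Split wall time into waiting-for-data and running-the-step.

    batches: any iterable yielding batches (for example a data loader).
    step:    callable that runs one training step on a batch.
    sync:    optional callable that blocks until queued GPU work is done.
    """
    wait_s = step_s = 0.0
    it = iter(batches)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # host-side loading and preprocessing
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # forward, backward, optimizer update
        if sync is not None:
            sync()  # GPU launches are asynchronous, so wait before timing
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
    total = wait_s + step_s
    fraction = wait_s / total if total else 0.0
    return {"wait_s": wait_s, "step_s": step_s, "wait_fraction": fraction}

def slow_loader(n=5, delay_s=0.05):
    """Stand-in for a CPU-bound input pipeline (hypothetical)."""
    for i in range(n):
        time.sleep(delay_s)
        yield i

def cheap_step(batch):
    """Stand-in for a very short training step (hypothetical)."""
    return batch * 2

report = time_training_loop(slow_loader(), cheap_step)
print(f"waiting for data: {report['wait_fraction']:.0%} of loop time")
```

With PyTorch, one would pass the real data loader and training step and use `sync=torch.cuda.synchronize`, so that the step time includes the GPU work. A large wait fraction would point to the input pipeline on the CPU rather than to the GPU kernels.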

Historically, poor GPU utilization has been observed in software that tried to balance CPU and GPU work when it was first created. A decade later, as GPU performance had increased much more rapidly than CPU performance, such software often became bottlenecked on the CPU portion of the code. Other software that focused on maximizing the amount of work done on the GPU from the start (even when the efficiency of some of the code when running on the GPU was rather poor) scaled better in the long run. I don’t have experience with PyTorch, so I cannot say which of these categories it falls into.

Even software that is aggressively parallelized with CUDA contains serial portions that run on the CPU. HPC systems with high-end GPUs should therefore utilize CPUs with high single-thread performance to avoid the serial portions of the application becoming a bottleneck. To first order this means using CPUs with a high base clock: my standing recommendation is to choose CPUs with base clock > 3.5 GHz.
