GPU concurrency and how to monitor GPU utilization. The nvidia-smi tool always shows utilization as either 0% or 100%.

Hi guys:

I'm asking for your help. My CUDA application runs four threads, each with its own CUDA stream. The kernel execution duration is the same in all four threads. When I reduce my application to one thread, the kernel execution duration is the same as in the four-thread case. That's the point that confuses me. When my CUDA application runs with one thread, nvidia-smi shows GPU utilization at 100%, so I assumed that with four threads the kernel execution duration would increase, but the result is different from what I expected. What is the reason? By the way, nvidia-smi only ever shows two utilization values, 0% or 100%; I never see anything in between, like 10% or 20%. I'm confused about this as well.
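In case it helps, here is a simplified sketch of the setup I described (the kernel body, sizes, and iteration count here are just placeholders, not my real code):

#include <cuda_runtime.h>
#include <thread>
#include <vector>

__global__ void work(float *data, int n)            // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

void worker(int n)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);                       // each CPU thread owns one stream

    float *d;
    cudaMalloc(&d, n * sizeof(float));

    for (int iter = 0; iter < 100; ++iter)           // repeatedly enqueue kernels into this stream
        work<<<(n + 255) / 256, 256, 0, stream>>>(d, n);

    cudaStreamSynchronize(stream);
    cudaFree(d);
    cudaStreamDestroy(stream);
}

int main()
{
    const int n = 1 << 20;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)                      // four host threads, four streams
        threads.emplace_back(worker, n);
    for (auto &t : threads) t.join();
    return 0;
}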

Please don't hesitate to correct me if I have made any mistakes.

Read this:

https://stackoverflow.com/questions/16617796/gpu-utilization/16618010#16618010

and this:

https://stackoverflow.com/questions/40937894/nvidia-smi-volatile-gpu-utilization-explanation/40938696#40938696

Your observations are entirely plausible. For example, in your 4-thread scenario, if the kernel execution timeline looked like this:

|k1|k2|k3|k4|

and in the one-thread scenario it looked like this:

|k1|k1|k1|k1|

or

|   k1      |

Any of those scenarios would show 100% utilization. You can probably get a better understanding by comparing the visual profiler timeline in each case.
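If you don't want to fire up the profiler, you can also bracket a launch with CUDA events in its stream to measure the kernel duration directly. This is just a minimal sketch with a placeholder kernel, not your application:

#include <cuda_runtime.h>
#include <cstdio>

__global__ void work(float *data, int n)                   // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main()
{
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);                        // timestamp before the launch
    work<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    cudaEventRecord(stop, stream);                         // timestamp after the kernel
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);                // elapsed time in milliseconds
    printf("kernel duration: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return 0;
}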

That's the point that confuses me.

  1. Each kernel is a massively parallel job. While a job is executing, it usually utilizes the GPU at 100%; between jobs, utilization is 0%.

  2. The GPU dispatcher usually runs each job to completion before starting the next one, so the execution time of a job doesn't depend on the number of streams (CPU threads) simultaneously enqueuing new jobs.

  3. But of course, the overall performance (throughput) of the GPU is divided between those 4 streams, so each stream will complete 4x fewer jobs in a (large enough) fixed amount of time; see the sketch after this list.
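Here is a rough way to see point 3 in practice (illustrative only, not your application): launch the same total number of GPU-filling kernels once with 1 stream and once with 4 streams. The total wall time stays about the same, so each of the 4 streams effectively gets about 1/4 of the throughput:

#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>
#include <vector>

__global__ void work(float *data, int n)                   // placeholder kernel, big enough to fill the GPU
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 200; ++k)                      // arbitrary per-element compute
            data[i] = data[i] * 1.0001f + 0.5f;
}

double runBatch(int numStreams, int totalJobs, float *d, int n)
{
    std::vector<cudaStream_t> streams(numStreams);
    for (auto &s : streams) cudaStreamCreate(&s);

    auto t0 = std::chrono::steady_clock::now();
    for (int j = 0; j < totalJobs; ++j)                    // round-robin the jobs over the streams
        work<<<(n + 255) / 256, 256, 0, streams[j % numStreams]>>>(d, n);
    cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();

    for (auto &s : streams) cudaStreamDestroy(s);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const int n = 1 << 22, totalJobs = 64;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    printf("1 stream : %.1f ms\n", runBatch(1, totalJobs, d, n));
    printf("4 streams: %.1f ms\n", runBatch(4, totalJobs, d, n));

    cudaFree(d);
    return 0;
}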

Overall, streams are useful for keeping the GPU 100% loaded. With a single stream, the GPU has to wait for full completion of one job before starting the next, since there may be data dependencies. That is the so-called tail effect, and it means less than 100% GPU utilization during the tail of a job (its last grid blocks).

With multiple streams, jobs in the other streams are independent of the current job, so the next job can start while the current one is still executing its tail, and the GPU can stay 100% utilized. This is independent of whether you use a single CPU thread or multiple CPU threads to handle the streams, since the CPU only enqueues jobs to the GPU.
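A quick way to see this effect (again, just a sketch, not your code) is to launch deliberately small kernels that use only a few blocks: with one stream they run back to back and leave most of the GPU idle, while with several independent streams they overlap and finish much sooner in total:

#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>
#include <vector>

__global__ void smallJob(float *data, int n)               // deliberately tiny grid: most of the GPU stays idle
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 20000; ++k)
            data[i] = data[i] * 1.0001f + 0.5f;
}

double runJobs(int numStreams, int jobs, float *d, int n)
{
    std::vector<cudaStream_t> streams(numStreams);
    for (auto &s : streams) cudaStreamCreate(&s);

    auto t0 = std::chrono::steady_clock::now();
    for (int j = 0; j < jobs; ++j)                          // independent jobs spread over the streams
        smallJob<<<4, 256, 0, streams[j % numStreams]>>>(d, n);
    cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();

    for (auto &s : streams) cudaStreamDestroy(s);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const int n = 4 * 256;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    printf("1 stream : %.1f ms\n", runJobs(1, 32, d, n));  // kernels serialize in one stream
    printf("8 streams: %.1f ms\n", runJobs(8, 32, d, n));  // kernels from different streams overlap
    cudaFree(d);
    return 0;
}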