Question about NVIDIA Visual Profiler's occupancy results

evabasis98 · May 28, 2019, 4:12pm

Hi!

I am trying to extract information about resources of a CUDA application I have developed.

I am using nvvp (Nvidia visual profiler).
In Unguided Analysis -> Kernel Latency, there is resource and occupancy information that I am interested in.

However, the results are not that clear to me.

For example here is a picture of the nvvp’s result about occupancy:

I have the following questions:
1) What is the theoretical value, and why is it different from the achieved value?
2) Why is there no achieved value for active blocks and active threads, only a theoretical one?
How is it possible to find the achieved values for those?
3) Why is the achieved value of Active Warps 62.23? This is the actual number of active warps, right? How is it possible for it to be 61.23 and not an integer?

Thank you

saulocpp · May 28, 2019, 9:25pm

Try searching for “cuda profiler metrics”. The results you get most likely answer these questions plus a lot more that all of us will eventually have.

Greg · May 29, 2019, 1:53am

achieved occupancy = active_warps_sm / active_cycles_sm / MAX_WARPS_PER_SM * 100.0

theoretical occupancy is calculated using the occupancy calculator.

active_warps_sm increments by a value in the range [0, SM_COUNT * MAX_WARPS_PER_SM] per cycle.
active_cycles_sm increments by a value in the range [0, SM_COUNT] per cycle.
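
As a rough illustration of the formula above, here is a minimal Python sketch. The counter values are made up for the example and are not taken from any real profiler run; the function name achieved_occupancy and the limit of 64 warps per SM are assumptions.

```python
def achieved_occupancy(active_warps_sm, active_cycles_sm, max_warps_per_sm):
    """Return (average active warps per active cycle, achieved occupancy in %)."""
    # Both counters are whole numbers accumulated over the kernel's run time.
    avg_active_warps = active_warps_sm / active_cycles_sm
    occupancy_pct = avg_active_warps / max_warps_per_sm * 100.0
    return avg_active_warps, occupancy_pct


# Hypothetical counter values, chosen only to illustrate the arithmetic.
avg_warps, pct = achieved_occupancy(
    active_warps_sm=6_223_000,
    active_cycles_sm=100_000,
    max_warps_per_sm=64,  # assumed hardware limit; it depends on the GPU
)
print(f"average active warps: {avg_warps:.2f}, achieved occupancy: {pct:.1f}%")
```

The first return value corresponds to the "Active Warps" achieved figure, and the second to the occupancy percentage.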

1)what is the theoretical value and why it is different from achieved?

The achieved value is measured on the hardware. The theoretical value is calculated based upon the launch information. Thread blocks and warps take cycles to launch so achieved < theoretical. If a warp exits early then it will not contribute for as many cycles.
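
To give a feel for how a theoretical value can be derived from launch information alone, here is a simplified Python sketch. It is not the occupancy calculator itself: it ignores register and shared memory allocation granularity, and the default per-SM limits are assumed values that differ between GPU architectures. The function name theoretical_occupancy is hypothetical.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          max_warps_per_sm=64, max_blocks_per_sm=32,
                          regs_per_sm=65536, smem_per_sm=65536, warp_size=32):
    """Simplified estimate; all default limits are assumptions, not real specs."""
    # Warps needed by one block (rounded up).
    warps_per_block = -(-threads_per_block // warp_size)

    # How many blocks fit on one SM under each resource limit.
    limit_warps = max_warps_per_sm // warps_per_block
    regs_per_block = regs_per_thread * warp_size * warps_per_block
    limit_regs = regs_per_sm // regs_per_block if regs_per_block else max_blocks_per_sm
    limit_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks_per_sm

    # The tightest limit decides how many blocks can be resident at once.
    active_blocks = min(limit_warps, limit_regs, limit_smem, max_blocks_per_sm)
    active_warps = active_blocks * warps_per_block
    return active_blocks, active_warps, 100.0 * active_warps / max_warps_per_sm


# Hypothetical launch: 256 threads per block, 64 registers per thread, no shared memory.
blocks, warps, pct = theoretical_occupancy(256, 64, 0)
print(blocks, warps, pct)
```

Nothing in this calculation depends on how the kernel actually runs, which is why the theoretical figure is an upper bound that the measured value approaches but does not exceed.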

2) Why is there no achieved value for active blocks and active threads, only a theoretical one?
How is it possible to find the achieved values for those?

The CUDA profilers do not measure active blocks or active threads. It could be useful to measure active blocks. Comparing the average thread block duration to the average warp latency for a kernel would indicate if there exists a tail effect.

The CUDA profilers do not measure active threads, as this is not a very useful number. The profilers do measure the average number of threads executed per instruction executed and the average number of active, predicated-true threads executed per instruction executed. These metrics are easy to take action on. This information is presented per SASS instruction, per line of high-level source code, and per kernel in all three profilers (Nsight VSE, Nsight Compute, and NVVP/nvprof).

3) Why is the achieved value of Active Warps 62.23? This is the actual number of active warps, right? How is it possible for it to be 61.23 and not an integer?

The formula divides a whole number by a whole number, which gives a fractional result if the numerator is not a multiple of the denominator. The achieved value is therefore an average over the active cycles, not a count at a single point in time.
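
A tiny Python sketch of how an average over cycles ends up non-integer, assuming a single SM and four sampled cycles. The per-cycle warp counts are invented for the example.

```python
# Hypothetical number of resident warps on one SM in each of four cycles.
per_cycle_active_warps = [64, 64, 60, 61]

# The warp counter accumulates the per-cycle counts (a whole number).
active_warps_sm = sum(per_cycle_active_warps)

# The cycle counter accumulates one per cycle in which the SM had any warp.
active_cycles_sm = sum(1 for w in per_cycle_active_warps if w > 0)

average_active_warps = active_warps_sm / active_cycles_sm
print(active_warps_sm, active_cycles_sm, average_active_warps)  # 249 4 62.25
```

Every per-cycle count is an integer, but the average is not, because the number of resident warps changes as warps launch and exit.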
