I want to know what these CUPTI metrics mean in detail.

myabcc17 · February 1, 2019, 4:13am

I would like to know what these metrics mean in detail. I've read the documentation, but I'm not sure what they mean. Could you explain them more thoroughly?

All of the items below are related to warps.

achieved_occupancy
inst_per_warp
sm_efficiency
stall_not_selected
warp_execution_efficiency
eligible_warps_per_cycle

Greg · February 4, 2019, 10:38pm

I recommend reviewing the Nsight VSE CUDA Profiler documentation on:

Issue Efficiency
https://docs.nvidia.com/nsight-visual-studio-edition/Nsight_Visual_Studio_Edition_User_Guide.htm#Analysis/Report/CudaExperiments/KernelLevel/IssueEfficiency.htm

Achieved Occupancy
https://docs.nvidia.com/nsight-visual-studio-edition/Nsight_Visual_Studio_Edition_User_Guide.htm#Analysis/Report/CudaExperiments/KernelLevel/AchievedOccupancy.htm

These metrics are all related to the Streaming Multiprocessor (SM).

achieved_occupancy is the ratio of the average number of active warps per active cycle (warps resident on the SM and actively being scheduled) to the maximum number of warps the SM can support. The higher the occupancy, the more likely the warp schedulers can hide latency. Higher occupancy usually requires fewer resources (e.g. registers/thread) per warp.
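As a rough illustration, the sketch below computes achieved occupancy from hypothetical per-cycle samples of active warps on one SM, averaging only over cycles in which the SM had at least one active warp. It also estimates how register usage alone caps the number of resident warps, assuming 65,536 registers and a 64-warp limit per SM (true for some architectures, e.g. Volta, but check your GPU) and ignoring register allocation granularity. The function names and sample data are hypothetical; this is not how CUPTI computes the metric internally.

```python
def achieved_occupancy(active_warps_per_cycle, max_warps_per_sm):
    """Average active warps over active cycles, divided by the SM's warp limit.

    active_warps_per_cycle: hypothetical per-cycle samples for one SM.
    Cycles with zero active warps are excluded (they are not active cycles).
    """
    active_cycles = [w for w in active_warps_per_cycle if w > 0]
    if not active_cycles:
        return 0.0
    avg_active_warps = sum(active_cycles) / len(active_cycles)
    return avg_active_warps / max_warps_per_sm


def register_limited_warps(regs_per_thread, regs_per_sm=65536,
                           max_warps_per_sm=64, warp_size=32):
    """Upper bound on resident warps imposed by register usage alone.

    Assumes 64K registers and 64 warps per SM; ignores allocation granularity
    and the other limits (shared memory, blocks per SM).
    """
    warps_by_regs = regs_per_sm // (regs_per_thread * warp_size)
    return min(warps_by_regs, max_warps_per_sm)


if __name__ == "__main__":
    samples = [48, 64, 32, 0, 0]  # hypothetical per-cycle active warp counts
    print(achieved_occupancy(samples, 64))  # 0.75
    for regs in (32, 64, 128):
        # 32 -> 64 warps, 64 -> 32 warps, 128 -> 16 warps
        print(regs, register_limited_warps(regs))
```

Doubling registers per thread halves the register-limited warp count in this model, which is the occupancy/resource trade-off mentioned above.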

inst_per_warp is the average number of instructions executed per warp.
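In other words, inst_per_warp is roughly the total number of warp-level instructions executed divided by the number of warps launched. The sketch below assumes a 1-D launch and derives the warp count from the launch configuration (a partially filled warp still counts as a whole warp); the instruction total is a made-up number standing in for the executed-instruction counter, and the helper names are hypothetical.

```python
import math


def warps_launched(num_blocks, threads_per_block, warp_size=32):
    # A block of 48 threads occupies 2 warps; the second one is only half full.
    return num_blocks * math.ceil(threads_per_block / warp_size)


def inst_per_warp(inst_executed, num_warps):
    # Average number of warp-level instructions executed by each warp.
    return inst_executed / num_warps


if __name__ == "__main__":
    warps = warps_launched(num_blocks=1000, threads_per_block=256)  # 8000 warps
    print(inst_per_warp(400_000, warps))  # 50.0
```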

sm_efficiency is the ratio of cycles in which an SM had at least 1 active warp to the total number of elapsed cycles in the measurement. sm_activity would be a more accurate name. If sm_efficiency is less than 90%, then either insufficient work was launched (increase the number of thread blocks per launch) or the kernel has a bad tail effect (a subset of blocks/warps runs longer than the rest). Fix this first.
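To make the ratio and the tail effect concrete, here is a simplified model. sm_efficiency is computed from hypothetical per-SM active-cycle counts, and a second helper assumes one resident block per SM and identical block durations (a deliberate oversimplification) to show how a grid that does not fill its last wave lowers the metric. The 80-SM GPU and the helper names are assumptions for illustration only.

```python
import math


def sm_efficiency(active_cycles_per_sm, elapsed_cycles):
    """Fraction of elapsed cycles in which each SM had at least one active warp,
    averaged over all SMs."""
    total = len(active_cycles_per_sm) * elapsed_cycles
    return sum(active_cycles_per_sm) / total


def wave_model_efficiency(num_blocks, num_sms):
    """Idealized tail-effect model: one resident block per SM at a time and every
    block takes the same time, so the grid runs in ceil(blocks / SMs) waves."""
    waves = math.ceil(num_blocks / num_sms)
    return num_blocks / (num_sms * waves)


if __name__ == "__main__":
    print(sm_efficiency([1000, 1000, 500, 250], 1000))  # 0.6875
    # 100 blocks on 80 SMs: the second wave keeps only 20 of 80 SMs busy.
    print(wave_model_efficiency(100, 80))  # 0.625
    print(wave_model_efficiency(160, 80))  # 1.0
```

In this model, launching too few blocks (e.g. 40 blocks on 80 SMs) caps the metric at 50%, while a partially filled last wave produces the tail effect.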

stall_not_selected is the percentage of active warps that were ready to issue an instruction but were not picked because the warp scheduler picked a higher-priority warp. If this number is high, then part or all of the kernel has sufficient occupancy (active_warps) to hide instruction latency. If it is very high, it may be worth decreasing occupancy, for example by using more registers/thread. In each cycle, each warp scheduler can pick one eligible warp (an active warp that is not stalled) to issue instructions. If there are multiple eligible warps, then 1 warp reports the reason "selected" and the other eligible warps report "not selected".
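The selection rule described above can be modeled for a single warp scheduler as follows. This is a simplified sketch, not the profiler's sampling mechanism: each cycle, at most one eligible warp is "selected", the remaining eligible warps count as "not selected", and non-eligible active warps are stalled for other reasons. The (active, eligible) samples and function names are hypothetical.

```python
def classify_scheduler_cycle(active_warps, eligible_warps):
    """Split one scheduler's active warps for one cycle into
    (selected, not_selected, other_stalls) under the simplified model."""
    if not 0 <= eligible_warps <= active_warps:
        raise ValueError("need 0 <= eligible_warps <= active_warps")
    selected = 1 if eligible_warps > 0 else 0
    not_selected = eligible_warps - selected
    other_stalls = active_warps - eligible_warps
    return selected, not_selected, other_stalls


def not_selected_percentage(cycles):
    """cycles: list of (active_warps, eligible_warps) pairs for one scheduler.
    Returns the percentage of active warp-cycles spent in 'not selected'."""
    total_warp_cycles = 0
    not_selected_total = 0
    for active, eligible in cycles:
        _, not_selected, _ = classify_scheduler_cycle(active, eligible)
        not_selected_total += not_selected
        total_warp_cycles += active
    if total_warp_cycles == 0:
        return 0.0
    return 100.0 * not_selected_total / total_warp_cycles


if __name__ == "__main__":
    # Cycle 1: 4 active, 3 eligible -> 1 selected, 2 not selected, 1 other stall.
    # Cycle 2: 4 active, 1 eligible -> 1 selected, 0 not selected, 3 other stalls.
    print(not_selected_percentage([(4, 3), (4, 1)]))  # 25.0
```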

warp_execution_efficiency is the ratio of the average number of active threads per warp per executed instruction to the maximum number of threads per warp (warp_size = 32). If this is less than 100%, then the kernel either has thread divergence or was launched with a block size that is not a multiple of 32 threads.
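A small sketch of this ratio, assuming we had a hypothetical trace of 32-bit active-thread masks, one per executed warp instruction (a set bit means that lane executed the instruction). It reproduces both cases mentioned above: a 48-thread block with no divergence, and an if/else where each branch runs with half the lanes. The trace data and names are illustrative only.

```python
FULL = 0xFFFFFFFF     # all 32 lanes active
LOW_16 = 0x0000FFFF   # lanes 0-15 active
HIGH_16 = 0xFFFF0000  # lanes 16-31 active


def warp_execution_efficiency(active_masks, warp_size=32):
    """Average active lanes per executed warp instruction, divided by warp size."""
    if not active_masks:
        return 0.0
    active_lanes = sum(bin(mask).count("1") for mask in active_masks)
    return active_lanes / (warp_size * len(active_masks))


if __name__ == "__main__":
    # 48-thread block, no divergence: warp 0 is full, warp 1 has 16 live lanes,
    # and each warp executes the same 10 instructions.
    partial_block = [FULL] * 10 + [LOW_16] * 10
    print(warp_execution_efficiency(partial_block))  # 0.75
    # Divergent if/else in one warp: 2 full instructions, then 4 per branch
    # with half the lanes each.
    divergent = [FULL, FULL] + [LOW_16] * 4 + [HIGH_16] * 4
    print(warp_execution_efficiency(divergent))  # 0.6
```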

eligible_warps_per_cycle is the average number of active warps per cycle that are not stalled (i.e. eligible to issue). I believe CUPTI measures this at the SM level. To issue at the maximum rate, each of the SM's warp schedulers needs at least 1 eligible warp per cycle; most architectures have 4 warp schedulers per SM, so this number has to be at least 4.
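The sketch below models why 4 is the threshold, assuming an SM with 4 warp schedulers that can each issue from at most one eligible warp per cycle. The per-scheduler eligible counts are hypothetical samples, and the issue-slot helper is an illustrative model, not a CUPTI metric definition.

```python
def eligible_warps_per_cycle(samples):
    """samples: per-cycle lists giving the number of eligible warps at each of
    the SM's warp schedulers. Returns the average eligible warps per SM per cycle."""
    return sum(sum(cycle) for cycle in samples) / len(samples)


def issue_slot_utilization(samples):
    """Fraction of scheduler issue slots used, assuming each scheduler issues one
    warp per cycle only when it has at least one eligible warp."""
    num_schedulers = len(samples[0])
    used = sum(sum(1 for e in cycle if e > 0) for cycle in samples)
    return used / (num_schedulers * len(samples))


if __name__ == "__main__":
    # Assumed SM with 4 warp schedulers, two sampled cycles.
    samples = [[1, 1, 1, 1], [2, 0, 2, 0]]
    print(eligible_warps_per_cycle(samples))  # 4.0
    print(issue_slot_utilization(samples))  # 0.75
```

In the second cycle the SM has 4 eligible warps, but two schedulers have none, so only two instructions issue. In this model an average of at least 4 is necessary for full issue, but it is only sufficient when every scheduler has an eligible warp in every cycle.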
