The Profiler User's Guide defines sm_efficiency as: "The percentage of time at least one warp is active on a multiprocessor averaged over all multiprocessors on the GPU." [url]http://docs.nvidia.com/cuda/profiler-users-guide/#metrics-reference-2x[/url]
My question is about the phrase "at least one warp is active" in that definition. I think the sm_efficiency metric should be defined in terms of eligible warps instead of active warps: an eligible warp is one that is ready to issue an instruction, whereas an active warp is merely one that has been allocated to an SM. An active warp may still be stalled.
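To make the difference concrete, here is a toy Python sketch of how the two readings would diverge. It is only a model of the documented formula, not how nvprof actually collects the metric; the function name and the per-cycle warp counts are made up for illustration.

```python
# Toy model only: the per-cycle warp counts below are hypothetical sample
# data, not profiler output.

def sm_efficiency(per_sm_counts):
    """Percentage of cycles with at least one warp counted, averaged over SMs."""
    per_sm = [
        100.0 * sum(1 for c in counts if c > 0) / len(counts)
        for counts in per_sm_counts
    ]
    return sum(per_sm) / len(per_sm)

# Hypothetical samples: 2 SMs observed for 4 cycles each.
active_warps = [[8, 8, 8, 0], [4, 4, 0, 0]]    # warps allocated to the SM
eligible_warps = [[2, 0, 0, 0], [1, 0, 0, 0]]  # warps ready to issue

by_active = sm_efficiency(active_warps)      # "active" reading of the definition
by_eligible = sm_efficiency(eligible_warps)  # "eligible" reading I am proposing
print(by_active, by_eligible)
```

With these made-up numbers the "active" reading gives a much higher value than the "eligible" reading, because the stalled warps still count as active.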
Could somebody please confirm whether my understanding is correct?
References:
cuda - Trying to understand nvprof metrics, sm_efficiency and warp_execution_efficiency zero - Stack Overflow
[url]https://devtalk.nvidia.com/default/topic/680398/cuda-programming-and-performance/resident-warp-vs-active-warp/post/4106481/#4106481[/url]