I have a few questions about the sm_efficiency metric. My understanding is that sm_efficiency reports the percentage of time during which at least one warp is active on an SM, and that “active warps” include warps that are stalled.
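To make that mental model concrete, here is a minimal sketch of how I picture the number being derived. It is not the profiler's actual implementation: the function name `sm_efficiency`, the 4-SM GPU, and the cycle counts are all made up for illustration, and averaging the per-SM percentages across all SMs is my assumption about how a single device-wide value is produced.

```python
def sm_efficiency(active_cycles_per_sm, elapsed_cycles):
    """Hypothetical model of the metric, not the profiler's real code.

    active_cycles_per_sm: cycles during which each SM had at least one
        active (possibly stalled) warp.
    elapsed_cycles: total cycles the kernel was running.
    Returns the per-SM active percentage, averaged over all SMs (assumed).
    """
    if elapsed_cycles <= 0:
        raise ValueError("elapsed_cycles must be positive")
    if not active_cycles_per_sm:
        raise ValueError("need at least one SM")
    per_sm = [100.0 * active / elapsed_cycles for active in active_cycles_per_sm]
    return sum(per_sm) / len(per_sm)


# Made-up example: the kernel runs for 1000 cycles on a 4-SM GPU.
# Two SMs are busy the whole time, one finishes early, one never gets a block.
active = [1000, 1000, 600, 0]
print(sm_efficiency(active, 1000))  # 65.0
```

Under this model, a value below 100% would not require every SM to be idle at the same moment; it would be enough for some SMs to have no active warps for part of the run.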
- Is sm_efficiency a percentage of the kernel’s total runtime?
- Since sm_efficiency is not always 100%, what causes the periods of time in which no SM has any active warps? In other words, what is happening on the GPU while the kernel is still “running” but no SM has an active warp?