Questions about the sm_efficiency metric

I have a few questions about the sm_efficiency metric. My understanding is that it reports the percentage of time during which at least one warp is active on an SM, and that “active warps” include warps that are stalled.

  1. Is sm_efficiency a percentage of the kernel’s total runtime?

  2. Since sm_efficiency is not 100% in every case, what causes the period of time where no SMs have any active warps? In other words, what is happening on the GPU when the kernel is still “running”, but no SMs have any active warps?

I am a bit puzzled. The way I see it: no active warp on any SM = all SMs are idle = the kernel is done.

While a kernel is winding down, I would expect the number of idle SMs to increase until all SMs are idle. Due to these “edge effects”, sm_efficiency could never be exactly 100%, although it could probably get arbitrarily close.
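To make the wind-down argument concrete, here is a rough Python sketch of a toy model. It rests on assumptions that are mine and not confirmed anywhere above: the metric is taken to be each SM's active fraction of the kernel's elapsed time, averaged over all SMs; every block runs for the same number of cycles; each SM holds one block at a time; and launch and drain overhead are ignored. The function names `sm_efficiency` and `simulate_wind_down` are hypothetical helpers, not part of any profiler API.

```python
def sm_efficiency(active_cycles, elapsed_cycles):
    """Average, over SMs, of the fraction of elapsed cycles each SM was active.

    ASSUMPTION: this per-SM averaging is a guess at how the metric is
    aggregated; it is only a model for reasoning about edge effects.
    """
    if elapsed_cycles <= 0:
        raise ValueError("elapsed_cycles must be positive")
    fractions = [cycles / elapsed_cycles for cycles in active_cycles]
    return 100.0 * sum(fractions) / len(fractions)


def simulate_wind_down(num_sms, num_blocks, cycles_per_block):
    """Toy kernel: blocks are dealt round-robin, one resident block per SM."""
    # Number of blocks each SM ends up executing, back to back.
    blocks_per_sm = [
        num_blocks // num_sms + (1 if sm < num_blocks % num_sms else 0)
        for sm in range(num_sms)
    ]
    active = [n * cycles_per_block for n in blocks_per_sm]
    # The kernel is "running" until the busiest SM finishes.
    elapsed = max(active)
    return sm_efficiency(active, elapsed)


# 8 blocks on 4 SMs: every SM finishes at the same time in this idealized model.
even = simulate_wind_down(num_sms=4, num_blocks=8, cycles_per_block=100)
# 9 blocks on 4 SMs: three SMs sit idle while one SM runs the last block.
tail = simulate_wind_down(num_sms=4, num_blocks=9, cycles_per_block=100)
print(f"even split: {even:.1f}%, with tail: {tail:.1f}%")
```

In this model a single leftover block is enough to pull the average well below 100%, even though some SM is busy for the kernel's entire duration. The evenly divided case only reaches 100% because the model ignores launch overhead and variation in block duration.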

Even when we consider concurrent kernels, there is likely a little “bubble” when an SM very briefly goes idle before having new warps for the next kernel scheduled onto it.