nvprof: Question about the sm_efficiency metric

I had a few questions about the sm_efficiency metric. My understanding from the profiler documentation is that the sm_efficiency metric reports the percentage of time where there is at least one active warp on an SM and that “active warps” include warps that are stalled. Is this interpretation correct?

  1. Is sm_efficiency a percentage of the kernel’s total runtime?

  2. Since sm_efficiency is not 100% in every case, what causes the period of time where no SMs have any active warps? In other words, what is happening on the GPU when the kernel is still “running”, but no SMs have any active warps?

The formula for sm_efficiency is (active_cycles / elapsed_cycles_sm) * 100. Both of these events can be collected with nvprof's “-e” (--events) option.
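A small sketch of that arithmetic, in case it helps to check the metric by hand. It assumes you have already collected the two event totals (for example with nvprof --events active_cycles,elapsed_cycles_sm ./app) and that both counts cover the same SMs for the same kernel launch. The counter values in the usage line are made up for illustration, not measured.

```python
def sm_efficiency(active_cycles: int, elapsed_cycles_sm: int) -> float:
    """Percentage of elapsed SM cycles in which the SM had at least one active warp.

    Assumes both counters are totals over the same SMs and the same kernel launch.
    """
    if elapsed_cycles_sm <= 0:
        raise ValueError("elapsed_cycles_sm must be positive")
    return 100.0 * active_cycles / elapsed_cycles_sm


# Hypothetical counter values, not from a real profile.
print(sm_efficiency(active_cycles=750_000, elapsed_cycles_sm=1_000_000))  # 75.0
```

The result should match what nvprof --metrics sm_efficiency reports for the same launch, assuming the event totals were aggregated the same way.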

  1. sm_efficiency tells you what percentage of the SM's elapsed cycles (elapsed_cycles_sm) had work on the SM, i.e. at least one active warp (active_cycles).

  2. There can be multiple reasons for low sm_efficiency. One is that the kernel was not launched with a configuration that keeps all SMs busy. For example, if the GPU has 10 SMs but the launch configuration only puts warps on 1 SM, the other 9 SMs sit idle and you will get a low sm_efficiency value.
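To make the 10-SM example concrete, here is a toy model (not how the hardware scheduler actually works) that computes the aggregate metric from per-SM active cycles. Its assumptions are hypothetical and chosen for simplicity: blocks are assigned round-robin to SMs, each SM runs one block at a time, every block takes exactly block_cycles cycles, and every SM is counted for the full kernel duration (until the busiest SM finishes). The function name simulated_sm_efficiency is illustrative only.

```python
def simulated_sm_efficiency(num_blocks: int, num_sms: int, block_cycles: int) -> float:
    """Toy model of aggregate sm_efficiency (hypothetical, for illustration only).

    Assumes round-robin block placement, one block at a time per SM, equal
    block durations, and that each SM's elapsed cycles equal the kernel duration.
    """
    if num_blocks < 1 or num_sms < 1 or block_cycles < 1:
        raise ValueError("all arguments must be positive")

    # Count how many blocks land on each SM under round-robin placement.
    blocks_per_sm = [0] * num_sms
    for block in range(num_blocks):
        blocks_per_sm[block % num_sms] += 1

    # An SM is "active" only while it is running one of its blocks.
    active_cycles = [n * block_cycles for n in blocks_per_sm]

    # The kernel runs until the busiest SM finishes; every SM is counted
    # for that many elapsed cycles, idle or not.
    elapsed_per_sm = max(active_cycles)

    return 100.0 * sum(active_cycles) / (elapsed_per_sm * num_sms)


# One block on a 10-SM GPU: 9 SMs idle for the whole kernel.
print(simulated_sm_efficiency(num_blocks=1, num_sms=10, block_cycles=1000))   # 10.0
# One block per SM: every SM is busy for the whole kernel.
print(simulated_sm_efficiency(num_blocks=10, num_sms=10, block_cycles=1000))  # 100.0
```

In this model, the idle SMs still accumulate elapsed cycles while the single busy SM finishes its work, which is why the aggregate drops to about 10%.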