What does Achieved Active Warps Per SM in Nsight mean, and how is it calculated?

Section: Occupancy


Metric                             Unit         Value
Block Limit SM                     block        32
Block Limit Registers              block        4
Block Limit Shared Mem             block        inf
Block Limit Warps                  block        2
Achieved Active Warps Per SM       warp         48.50
Achieved Occupancy                 %            75.78
Theoretical Active Warps per SM    warp/cycle   64
Theoretical Occupancy              %            100


There is a metric called Achieved Active Warps Per SM, and I want to know what it means. Which kernel parameters affect it, such as block size, grid size, and so on? Finally, can I calculate it without running the kernel, using only the code and the launch configuration?

Can anyone help me?

You cannot statically compute the Achieved Active Warps or Achieved Occupancy without running the kernel. The Theoretical Active Warps/Occupancy metrics, by contrast, depend only on the kernel launch parameters, the GPU, and the CUDA cache configuration settings, so they can be computed statically using the CUDA Occupancy Calculator: https://docs.nvidia.com/cuda/cuda-occupancy-calculator/index.html.
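To illustrate how the theoretical numbers in your output fit together, here is a minimal Python sketch. It is not the Occupancy Calculator itself: it takes the four "Block Limit" values from your report as given and assumes 32 warps per block and a hardware limit of 64 warps per SM. Both are inferred from your numbers (Block Limit Warps of 2 and a theoretical 64 warps), not read from your launch configuration. The function name is made up for this example.

```python
import math

def theoretical_active_warps(block_limits, warps_per_block, max_warps_per_sm):
    """Return (resident blocks per SM, theoretical active warps per SM, occupancy %).

    The number of blocks that can be resident on one SM is bounded by the
    tightest of the per-resource block limits.
    """
    blocks_per_sm = min(block_limits.values())
    warps = min(blocks_per_sm * warps_per_block, max_warps_per_sm)
    return blocks_per_sm, warps, 100.0 * warps / max_warps_per_sm

# Block limits copied from the Occupancy section of the report.
limits = {"sm": 32, "registers": 4, "shared_mem": math.inf, "warps": 2}

# ASSUMPTION: 32 warps per block (1024 threads) and 64 warps max per SM,
# inferred from "Block Limit Warps = 2" and "Theoretical Active Warps = 64".
blocks, warps, occ = theoretical_active_warps(
    limits, warps_per_block=32, max_warps_per_sm=64
)
print(blocks, warps, occ)  # prints: 2 64 100.0
```

Here the warp limit is the tightest one (2 blocks per SM), and 2 blocks of 32 warps each fill all 64 warp slots, which is why the theoretical occupancy is 100%. If you want this number from inside a program rather than from the spreadsheet, the CUDA runtime also provides cudaOccupancyMaxActiveBlocksPerMultiprocessor.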

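The achieved metrics depend on the actual workload (i.e. your code). Achieved Active Warps Per SM is the average number of warps in flight per SM over the runtime of the kernel, as suggested by the underlying metric name (sm__warps_active.avg.per_cycle_active). Achieved Occupancy is that value expressed as a percentage of the maximum number of warps an SM can hold; in your output, 48.50 / 64 = 75.78%.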
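As a rough mental model of what the profiler measures, the following Python sketch averages a per-cycle count of resident warps. The trace is entirely hypothetical, chosen only so that the average matches your 48.50. Treating "active" as "at least one warp resident" is my reading of the per_cycle_active suffix, not something taken from the hardware documentation. The function names are made up for this example.

```python
def achieved_active_warps(active_warps_per_cycle):
    """Average number of resident warps over the cycles in which the SM was active.

    ASSUMPTION: a cycle counts as "active" if at least one warp is resident.
    """
    active = [w for w in active_warps_per_cycle if w > 0]
    return sum(active) / len(active) if active else 0.0

def occupancy_percent(active_warps, max_warps_per_sm):
    """Express a warp count as a percentage of the SM's warp capacity."""
    return 100.0 * active_warps / max_warps_per_sm

# HYPOTHETICAL trace for one SM: full at first, then partially empty as
# blocks finish at different times, then idle.
trace = [64, 64, 33, 33, 0]

avg = achieved_active_warps(trace)
occ = occupancy_percent(avg, max_warps_per_sm=64)
print(avg, occ)  # prints: 48.5 75.78125
```

The point of the sketch is that the average depends on how long each warp actually stays resident, which is determined by what your code does at run time. That is why it cannot be derived from the launch configuration alone.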

Also, note this part of the description of this section by Nsight Compute: "Large discrepancies between the theoretical and the achieved occupancy during execution typically indicates highly imbalanced workloads."