Cycles for all SMs

Hi
Based on the definitions, sm__cycles_active.sum is the sum of all cycles across all SMs, and sm__inst_executed.sum is the sum of all instructions across all SMs. So is it correct to divide them to get the total IPC? However, the SMs run in parallel, so I wonder whether I should instead use sm__cycles_active.avg (the value for one SM) to calculate the total IPC.
Any comments on that?

Your understanding is correct.

These metrics are already supported:

sm__inst_executed.avg.per_cycle_active

  • The value will be in the range [0, sm__inst_executed.peak_sustained]

sm__inst_executed.sum.per_cycle_active

  • The value will be in the range [0, device__attribute_multiprocessor_count × sm__inst_executed.peak_sustained]

The average is useful for looking at efficiency.
The sum is useful for comparing the throughput of two different GPUs (preferably of the same architecture).
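
For a manual check, here is a minimal sketch of how the two rollups relate. It assumes that the .per_cycle_active ratio divides by sm__cycles_active.avg in both cases; that is an inference from the two value ranges listed above, not something stated in the tool output. The per-SM counter values are made up for illustration.

```python
# Hypothetical per-SM counter values for a 4-SM device (made-up numbers,
# not taken from a real profile).
inst_executed = [1000, 1200, 800, 1000]  # sm__inst_executed, one entry per SM
cycles_active = [500, 600, 400, 500]     # sm__cycles_active, one entry per SM

num_sms = len(inst_executed)  # stands in for device__attribute_multiprocessor_count

inst_sum = sum(inst_executed)      # sm__inst_executed.sum
inst_avg = inst_sum / num_sms      # sm__inst_executed.avg
cycles_sum = sum(cycles_active)    # sm__cycles_active.sum
cycles_avg = cycles_sum / num_sms  # sm__cycles_active.avg

# Assumption: both ratios use the per-SM average of active cycles as denominator.
ipc_avg = inst_avg / cycles_avg  # candidate for sm__inst_executed.avg.per_cycle_active
ipc_sum = inst_sum / cycles_avg  # candidate for sm__inst_executed.sum.per_cycle_active

# Dividing sum by sum cancels the SM count, so it gives the per-SM figure again.
ipc_sum_over_sum = inst_sum / cycles_sum

print(f"per-SM IPC  (avg / avg cycles): {ipc_avg}")
print(f"device IPC  (sum / avg cycles): {ipc_sum}")
print(f"sum / sum   (same as per-SM):   {ipc_sum_over_sum}")
```

Under that assumption, the device-level value is the per-SM value multiplied by the number of SMs, which matches the upper bounds given above. Note that sum divided by sum always equals avg divided by avg, because the SM count cancels, so it does not give a device-wide IPC.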

Thanks for that. The question is, should I use sm__inst_executed.sum and sm__cycles_active.avg for the device IPC, or sm__inst_executed.avg and sm__cycles_active.avg? From your answer, I guess you are saying that sm__inst_executed.sum.per_cycle_active is the device IPC. I would like to verify that manually as a check.