Based on the definitions, sm__cycles_active.sum is the sum of active cycles across all SMs and sm__inst_executed.sum is the sum of all instructions across all SMs. So is it correct to divide them to get the total IPC? However, the SMs run in parallel, so I wonder whether I should use sm__cycles_active.avg (for one SM) when calculating the total IPC.
Any notes on that?
Your understanding is correct.
These metrics are already supported:
- The value will be [0 - sm__inst_executed.peak_sustained]
- The value will be [0 - (device__attribute_multiprocessor_count x sm__inst_executed.peak_sustained)]
The average is useful to look at efficiency.
The sum is useful for comparing the throughput of two different GPUs (preferably of the same architecture).
Thanks for that. The question is, should I use sm_inst.sum and sm_cycles.avg for the device IPC, or sm_inst.avg and sm_cycle.avg? From your answer, I guess you are saying that sm__inst_executed.sum.per_cycle_active is the device IPC. I would like to verify that manually as a check.
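One way to do the manual check is sketched below in Python. It assumes that the device-level ratio divides the instruction sum by the per-SM average of active cycles, which is consistent with the [0 - (device__attribute_multiprocessor_count x sm__inst_executed.peak_sustained)] range quoted above but is not confirmed by the thread. The per-SM counter values and the function names are made up for illustration; in practice the inputs would be the sm__inst_executed.sum, sm__inst_executed.avg and sm__cycles_active.avg values reported by the profiler.

```python
# Hypothetical per-SM counter values (illustrative only, not profiler output).
inst_per_sm = [1000, 1200, 800, 1000]   # instructions executed on each SM
cycles_per_sm = [500, 500, 500, 500]    # active cycles on each SM

num_sms = len(inst_per_sm)

# Roll-ups equivalent to the .sum and .avg suffixes.
inst_sum = sum(inst_per_sm)
inst_avg = inst_sum / num_sms
cycles_sum = sum(cycles_per_sm)
cycles_avg = cycles_sum / num_sms


def per_sm_ipc(inst_avg, cycles_avg):
    """Average IPC of a single SM: avg instructions / avg active cycles."""
    return inst_avg / cycles_avg


def device_ipc(inst_sum, cycles_avg):
    """Device-wide IPC (assumed definition): total instructions divided by
    the average active cycles of one SM, since the SMs run in parallel."""
    return inst_sum / cycles_avg


sm_ipc = per_sm_ipc(inst_avg, cycles_avg)
dev_ipc = device_ipc(inst_sum, cycles_avg)

# sum/sum gives the same number as avg/avg, so it is a per-SM figure,
# not a device-wide one.
sum_over_sum = inst_sum / cycles_sum

print("per-SM IPC :", sm_ipc)
print("device IPC :", dev_ipc)
print("sum / sum  :", sum_over_sum)
```

Under this assumption, sm_inst.sum / sm_cycles.avg gives the device IPC, sm_inst.avg / sm_cycles.avg gives the per-SM IPC, and the two differ by exactly the SM count. Dividing sum by sum only reproduces the per-SM figure.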