May I know why smsp__thread_inst_executed exists while there is no sm__thread_inst_executed metric? In other words, the metric is valid for the sub-partition, but there is no equivalent for the SM?
The SMSP (sub-partition) executes instructions. The performance counter is collected at the SMSP level. For some counters the underlying hardware and/or metrics library will calculate a metric at both the SMSP and SM level. In the case of thread_inst_executed the current metrics are only available at the SMSP level.
For all GPUs since Kepler (with the exception of GP100) there are 4 SMSPs per SM.
The SM-level values can be derived from the SMSP metrics as follows:
- sm__thread_inst_executed.avg == smsp__thread_inst_executed.avg * 4
- sm__thread_inst_executed.sum == smsp__thread_inst_executed.sum
.min/.max at the SM level cannot be calculated from the SMSP-level values.
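To illustrate the relations above, here is a minimal Python sketch. It is not profiler output or a profiler API: the per-SMSP counts are made up, the helper names `smsp_rollups` and `sm_rollups` are hypothetical, and it assumes 4 SMSPs per SM on a 2-SM device. It shows that .sum and .avg carry over to the SM level, while two workloads with identical SMSP rollups can still have different SM-level .min/.max.

```python
SMSP_PER_SM = 4  # assumption taken from the reply above: 4 sub-partitions per SM


def smsp_rollups(per_smsp):
    """Rollups over a flat list of per-SMSP counter values."""
    return {
        "sum": sum(per_smsp),
        "avg": sum(per_smsp) / len(per_smsp),
        "min": min(per_smsp),
        "max": max(per_smsp),
    }


def sm_rollups(per_smsp):
    """Rollups over per-SM totals; this needs the raw per-SMSP values."""
    per_sm = [
        sum(per_smsp[i:i + SMSP_PER_SM])
        for i in range(0, len(per_smsp), SMSP_PER_SM)
    ]
    return {
        "sum": sum(per_sm),
        "avg": sum(per_sm) / len(per_sm),
        "min": min(per_sm),
        "max": max(per_sm),
    }


# Made-up thread_inst_executed counts for 2 SMs x 4 SMSPs.
# Both lists contain the same values, only assigned to different SMs.
workload_a = [10, 20, 30, 40, 10, 20, 30, 40]
workload_b = [10, 10, 20, 20, 30, 30, 40, 40]

for name, counts in (("a", workload_a), ("b", workload_b)):
    smsp = smsp_rollups(counts)
    sm = sm_rollups(counts)
    # .sum is identical at both levels; .avg scales by the SMSP count.
    assert sm["sum"] == smsp["sum"]
    assert sm["avg"] == smsp["avg"] * SMSP_PER_SM
    print(name, "smsp:", smsp, "sm:", sm)
```

Both workloads produce the same SMSP-level sum, avg, min, and max, yet the per-SM totals differ, so the SM-level .min/.max cannot be recovered from the SMSP rollups alone.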
I guess you mean “available at the SMSP level”. Right? As I said only smsp__thread_inst_executed is valid.
Sorry. Fixed SM to SMSP in my original reply.
Is there a reason you need the value at the SM level?
No. I just wanted to know whether thread_inst at the SM level is meaningless, since there is no metric for it, or whether it can be derived from smsp__thread_inst.
Thanks for your help.