Question about smsp and sm

May I know why smsp__thread_inst_executed exists while there is no sm__thread_inst_executed metric? In other words, the metric is valid for the sub-partition, but there is no equivalent for the SM?

The SMSP (sub-partition) executes instructions. The performance counter is collected at the SMSP level. For some counters the underlying hardware and/or metrics library will calculate a metric at both the SMSP and SM level. In the case of thread_inst_executed the current metrics are only available at the SMSP level.

For all GPUs since Kepler (with the exception of GP100) there are 4 SMSPs per SM.

The SM-level values can be derived from the SMSP-level metric as follows:

  • sm__thread_inst_executed.avg == smsp__thread_inst_executed.avg * 4
  • sm__thread_inst_executed.sum == smsp__thread_inst_executed.sum

.min/.max cannot be calculated at the SM level from the SMSP-level values.
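
To make the roll-up relations concrete, here is a minimal Python sketch. It assumes 4 SMSPs per SM and uses made-up per-SMSP counts (not real profiler output); the `rollup` helper is hypothetical and is not part of any NVIDIA tool. It checks the .sum and .avg relations and shows why the SM-level .min/.max need the raw per-SMSP values: two runs with the same SMSP-level sum, avg, min and max can still have different per-SM minima and maxima.

```python
SMSP_PER_SM = 4  # assumption: 4 sub-partitions per SM (does not hold for GP100)

def rollup(per_smsp):
    """Aggregate a flat list of per-SMSP counter values, grouped SM by SM."""
    assert len(per_smsp) % SMSP_PER_SM == 0
    # Per-SM value = sum of the counter over that SM's sub-partitions.
    per_sm = [sum(per_smsp[i:i + SMSP_PER_SM])
              for i in range(0, len(per_smsp), SMSP_PER_SM)]
    return {
        "smsp_sum": sum(per_smsp),
        "smsp_avg": sum(per_smsp) / len(per_smsp),
        "sm_sum": sum(per_sm),
        "sm_avg": sum(per_sm) / len(per_sm),
        "sm_min": min(per_sm),
        "sm_max": max(per_sm),
    }

# Hypothetical counts for 2 SMs (8 SMSPs). Both runs contain the same
# eight values, so every SMSP-level aggregate is identical.
a = rollup([10, 20, 30, 40, 10, 40, 20, 30])
b = rollup([10, 10, 20, 20, 40, 40, 30, 30])

for r in (a, b):
    assert r["sm_sum"] == r["smsp_sum"]
    assert r["sm_avg"] == r["smsp_avg"] * SMSP_PER_SM

# The per-SM min/max differ even though the SMSP aggregates match.
print(a["sm_min"], a["sm_max"])
print(b["sm_min"], b["sm_max"])
```

The grouping of consecutive values into one SM is an illustrative choice; a real report would need the actual SMSP-to-SM mapping.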

I guess you mean “available at the SMSP level”. Right? As I said only smsp__thread_inst_executed is valid.

Sorry. Fixed SM to SMSP in my original reply.

Is there a reason you need the value at the SM level?

No. I just wanted to know whether thread_inst at the SM level is meaningless, since there is no metric for it, or whether it can be calculated from smsp__thread_inst.
Thanks for your help.