In the NCU report file, do the attributes with units of [inst/cycle] correspond to all threads/SMs or to a single thread/SM? If they correspond to a single thread/SM, please tell me which attribute indicates the number of threads/SMs used.
The bracketed attribute (for example [inst/cycle]) is only the dimensional unit of the value. To determine which portion of the GPU a metric covers, look at the hardware unit in the metric name (sm, smsp) and the rollup (.sum, .avg).
sm__inst_executed.sum: total number of warp instructions executed by all warps on all SMs
sm__inst_executed.avg: average number of warp instructions executed by an SM
smsp__inst_executed.sum: total number of warp instructions executed by all SM sub-partitions
smsp__inst_executed.avg: average number of warp instructions executed by an SM sub-partition
sm__inst_executed.sum == smsp__inst_executed.sum
sm__inst_executed.avg == 4 x smsp__inst_executed.avg (on GPUs with 4 sub-partitions per SM)
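To make the two identities above concrete, here is a minimal sketch of how the .sum and .avg rollups relate. The per-sub-partition counts are made-up numbers, and the layout (2 SMs with 4 sub-partitions each) is an assumption for illustration, not data from a real report.

```python
# Hypothetical warp-instruction counts per SM sub-partition.
# Assumption: 2 SMs, each with 4 sub-partitions (SMSPs).
smsp_counts = [
    [100, 120, 80, 100],   # SM 0
    [90, 110, 100, 100],   # SM 1
]

def rollups(counts_per_sm):
    """Return the .sum/.avg rollups at SM and SMSP granularity."""
    sm_totals = [sum(sm) for sm in counts_per_sm]           # one value per SM
    smsp_values = [c for sm in counts_per_sm for c in sm]   # one value per SMSP
    return {
        "sm__inst_executed.sum": sum(sm_totals),
        "sm__inst_executed.avg": sum(sm_totals) / len(sm_totals),
        "smsp__inst_executed.sum": sum(smsp_values),
        "smsp__inst_executed.avg": sum(smsp_values) / len(smsp_values),
    }

r = rollups(smsp_counts)
# The totals agree regardless of granularity.
assert r["sm__inst_executed.sum"] == r["smsp__inst_executed.sum"]
# The per-SM average is 4x the per-SMSP average when each SM has 4 SMSPs.
assert r["sm__inst_executed.avg"] == 4 * r["smsp__inst_executed.avg"]
```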
If you want to look at threads, you can review:
sm[sp]__thread_inst_executed[_pred_on].{avg, sum}
sm[sp]__sass_thread_inst_executed[_pred_on].{avg, sum}
[] indicates an optional substring
{,} indicates a set of options
_pred_on: only active threads whose predicate is on are counted
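The shorthand above can be expanded mechanically. The sketch below only spells out the notation as strings; the helper name expand is hypothetical, and it does not check whether each expanded name is actually available on a given GPU (the list printed by ncu --query-metrics is the place to confirm that).

```python
from itertools import product

def expand(base):
    """Expand sm[sp]__<base>[_pred_on].{avg, sum} into concrete metric names."""
    units = ("sm", "smsp")       # sm[sp]: the "sp" substring is optional
    preds = ("", "_pred_on")     # [_pred_on]: optional suffix
    rollup_names = ("avg", "sum")  # {avg, sum}: pick one
    return [
        f"{unit}__{base}{pred}.{rollup}"
        for unit, pred, rollup in product(units, preds, rollup_names)
    ]

names = expand("thread_inst_executed") + expand("sass_thread_inst_executed")
for name in names:
    print(name)
```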
If you want to look at the average warp
smsp__inst_executed.sum / smsp__warps_launched.sum
If you want to look at the average thread
smsp__thread_inst_executed.sum / smsp__threads_launched.sum
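As a rough sketch of the two ratios above, the snippet below computes them from a dictionary of metric values. The numbers are invented for illustration and the helper per_launched is hypothetical; in practice the values would come from the collected report.

```python
# Hypothetical metric values; not taken from a real profile.
metrics = {
    "smsp__inst_executed.sum": 1_000_000,          # warp instructions
    "smsp__warps_launched.sum": 2_000,
    "smsp__thread_inst_executed.sum": 30_000_000,  # thread instructions
    "smsp__threads_launched.sum": 64_000,
}

def per_launched(executed_sum, launched_sum):
    """Average number of executed instructions per launched warp or thread."""
    if launched_sum == 0:
        raise ValueError("nothing was launched; the ratio is undefined")
    return executed_sum / launched_sum

inst_per_warp = per_launched(
    metrics["smsp__inst_executed.sum"],
    metrics["smsp__warps_launched.sum"],
)
inst_per_thread = per_launched(
    metrics["smsp__thread_inst_executed.sum"],
    metrics["smsp__threads_launched.sum"],
)

print(f"instructions per warp:   {inst_per_warp}")
print(f"instructions per thread: {inst_per_thread}")
```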