How to calculate the number of DADD instructions of a kernel on GPU?

I think this has by now been answered in your other post, "Difference sm__cycles_elapsed/smsp__cycles_elapsed and sm__inst_executed/smsp__inst_executed?" (#4 by felix_dt).

For completeness: you don't have to use the .per_cycle_elapsed sub-metric if you just want the plain total. In that case the .sum sub-metric, smsp__sass_thread_inst_executed_op_dadd_pred_on.sum, is sufficient, and no further calculation is necessary.
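If you collect that metric with the command line (for example `ncu --csv --metrics smsp__sass_thread_inst_executed_op_dadd_pred_on.sum ./your_app`, where `./your_app` is a placeholder), a small script can add up the values per kernel. The following is only a sketch: it assumes the CSV output has "Kernel Name", "Metric Name" and "Metric Value" columns, which you should check against your Nsight Compute version, and the sample rows and the `vec_add` kernel name are made up for illustration.

```python
import csv
import io
from collections import defaultdict

DADD_METRIC = "smsp__sass_thread_inst_executed_op_dadd_pred_on.sum"


def dadd_per_kernel(csv_text, metric=DADD_METRIC):
    """Sum one metric per kernel name from ncu-style CSV text.

    Assumes columns named "Kernel Name", "Metric Name" and "Metric Value";
    verify these against the output of your own ncu version.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Metric Name") != metric:
            continue
        # Values may be printed with thousands separators, e.g. "1,048,576".
        value = int(float(row["Metric Value"].replace(",", "")))
        totals[row["Kernel Name"]] += value
    return dict(totals)


# Hypothetical sample: two launches of a made-up kernel, not real profiler output.
SAMPLE = '''"ID","Kernel Name","Metric Name","Metric Unit","Metric Value"
"0","vec_add(double*, double*, double*, int)","smsp__sass_thread_inst_executed_op_dadd_pred_on.sum","inst","1,048,576"
"1","vec_add(double*, double*, double*, int)","smsp__sass_thread_inst_executed_op_dadd_pred_on.sum","inst","1,048,576"
'''

print(dadd_per_kernel(SAMPLE))
```

Note that this metric counts thread-level instructions with the predicate on, so the total is per thread, not per warp.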

Similarly, instead of computing smsp__cycles_elapsed.avg.per_second * kernel_duration, you can use smsp__cycles_elapsed.max (or, better, gpc__cycles_elapsed.max) directly. This gives the number of cycles of the longest-executing unit instance, which in turn determines the number of cycles for the entire kernel.
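To make the comparison concrete, here is a rough sketch of the manual calculation that the .max sub-metric replaces. The function name and the numbers are made up, and it assumes the rate is given in cycles per second and the kernel duration in nanoseconds; check the units your report actually shows.

```python
def cycles_from_rate(cycles_per_second, duration_ns):
    """Estimate elapsed cycles as rate * duration.

    Assumes the rate is in cycles/second and the duration in nanoseconds;
    adjust the conversion if your report uses other units.
    """
    return cycles_per_second * duration_ns * 1e-9


# Hypothetical values: a 1.5 GHz average rate over a 2 ms kernel.
estimated = cycles_from_rate(1.5e9, 2_000_000)
print(round(estimated))
```

Because this uses the .avg rate, the result is an average over unit instances and may differ slightly from the .max value, which reports the longest-running instance.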
