How to calculate the number of DADD instructions of a kernel on GPU?

I need to calculate the number of DADD instructions executed on an A100 device. My guess is to use the expression:

DADD = smsp__sass_thread_inst_executed_op_dadd_pred_on.sum.per_cycle_elapsed * smsp__cycles_elapsed.avg.per_second * kernel_duration * smsp_on_SM * SMs_count

where the runtime and hardware values are:

duration = 1.58 mseconds * 1000 # convert to nseconds
smsp_on_SM = 4

But I'm getting a value that is far too large.
Where am I going wrong?

I think this has by now been answered in your other post, "Difference sm__cycles_elapsed/smsp__cycles_elapsed and sm__inst_executed/smsp__inst_executed?" (reply #4 by felix_dt).

For completeness, you don’t have to use the .per_cycle_elapsed sub-metric if you really want the pure sum. In this case, using the .sum sub-metric smsp__sass_thread_inst_executed_op_dadd_pred_on.sum is sufficient, and no further calculations are necessary.
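To illustrate why the original expression overshoots, here is a minimal Python sketch with made-up numbers. Every value in it is hypothetical rather than measured, and it assumes that .sum already aggregates over all SMSP instances and that .sum.per_cycle_elapsed is simply that sum divided by the elapsed cycles.

```python
# Hypothetical values for illustration only; none of these were measured.
SMS_COUNT = 108        # assumed number of SMs on the device
SMSP_PER_SM = 4        # sub-partitions per SM, as stated in the question

dadd_sum = 1_000_000   # assumed smsp__sass_thread_inst_executed_op_dadd_pred_on.sum
cycles_elapsed = 2_000 # assumed elapsed cycles for the kernel

# Modeled here as: per_cycle_elapsed = sum / elapsed cycles (an assumption).
dadd_per_cycle = dadd_sum / cycles_elapsed

# Multiplying back by the cycle count already recovers the full sum.
recovered = dadd_per_cycle * cycles_elapsed

# Scaling again by the number of SMSPs counts every instance a second time.
overcounted = recovered * SMSP_PER_SM * SMS_COUNT

print("recovered sum:   ", recovered)
print("overcount factor:", overcounted / recovered)
```

Under these assumptions, the extra smsp_on_SM * SMs_count factor inflates the result by the total number of SMSP instances on the device.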

Similarly, instead of computing smsp__cycles_elapsed.avg.per_second * kernel_duration, you can use smsp__cycles_elapsed.max (or better gpc__cycles_elapsed.max) directly, to get the number of cycles of the longest-executing unit instance, which thereby determines the number of cycles for the entire kernel.
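If you do reconstruct the cycle count from the clock rate anyway, note that a .per_second rate needs the duration in seconds. Multiplying milliseconds by 1000 gives microseconds, not nanoseconds, and neither unit matches a per-second rate. A small sketch follows; the helper name and the clock rate are hypothetical, and only the 1.58 ms duration comes from the question.

```python
import math

def cycles_from_rate(cycles_per_second: float, duration_ms: float) -> float:
    """Hypothetical helper: elapsed cycles from a per-second rate and a duration in ms."""
    duration_s = duration_ms / 1000.0  # milliseconds -> seconds
    return cycles_per_second * duration_s

clock_hz = 1.0e9     # assumed value for smsp__cycles_elapsed.avg.per_second
duration_ms = 1.58   # kernel duration from the question, in milliseconds

correct_cycles = cycles_from_rate(clock_hz, duration_ms)

# The question's conversion: duration multiplied by 1000 instead of divided.
wrong_cycles = clock_hz * (duration_ms * 1000.0)

print("cycles with seconds:     ", correct_cycles)
print("inflation from bad units:", wrong_cycles / correct_cycles)
```

With the duration scaled the wrong way, the cycle count ends up about a million times too large, which compounds the overcounting from the SMSP factor.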

