As I understand the nvprof guide, instructions per warp (inst_per_warp) is the number of instructions per thread * 32, assuming there are no branches and every thread follows the same path. That would explain why I am getting results on the order of 10^4. Can anyone tell me whether my understanding is correct? I could not find any descriptive documentation for the various metrics the profiler reports.
What is the significance of inst_per_warp?
==32315== NVPROF is profiling process 32315, command: ./a.out
==32315== Profiling application: ./a.out
==32315== Profiling result:
==32315== Metric result:
Invocations    Metric Name      Metric Description       Min          Max          Avg
Device "Tesla K20m (0)"
    Kernel: intLatKernel(__int64*, __int64*)
          1    inst_per_warp    Instructions per warp    1.3020e+04   1.3020e+04   1.3020e+04
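To make the two possible readings of this number concrete, here is a small sanity check in Python. It is only a sketch: the warp size of 32 and the value 1.3020e+04 come from the output above, while the variable names and the "reading B" interpretation (that the metric already counts an instruction once per warp rather than once per thread) are my own assumptions, not something I have confirmed from the documentation.

```python
# Value reported by nvprof for inst_per_warp in the output above.
inst_per_warp = 1.3020e4
WARP_SIZE = 32  # threads per warp

# Reading A (the assumption in the question):
# metric = instructions per thread * 32, so divide to get the per-thread count.
per_thread_if_scaled = inst_per_warp / WARP_SIZE

# Reading B (assumption): the metric is already a warp-level count, i.e. an
# instruction issued for the whole warp is counted once, so with no divergence
# each thread also steps through this many instructions.
per_thread_if_warp_level = inst_per_warp

print("Reading A, instructions per thread:", per_thread_if_scaled)
print("Reading B, instructions per thread:", per_thread_if_warp_level)
print("Reading A gives a whole number:", per_thread_if_scaled.is_integer())
```

Under reading A the per-thread count comes out as 406.875, which is not a whole number. nvprof prints only five significant digits, so this is a hint rather than proof, but it makes me doubt the "* 32" interpretation.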
Thanks