As I understand the nvprof guide, instructions per warp (inst_per_warp) is the number of instructions per thread * 32, assuming there are no branches and every thread follows the same path. That would explain why I am getting results on the order of 10^4. Can anyone tell me whether my understanding is correct? I could not find any descriptive documentation for the various profiler metrics.
What is the significance of inst_per_warp?
==32315== NVPROF is profiling process 32315, command: ./a.out
==32315== Profiling application: ./a.out
==32315== Profiling result:
==32315== Metric result:
Invocations    Metric Name      Metric Description       Min           Max           Avg
Device "Tesla K20m (0)"
    Kernel: intLatKernel(__int64*, __int64*)
          1    inst_per_warp    Instructions per warp    1.3020e+04    1.3020e+04    1.3020e+04
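As a rough sanity check on the "* 32" reading, the reported value can be divided by the warp size to see whether it yields a whole per-thread instruction count. This is only a sketch of the arithmetic: the alternative reading (that one instruction issued for the whole warp is counted once) is an assumption on my part, not something taken from the nvprof guide, and the variable names are made up for illustration.

```python
# Value reported by nvprof for inst_per_warp (printed to 5 significant digits).
reported = 1.3020e4
WARP_SIZE = 32  # threads per warp on the Tesla K20m

def is_whole(x, tol=1e-9):
    """Return True if x is (numerically) an integer."""
    return abs(x - round(x)) < tol

# Reading A (the one in the question): inst_per_warp = per-thread instructions * 32.
per_thread_if_scaled = reported / WARP_SIZE

# Reading B (assumption): an instruction executed by the whole warp counts once,
# so with no divergence inst_per_warp equals the per-thread instruction count.
per_thread_if_unscaled = reported

print("Reading A:", per_thread_if_scaled, "whole number?", is_whole(per_thread_if_scaled))
print("Reading B:", per_thread_if_unscaled, "whole number?", is_whole(per_thread_if_unscaled))
```

If every thread really follows the same path, the per-thread count should be a whole number. Reading A gives 406.875, so either the "* 32" interpretation is off or the no-divergence assumption does not hold for this kernel.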