Need clarity on the definition of inst_per_warp

As I understand the nvprof guide, instructions per warp (inst_per_warp) is the number of instructions per thread * 32, assuming there are no branches and every thread follows the same path. That would explain why my results are on the order of 10^4. Can anyone tell me whether my understanding is correct? I could not find any descriptive documentation for the various profiler metrics.
What is the significance of inst_per_warp?

==32315== NVPROF is profiling process 32315, command: ./a.out
==32315== Profiling application: ./a.out
==32315== Profiling result:
==32315== Metric result:
Invocations                               Metric Name                        Metric Description         Min         Max         Avg
Device "Tesla K20m (0)"
	Kernel: intLatKernel(__int64*, __int64*)
          1                             inst_per_warp                     Instructions per warp  1.3020e+04  1.3020e+04  1.3020e+04

Thanks

According to the CUPTI documentation (CUPTI :: CUDA Toolkit Documentation), inst_per_warp is defined as the average number of instructions executed by each warp.
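A minimal sketch of that definition, assuming the metric is the warp-level instruction count (the inst_executed event) divided by the number of warps launched (warps_launched). The helper names and the launch configuration below are hypothetical, since the actual configuration of intLatKernel is not shown. The point it illustrates: when a warp does not diverge, it issues each instruction once for all 32 threads, so a warp contributes the per-thread instruction count to inst_executed, not 32 times that.

```python
# Hypothetical sketch: inst_per_warp as inst_executed / warps_launched.
WARP_SIZE = 32

def warps_launched(grid_blocks: int, threads_per_block: int) -> int:
    # Each block is split into ceil(threads_per_block / 32) warps;
    # a partially filled warp still counts as one warp.
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    return grid_blocks * warps_per_block

def inst_executed_uniform(per_thread_insts: int, warps: int) -> int:
    # Assumes no divergence: every warp issues each instruction once
    # on behalf of all its threads, so it adds per_thread_insts.
    return per_thread_insts * warps

def inst_per_warp(inst_executed: int, warps: int) -> float:
    # Average number of warp-level instructions executed per warp.
    return inst_executed / warps

# Hypothetical launch: 4 blocks of 256 threads -> 32 warps.
warps = warps_launched(grid_blocks=4, threads_per_block=256)
total = inst_executed_uniform(per_thread_insts=13020, warps=warps)
print(inst_per_warp(total, warps))  # 13020.0
```

Under these assumptions, a value of 1.3020e+04 would mean each warp (and so each thread, absent divergence) executed roughly 13020 instructions, with no factor of 32 involved.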