Doubt regarding definition of "inst_executed" metric - nvprof

Hello.

While running nvprof on Ubuntu 14.04 and recording the inst_executed and inst_fp_32 metrics, I noticed that inst_fp_32 returns a much larger value than inst_executed. Isn’t inst_executed the total number of instructions, including inst_fp_32 and all other instructions?

If not, how do I record the total number of instructions executed in a process? My aim is to get the percentage of FP32 instructions out of the total instructions executed in a process.

Data recorded:
inst_executed: Avg - 1014985216
inst_fp_32: Avg - 28200517632

Hi,

Thanks for your question about the metrics provided by nvprof.

Question: I noticed that inst_fp_32 returns a much larger value than inst_executed. Isn’t inst_executed the total number of instructions including inst_fp_32 and other instructions?
Answer: The two metrics count at different granularities. inst_fp_32 counts at the thread level; its description is “Number of single-precision floating-point instructions executed by non-predicated threads (arithmetic, compare, etc.)”. inst_executed counts at the warp level: an instruction executed by a warp is counted once, regardless of how many threads in the warp execute it. That is why inst_fp_32 is larger than inst_executed. inst_executed does include all instructions, but only as a warp-level count.
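As a rough sanity check on the warp-level versus thread-level explanation, the two recorded values can be compared directly. This is only a sketch: it assumes a warp size of 32 threads and treats the "Avg" values from the question as directly comparable counts. The variable names are illustrative and are not nvprof output.

```python
WARP_SIZE = 32  # threads per warp (assumed)

# Values reported in the question (nvprof "Avg" column).
inst_executed = 1014985216   # counted once per warp
inst_fp_32 = 28200517632     # counted once per active thread

# One warp-level instruction corresponds to at most WARP_SIZE
# thread-level instructions, so this bounds any thread-level count.
thread_level_upper_bound = inst_executed * WARP_SIZE

# Average number of FP32 thread-level instructions per warp-level instruction.
fp32_per_warp_inst = inst_fp_32 / inst_executed

print("thread-level upper bound:", thread_level_upper_bound)
print("FP32 thread instructions per warp instruction:", round(fp32_per_warp_inst, 2))
print("within bound:", inst_fp_32 <= thread_level_upper_bound)
```

If the ratio stays at or below the warp size, the numbers are consistent with one metric counting per thread and the other per warp.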

Question: My aim is to get the percentage of FP32 instructions executed out of the total instructions executed in a process.
Answer: The “not_predicated_off_thread_inst_executed” counter gives the instruction count at the thread level. Use it as the total when computing the percentage of FP32 instructions, so that both values are thread-level counts.
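The percentage calculation itself might look like the sketch below. The function name fp32_percentage is hypothetical, and the total used in the example is a made-up placeholder, since the thread did not report a value for not_predicated_off_thread_inst_executed. It is assumed that this counter is collected with nvprof's --events option and that it is available on the GPU in use.

```python
def fp32_percentage(inst_fp_32, thread_inst_total):
    """Return FP32 instructions as a percentage of all thread-level instructions.

    Both arguments must be thread-level counts, e.g. the inst_fp_32 metric and
    the not_predicated_off_thread_inst_executed counter.
    """
    if thread_inst_total <= 0:
        raise ValueError("total thread-level instruction count must be positive")
    return 100.0 * inst_fp_32 / thread_inst_total


# inst_fp_32 is the value from the question; the total is a placeholder only.
inst_fp_32 = 28200517632
placeholder_total = 30000000000  # NOT a measured value

print(f"FP32 share: {fp32_percentage(inst_fp_32, placeholder_total):.2f}%")
```

Dividing inst_fp_32 by the warp-level inst_executed instead would mix granularities and can give a result above 100%.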