For example, the max inst_integer is 697,444 for 1459 invocations, so for one invocation it would be about 478.
Looking at nvbit, the int class is 1211. Multiplying that by 32 gives 38,752, which is far beyond 478.
I think these two tools assume different things, but it is not clear what.
Maybe IMAD.IADD is counted as int but IMAD.MOV is not. I don’t know, though…
I couldn’t find any of this in the documentation.
I’m not sure you understand how to use nvprof. That’s not how I interpret this output from nvprof:
==11430== Metric result:
Invocations  Metric Name   Metric Description       Min     Max     Avg
Device "TITAN V (0)"
       1459  inst_integer  Integer Instructions   32372  697444  261807
That says: “across all invocations of this kernel, the maximum count for that metric (for a particular invocation) was 697444, the minimum count for that metric (for a particular invocation) was 32372, and the average (which may not have been achieved by any particular invocation) was 261807.”
If the nvbit measurement was for a single invocation, and the measurement number multiplied by 32 was 38,752, that is entirely plausible and within the range for that metric reported by nvprof.
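As a quick sanity check, here is a minimal Python sketch of that arithmetic. It uses only the numbers quoted in this thread; the variable names are mine, and it assumes (as the multiplication by 32 implies) that the nvbit figure is a warp-level count for a single invocation.

```python
# Numbers quoted in this thread (nvprof Min/Max/Avg are per-invocation values).
nvprof_min, nvprof_max, nvprof_avg = 32372, 697444, 261807
invocations = 1459

# Assumption: the nvbit "int" class count is warp-level, for one invocation.
nvbit_int_class = 1211
warp_size = 32

# The misreading: dividing Max by the number of invocations.
# Max is already the count for a single invocation, so this value means nothing.
misread = nvprof_max // invocations

# Scale the warp-level nvbit count to a thread-level count.
nvbit_thread_level = nvbit_int_class * warp_size

# Compare against the per-invocation range reported by nvprof.
in_range = nvprof_min <= nvbit_thread_level <= nvprof_max

print(misread, nvbit_thread_level, in_range)
```

The scaled nvbit figure should be compared with the Min..Max range directly, not with Max divided by the invocation count.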
The nvprof help describes the --print-gpu-trace option like this:
Print individual kernel invocations (including CUDA memcpy's/memset's)
and sort them in chronological order. In event/metric profiling
mode, show events/metrics for each kernel invocation.
Since I didn’t use that option, I guessed that I was seeing an average over all invocations.
Thanks for clarifying that.