With NVBit, the total number of miscellaneous instructions, classified as described in reference [1], comes out much lower than the value nvprof reports.
Without COUNT_WARP_LEVEL=0 (warp-level counts):
BAR.SYNC = 123816
NOP = 123816
S2R = 3216
With COUNT_WARP_LEVEL=0 (thread-level counts):
BAR.SYNC = 3962112
NOP = 3962112
S2R = 102912
I picked these three opcodes because reference [1] classifies them as miscellaneous instructions. However, the number nvprof reports is much larger:
==40910== Metric result:
Invocations    Metric Name    Metric Description    Min           Max           Avg
Device "TITAN V (0)"
    Kernel: gen_hists(unsigned long*, float*, float*, float*, int, int)
          1    inst_misc      Misc Instructions     1.5831e+11    1.5831e+11    1.5831e+11
I know that nvprof counts at the thread level, but as you can see, both the warp-level and the thread-level NVBit counts are far lower than the nvprof value.
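To put a number on the gap, here is a quick Python sketch that sums the three opcode counts from the NVBit runs above and compares the total with the nvprof inst_misc value. The variable names are mine, and it assumes that BAR.SYNC, NOP and S2R are the only opcodes I should be counting as miscellaneous, which is exactly the part I am unsure about.

```python
# Counts copied from the NVBit runs above.
nvbit_warp_level = {"BAR.SYNC": 123816, "NOP": 123816, "S2R": 3216}
nvbit_thread_level = {"BAR.SYNC": 3962112, "NOP": 3962112, "S2R": 102912}

# inst_misc as printed by nvprof (rounded to 5 significant digits).
nvprof_inst_misc = 1.5831e11

warp_total = sum(nvbit_warp_level.values())
thread_total = sum(nvbit_thread_level.values())

# Sanity check: every thread-level count is 32x the warp-level count,
# i.e. one count per thread in a 32-thread warp.
scales_by_warp_size = all(
    nvbit_thread_level[op] == 32 * nvbit_warp_level[op]
    for op in nvbit_warp_level
)

# How many times larger the nvprof value is than the NVBit thread-level sum.
gap = nvprof_inst_misc / thread_total

print("NVBit warp-level total:  ", warp_total)
print("NVBit thread-level total:", thread_total)
print("thread == 32 * warp:     ", scales_by_warp_size)
print("nvprof / NVBit ratio:     %.0f" % gap)
```

The thread-level total is 8,027,136, so the nvprof value is roughly 20,000 times larger. The warp-to-thread factor of 32 alone does not explain the difference.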
Any idea which instruction types nvprof counts as MISC?