How are miscellaneous instructions counted?

With NVBit, the total number of miscellaneous instructions, classified as described in reference [1], is much lower than what nvprof reports.

Without COUNT_WARP_LEVEL=0 (warp-level counts)

BAR.SYNC = 123816
NOP = 123816
S2R = 3216

With COUNT_WARP_LEVEL=0 (thread-level counts)

BAR.SYNC = 3962112
NOP = 3962112
S2R = 102912

I chose these instructions based on the classification in reference [1].

However, the inst_misc value reported by nvprof is much larger:

==40910== Metric result:
Invocations                               Metric Name                        Metric Description         Min         Max         Avg
Device "TITAN V (0)"
    Kernel: gen_hists(unsigned long*, float*, float*, float*, int, int)
          1                                 inst_misc                         Misc Instructions  1.5831e+11  1.5831e+11  1.5831e+11

I know that nvprof counts at thread level, but as you can see, both the warp-level and the thread-level values from NVBit are far lower than the nvprof value.
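To put a number on the gap, here is a small Python sketch that only does arithmetic on the figures quoted above. It checks that the thread-level counts are exactly 32 times the warp-level counts, sums the three opcodes, and divides the nvprof inst_misc value by that sum. The variable names are mine, and it assumes the counts and the nvprof metric cover the same kernel launch, which I have not verified.

```python
# Sanity check on the counts quoted in this post (pure arithmetic, no GPU needed).
WARP_SIZE = 32

# Counts without COUNT_WARP_LEVEL=0 (warp level)
warp_level = {"BAR.SYNC": 123816, "NOP": 123816, "S2R": 3216}
# Counts with COUNT_WARP_LEVEL=0 (thread level)
thread_level = {"BAR.SYNC": 3962112, "NOP": 3962112, "S2R": 102912}
# inst_misc as printed by nvprof (rounded to 5 significant digits in the output)
nvprof_inst_misc = 1.5831e11

# Each thread-level count should be warp-level * 32 if all lanes were active.
scaling_ok = all(
    warp_level[op] * WARP_SIZE == thread_level[op] for op in warp_level
)

nvbit_warp_total = sum(warp_level.values())
nvbit_thread_total = sum(thread_level.values())

# How many times larger the nvprof metric is than the thread-level total.
ratio = nvprof_inst_misc / nvbit_thread_total

print("thread-level == 32 x warp-level:", scaling_ok)
print("warp-level total:  ", nvbit_warp_total)
print("thread-level total:", nvbit_thread_total)
print("nvprof / thread-level total: %.0f" % ratio)
```

So the warp/thread scaling is consistent, and the remaining gap to nvprof is about four orders of magnitude, which is too large to be explained by the warp-versus-thread difference alone.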

Any idea which instruction types nvprof counts as MISC?

[1] CUDA Binary Utilities :: CUDA Toolkit Documentation