CUDA Profiler: Question about CUDA profiler

Can someone tell me whether the gld_coherent statistic reported by the profiler is for a single multiprocessor or for all multiprocessors combined?

CUDA_Profiler_2.1.txt says this:

“In addition, the profiler can only target one of the multiprocessors in the
GPU, so the counter values will not correspond to the total number of warps
launched for a particular kernel. For this reason, when using the performance
counter options in the profiler the user should always launch enough thread
blocks to ensure that the target multiprocessor is given a consistent
percentage of the total work. In practice, it is best to launch at least around
100 blocks for consistent results.”
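
To make the arithmetic behind that advice concrete, here is a rough sketch in Python (it only models the counting, it is not CUDA code). It assumes thread blocks are dealt to multiprocessors round-robin, which is my simplification and not something the profiler notes state. The function names and the 16-multiprocessor figure are made up for illustration.

```python
def blocks_on_target_sm(num_blocks, num_sms, target_sm=0):
    """Count the blocks that land on one multiprocessor.

    ASSUMPTION: block b is assigned to multiprocessor (b % num_sms).
    The real hardware scheduler may distribute blocks differently.
    """
    return sum(1 for b in range(num_blocks) if b % num_sms == target_sm)


def scaled_estimate(num_blocks, num_sms, target_sm=0):
    """Guess the whole-GPU block count by scaling the one-multiprocessor count."""
    return blocks_on_target_sm(num_blocks, num_sms, target_sm) * num_sms


def relative_error(num_blocks, num_sms, target_sm=0):
    """Relative gap between the scaled guess and the true block count."""
    estimate = scaled_estimate(num_blocks, num_sms, target_sm)
    return abs(estimate - num_blocks) / num_blocks


if __name__ == "__main__":
    num_sms = 16  # hypothetical multiprocessor count, chosen only for the example
    for num_blocks in (10, 100, 1600):
        print(num_blocks,
              blocks_on_target_sm(num_blocks, num_sms),
              relative_error(num_blocks, num_sms))
```

Under this model, a launch of 10 blocks puts either one block or none on the profiled multiprocessor depending on which one is targeted, so the counters swing wildly. With many more blocks than multiprocessors, the targeted one sees a share close to 1/num_sms of the work, and multiplying its counter by the multiprocessor count gives a rough whole-GPU figure.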