“Global Load Efficiency” over 100% in Visual Profiler

I’ve asked this question on stackoverflow.com here:
http://stackoverflow.com/q/19650777/2386951
but have had no answer so far, so I’m copying it here:

I have a CUDA program in which the threads of a block read elements of a long array over several iterations, and the memory accesses are almost fully coalesced. When I profile it, Global Load Efficiency is over 100% (between 119% and 187%, depending on the input). The description of Global Load Efficiency is “Ratio of global memory load throughput to required global memory load throughput.” Does this mean that I’m hitting the L2 cache a lot and my memory accesses are benefiting from it?

My GPU is GeForce GTX 780 (Kepler architecture).

Global Load Efficiency and Global Store Efficiency describe how well the coalescing of global memory accesses works, both for DRAM accesses and for (L2?) cache accesses. If they are at 100 percent, you’ve got perfect coalescing. Since efficiencies above 100 percent don’t make any sense (you cannot be better than optimal), this has to be an error.
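To make “how well the coalescing works” a bit more concrete, here is a small host-side C++ model. It is only a sketch under stated assumptions, not the Visual Profiler’s actual formula: it assumes loads are served in 32-byte segments and that every thread of a 32-thread warp loads one 4-byte float. The helper names warpLoadEfficiency and warpAddresses are hypothetical.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Assumption: global loads are served in whole 32-byte segments.
// This is an illustrative model, not the Visual Profiler's formula.
constexpr std::uint64_t kSegmentBytes = 32;

// Hypothetical helper: bytes requested by one warp-wide load divided by the
// bytes actually transferred in whole 32-byte segments.
double warpLoadEfficiency(const std::vector<std::uint64_t>& addresses,
                          std::uint64_t bytesPerThread) {
    std::set<std::uint64_t> segments;  // distinct segment indices touched by the warp
    for (std::uint64_t addr : addresses) {
        // An access may straddle a segment boundary, so record every segment it covers.
        const std::uint64_t first = addr / kSegmentBytes;
        const std::uint64_t last = (addr + bytesPerThread - 1) / kSegmentBytes;
        for (std::uint64_t s = first; s <= last; ++s) {
            segments.insert(s);
        }
    }
    const double requested = static_cast<double>(addresses.size() * bytesPerThread);
    const double transferred = static_cast<double>(segments.size() * kSegmentBytes);
    return requested / transferred;
}

// Hypothetical helper: byte addresses of 32 lanes, each loading one 4-byte float,
// starting at byte offset `base`, with lane i reading element i * strideElems.
std::vector<std::uint64_t> warpAddresses(std::uint64_t base, std::uint64_t strideElems) {
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t lane = 0; lane < 32; ++lane) {
        addrs.push_back(base + lane * strideElems * sizeof(float));
    }
    return addrs;
}
```

Under these assumptions, a fully coalesced warp (warpAddresses(0, 1)) requests 128 bytes and touches four segments, giving 1.0 (100%). A stride of two elements touches eight segments, giving 0.5. A warp whose first address is offset by 4 bytes touches five segments instead of four, giving 0.8.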

This error is caused by the Visual Profiler, which counts hardware events to calculate some abstract metrics. However, the GPU doesn’t have the “correct” events to calculate all of those metrics exactly, so the Visual Profiler has to estimate them using complex formulas and “wrong” events. Some metrics are just rough estimations, and Global Load Efficiency and Global Store Efficiency are two of them. So if such an efficiency is bigger than 100 percent, it is an estimation error. As far as I have observed, Global Load Efficiency and Global Store Efficiency both rose above 100 percent in some of my kernels with register spilling. That’s why I assume that the Visual Profiler calculates those two efficiencies from some events that can also be triggered by local memory accesses.

Furthermore, the GPU’s hardware event counters are only 32 bits wide. Long-running kernels therefore tend to overflow those counters, which also causes the Visual Profiler to display wrong metrics.
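The effect of a 32-bit counter overflow on a metric that is a ratio of two counts can be sketched in a few lines of C++. This is purely illustrative: the helpers readCounter32 and ratioFromCounters are hypothetical, and whether any particular Visual Profiler metric is formed exactly this way is an assumption.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit hardware event counter: the stored value is the
// true event count modulo 2^32, because unsigned 32-bit values wrap around.
std::uint32_t readCounter32(std::uint64_t trueCount) {
    return static_cast<std::uint32_t>(trueCount);  // keeps only the low 32 bits
}

// Hypothetical metric computed as the ratio of two counters.
// If only the denominator's counter has wrapped, the ratio comes out inflated.
double ratioFromCounters(std::uint64_t trueNumerator, std::uint64_t trueDenominator) {
    return static_cast<double>(readCounter32(trueNumerator)) /
           static_cast<double>(readCounter32(trueDenominator));
}
```

For example, a true numerator of 3,000,000,000 events over a true denominator of 5,000,000,000 events should give 0.6. The denominator exceeds 2^32 (4,294,967,296), though, so the counter reads only 705,032,704 and the computed ratio is roughly 4.26.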

P.S. Older versions of the Visual Profiler documentation listed the formulas for those metrics; in newer versions they have been removed. If you look up these formulas, you will probably understand all of those metrics better.