gld efficiency and throughput

I cannot find the relationship between global load throughput and efficiency. According to the documentation:

gld_throughput Global memory load throughput
gld_efficiency Ratio of requested global memory load throughput to required global memory load throughput expressed as percentage.

As you can see in the picture, a kernel with 31.3 GB/s throughput has 100% efficiency, while another kernel with 34.7 GB/s has 88.9% efficiency. Another one with 51 GB/s has 57% efficiency.

How can that be explained?

100% efficiency generally means the global loads (or global stores in the case of gst_efficiency) are fully coalesced.

Less than 100% efficiency means that some loads (or stores) are not fully coalesced.
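
As a rough illustration of what "fully coalesced" means for this metric, the Python sketch below models one warp of 32 threads, each loading one 4-byte element, and counts how many 32-byte memory segments the warp touches. The segment size, the element size, and the function name `load_efficiency` are assumptions chosen for illustration. Real transaction granularity depends on the GPU architecture and cache configuration, and this is not how the profiler computes the metric internally.

```python
WARP_SIZE = 32        # threads per warp
ELEM_BYTES = 4        # assumed element size (e.g. a float)
SEGMENT_BYTES = 32    # assumed memory transaction granularity


def load_efficiency(stride, base=0):
    """Percent of transferred bytes that the warp actually requested.

    Thread i loads ELEM_BYTES bytes starting at byte address
    base + i * stride * ELEM_BYTES.
    """
    segments = set()
    for lane in range(WARP_SIZE):
        first = base + lane * stride * ELEM_BYTES
        last = first + ELEM_BYTES - 1
        # A load can straddle a segment boundary when base is unaligned.
        for seg in range(first // SEGMENT_BYTES, last // SEGMENT_BYTES + 1):
            segments.add(seg)
    requested = WARP_SIZE * ELEM_BYTES
    transferred = len(segments) * SEGMENT_BYTES
    return 100.0 * requested / transferred


for stride in (1, 2, 8):
    print(f"stride {stride}: {load_efficiency(stride):.1f}%")
```

Under these assumptions, a stride of 1 (adjacent threads read adjacent elements) gives 100%, a stride of 2 gives 50%, and a stride of 8 gives 12.5%, because each thread then pulls in a whole segment to use only 4 bytes of it.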

This can be considered and understood independently of any notion of throughput or of how the global loads or stores are actually occurring. However, to arrive at a mathematical or measurable definition, you have to start somewhere, and the profiler expresses it as a ratio of requested to required throughput.

Throughput will be driven by the actual needs of your program. If your program or kernel needs less data, or needs data less often, its throughput will be lower than that of a kernel that needs more data or needs it more often.
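
To make the independence of the two metrics concrete, here is a small Python sketch with made-up numbers. The kernel descriptions, byte counts, and run times are hypothetical and are not taken from the profiler output in the question. The helper names `throughput_gb_per_s` and `efficiency_percent` are also just illustrative.

```python
def throughput_gb_per_s(bytes_moved, seconds):
    """Bytes moved per unit time, in GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / seconds / 1e9


def efficiency_percent(bytes_requested, bytes_moved):
    """Share of the moved bytes that the kernel actually asked for."""
    return 100.0 * bytes_requested / bytes_moved


# Hypothetical kernels: (name, bytes requested, bytes moved, run time in s)
kernels = [
    ("coalesced kernel", 1.0e9, 1.0e9, 0.040),
    ("strided kernel",   1.0e9, 2.0e9, 0.040),
]

for name, requested, moved, seconds in kernels:
    print(f"{name}: {throughput_gb_per_s(moved, seconds):.1f} GB/s, "
          f"{efficiency_percent(requested, moved):.1f}% efficient")
```

In this sketch the strided kernel reports twice the throughput of the coalesced one while being only half as efficient, since half of the bytes it moves are never used. This is only one way such a pattern can arise. A kernel can also show low throughput simply because it spends most of its time on something other than loading data.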