shared_efficiency = Ratio of requested shared memory throughput to required shared memory throughput expressed as percentage
The numerator is collected using a shader patch to determine number of bytes requested. This takes into account if threads are active (and I believe predicated true).
The denominator is the total number of cycles data is returned from shared memory x width of the interface.
On Kepler architecture the shared memory return path width is 256B which can only be achieved if the kernel is run in 8B bank mode and 64-bit or greater accesses are executed. If the kernel is executed in 4B bank mode the maximum efficiency may be limited to 50%. On Maxwell - Turing architectures the bank mode is fixed and all instruction widths should be able to achieve full efficiency.