shared_efficiency = ratio of requested shared memory throughput to required shared memory throughput, expressed as a percentage.
The numerator is collected using a shader patch that counts the number of bytes requested. This takes into account whether threads are active (and, I believe, predicated true).
The denominator is the total number of cycles on which data is returned from shared memory × the width of the shared memory return interface.
On the Kepler architecture the shared memory return path is 256B wide, which can only be fully utilized if the kernel runs in 8B bank mode and executes 64-bit or wider accesses. If the kernel runs in 4B bank mode, the maximum efficiency may be limited to 50%. On Maxwell through Turing architectures the bank mode is fixed, and all instruction widths should be able to achieve full efficiency.
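As a rough sketch of the formula above (this is not nvprof's actual implementation, just an illustration of the arithmetic), the metric can be modeled like this, assuming a 32-thread warp and the 256B Kepler return-path width mentioned above:

```python
# Hypothetical model of the shared_efficiency metric: requested bytes
# divided by the bytes the return interface could have delivered.
def shared_efficiency(bytes_requested, return_cycles, interface_width_bytes):
    """Requested throughput / required throughput, as a percentage."""
    bytes_deliverable = return_cycles * interface_width_bytes
    return 100.0 * bytes_requested / bytes_deliverable

# Kepler-style example: 32 threads each load 4 bytes (128B total) in
# 4B bank mode. The return still takes one cycle of the 256B-wide
# interface, so at most 50% efficiency is achievable.
print(shared_efficiency(32 * 4, 1, 256))  # -> 50.0

# In 8B bank mode with 64-bit loads, 32 threads x 8B = 256B fills the
# full 256B interface in one cycle: 100% efficiency.
print(shared_efficiency(32 * 8, 1, 256))  # -> 100.0
```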
I wonder whether the topic starter came to a conclusion 5 years ago. I am profiling kernels on an old GP104 with nvprof and see exactly these symptoms: shared memory utilization is high (9-10 points), shared memory efficiency is low (30%), and there are no bank conflicts (shared_st_bank_conflicts = shared_ld_bank_conflicts = 0). I do not understand why shared memory efficiency is not 100%, as @Greg said it should be. Is it because I have many broadcast/multicast accesses? Thank you.
Efficiency probably means something like bytes used divided by bytes requested.
One can conjecture access patterns that have relatively low efficiency with no bank conflicts. For example, only one thread in a warp requests a value, or each thread in a warp requests only a byte.
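To make those two conjectured patterns concrete, here is an illustrative calculation (an assumption-laden sketch, not profiler output), assuming a Maxwell-style return path of 128 bytes per cycle (32 banks × 4B):

```python
# Illustrative only: efficiency of some conflict-free access patterns,
# assuming a hypothetical 128B-per-cycle return path (32 banks x 4B).
INTERFACE_BYTES = 128  # assumed return-path width per cycle

def pattern_efficiency(active_threads, bytes_per_thread, cycles=1):
    """Bytes requested vs. bytes the interface could deliver, in %."""
    requested = active_threads * bytes_per_thread
    return 100.0 * requested / (cycles * INTERFACE_BYTES)

# Only 1 thread in the warp loads a 4-byte value: no bank conflicts,
# but the return still consumes a full cycle of the interface.
print(pattern_efficiency(1, 4))   # -> 3.125

# All 32 threads each load a single byte: still conflict-free,
# yet only a quarter of the interface width is used.
print(pattern_efficiency(32, 1))  # -> 25.0

# All 32 threads each load a 4-byte value: full efficiency.
print(pattern_efficiency(32, 4))  # -> 100.0
```

Both patterns are conflict-free, which is why shared_ld_bank_conflicts can read 0 while shared_efficiency stays well below 100%.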
My suggestion would be to investigate the actual access pattern, and then the reason for low efficiency is likely to become clear. Also, for profiler-specific questions, we have dedicated forums for those.