Question on shared memory

I see a discussion about shared memory efficiency in [1].
My question is about the definition: the ratio of REQUESTED to REQUIRED shared memory throughput. How is that guaranteed to be at most 1?
I would expect that something requested is not always required. For example, you may execute an instruction that is never retired due to a misprediction. Therefore, REQUESTED could be larger than REQUIRED.

For memory throughput, I guess it would be better to define it as the ratio of achieved throughput to peak throughput.
Can someone clarify that?

[1] https://devtalk.nvidia.com/default/topic/1039557/cuda-programming-and-performance/what-does-the-quot-shared_efficiency-quot-really-mean-/

Whatever is requested is always required.

The requested shared memory throughput (the numerator) is basically the number of bytes requested. The only thing that counts for this definition is what your code needs. That’s why a special shader patch is written to produce this number.

The required shared memory throughput is the amount of shared memory traffic that actually occurred to service the request.

According to this definition, the required shared throughput cannot be lower than what was requested, so the ratio cannot exceed 1 (100%).
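
As a minimal sketch of that definition: the function name `shared_efficiency` and the byte counts below are hypothetical, chosen only to illustrate the ratio, and are not taken from the profiler.

```python
def shared_efficiency(requested_bytes: int, required_bytes: int) -> float:
    """Ratio of bytes the code asked for to bytes of traffic needed to serve them."""
    if requested_bytes < 0 or required_bytes <= 0:
        raise ValueError("byte counts must be positive")
    if required_bytes < requested_bytes:
        # By the definition above, the traffic that services a request
        # can never be smaller than the request itself.
        raise ValueError("required traffic cannot be below requested bytes")
    return requested_bytes / required_bytes


# Hypothetical numbers: every byte moved was asked for.
print(shared_efficiency(128, 128))   # 1.0
# Hypothetical numbers: 32x more traffic than was asked for.
print(shared_efficiency(128, 4096))  # 0.03125
```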

So, do you mean that requests+transactions_for_replays=required?
Otherwise, when you say that “requested is always required”, under what circumstances is something required but not requested? That is what would make requested < required.

Suppose I request 1 byte, in one thread. What is requested is 1 byte. What is required is a warp-wide transaction.

Suppose I request a float quantity per thread, warp-wide. What is requested is 4x32 = 128 bytes. Let’s further suppose these float quantities are all located in the same shared memory bank. What is required (due to bank conflicts) is 32 transactions, warp-wide. That was the actual shared memory traffic required to service the request.
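
The transaction count in the second example can be reproduced with a small model. This is only a sketch under stated assumptions: 32 banks, each 4 bytes wide, with threads that read the same 32-bit word served together. The names `NUM_BANKS`, `BANK_WIDTH` and `transactions_for_warp` are made up for illustration, and the model says nothing about how many bytes each transaction moves.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 shared memory banks
BANK_WIDTH = 4   # assumption: each bank is 4 bytes wide


def transactions_for_warp(byte_addresses):
    """Count the serialized passes needed for one warp-wide shared memory access."""
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // BANK_WIDTH
        # Distinct words that map to the same bank must be served one after another.
        words_per_bank[word % NUM_BANKS].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)


# One float per thread, consecutive addresses: each thread hits its own bank.
conflict_free = [4 * tid for tid in range(32)]
# One float per thread, all in bank 0 (stride of 32 floats).
same_bank = [4 * NUM_BANKS * tid for tid in range(32)]

print(transactions_for_warp(conflict_free))  # 1
print(transactions_for_warp(same_bank))      # 32
```

In both cases the requested amount is the same 128 bytes; only the traffic required to service it differs.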

OK I understand that. However, that raises a question about transaction size.
If 32 threads want to read one byte each and they are in the same bank, then the transaction size is 32 bytes as far as I know.
If on the other hand each thread requests 2 bytes and they don’t conflict, then the transaction size is 64 bytes.

However, what happens when requests are unbalanced or conflicting as you said?
In the first case, only one thread requests one byte. So how is the transaction size set? Is there a minimum of 32 bytes?

In the second case, 32 transactions are needed, and in each transaction only one thread is serviced, and that thread needs 4 bytes. Are these again 32-byte transactions?

Thanks for the explanation.