Confirm that coalescence does not matter for __shared__ access?

correct. For maximum throughput to shared memory, the rule is, considering a warp-wide access, we want no more than one item per bank requested. It is not necessary that all addresses be contiguous. Shared memory generally also has the broadcast rule. That means that if there are multiple requests to the same bank, but they are also to the same location, this does not reduce efficiency. A particular location can be broadcast to multiple threads in a warp, per transaction, at no additional cost.

1 Like