Shared Memory and Read After Write

Suppose thread A writes something to shared memory and thread B wants to read it. If A and B are in different warps, a __syncthreads() is obviously needed, since there is no telling where in the code each warp will be relative to the other. The question is what happens when A and B are in the same warp: can threads within a single warp communicate through shared memory without needing __syncthreads()?

Let’s make it even simpler and assume there is no possibility of divergence between the write and the read, which rules out any strange reordering of if/else blocks in which both sides get executed. What about in this case?

Basically, I want each warp to share data internally without dragging the entire block through a sync.

What you are describing is something we internally call warp-synchronous programming. Yes, it’s possible. It can also be quite a bit faster in some cases (I’ve heard 20% tossed around).

See section 5.4 of the programming guide.
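
As a rough illustration of the pattern being asked about, here is a minimal sketch in which each thread writes to shared memory and then reads the value written by its neighbour in the same warp, with no __syncthreads() anywhere. The kernel name rotateWithinWarp, the host helper runRotateWithinWarp, and the constants kWarpSize and kBlockSize are all made up for this example. One assumption goes beyond the post: the classic form of this trick relied purely on the warp executing in lockstep (usually with volatile shared memory), whereas this sketch uses __syncwarp() (available from CUDA 9), because GPUs with independent thread scheduling do not guarantee lockstep execution. It still synchronizes only the warp, not the whole block.

```cuda
#include <cuda_runtime.h>

constexpr int kWarpSize  = 32;   // assumed warp size (true of current NVIDIA GPUs)
constexpr int kBlockSize = 128;  // hypothetical block size: 4 warps per block

// Each thread writes one value to shared memory, then reads the value
// written by the next lane of the SAME warp (wrapping around at lane 31).
// No data crosses a warp boundary, so no block-wide barrier is used.
__global__ void rotateWithinWarp(const int* in, int* out)
{
    __shared__ int buf[kBlockSize];

    const int tid      = threadIdx.x;
    const int lane     = tid % kWarpSize;   // position inside the warp
    const int warpBase = tid - lane;        // first thread index of this warp
    const int gid      = blockIdx.x * blockDim.x + tid;

    buf[tid] = in[gid];                     // write (thread A)

    // Warp-level barrier: orders the shared-memory write above before the
    // read below for all threads of this warp only.
    __syncwarp();

    out[gid] = buf[warpBase + (lane + 1) % kWarpSize];  // read (thread B)
}

// Host helper: runs the kernel on two blocks and checks every element.
// Returns true only if all results match the expected neighbour values.
bool runRotateWithinWarp()
{
    constexpr int n = 2 * kBlockSize;
    int hIn[n], hOut[n];
    for (int i = 0; i < n; ++i) { hIn[i] = i; hOut[i] = -1; }

    int* dIn  = nullptr;
    int* dOut = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }

    cudaMemcpy(dIn, hIn, n * sizeof(int), cudaMemcpyHostToDevice);
    rotateWithinWarp<<<n / kBlockSize, kBlockSize>>>(dIn, dOut);
    // A blocking copy on the default stream also waits for the kernel.
    const cudaError_t err =
        cudaMemcpy(hOut, dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        const int lane     = i % kWarpSize;
        const int expected = (i - lane) + (lane + 1) % kWarpSize;
        if (hOut[i] != expected) return false;
    }
    return true;
}
```

Calling runRotateWithinWarp() from main and checking that it returns true is enough to try this out. The sketch assumes the block size is a multiple of the warp size and that the grid covers the input exactly, so there are no bounds checks in the kernel.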

N.