Warp asynchronicity and coalesced r/w confusion

Hi everybody,

I’m confused about how exactly warp asynchronism works on CC 7.0 and later. I have always assumed warp-synchronous code and behaviour and never had a problem with it. How can coalesced reads/writes to global memory happen if the threads within a warp are not executed in lockstep? I don’t understand.

If you have this code for example:

global[threadId] = shared[threadId];

how does it work if threads are not synchronised?
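
To make the question concrete, here is a minimal sketch of the kind of kernel I have in mind. The kernel name, the array size and the single-warp launch are all made up for illustration; only the copy line itself is what I am actually asking about.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: one warp of 32 threads.
__global__ void copySharedToGlobal(float *global)
{
    __shared__ float shared[32];
    unsigned int threadId = threadIdx.x;

    // Each thread fills its own slot, so no thread reads another thread's data.
    shared[threadId] = static_cast<float>(threadId);

    // The line in question: consecutive threads write consecutive addresses.
    global[threadId] = shared[threadId];
}

int main()
{
    float *d_out = nullptr;
    float h_out[32];

    cudaMalloc(&d_out, sizeof(h_out));
    copySharedToGlobal<<<1, 32>>>(d_out);   // a single block of one warp
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("last element: %f\n", h_out[31]);
    return 0;
}
```

Every thread here only touches its own index, so I would not expect the result to depend on lockstep execution; what I am unsure about is whether the 32 stores are still combined into one coalesced transaction.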