I’m confused about how exactly warp asynchronism works from compute capability (CC) 7.0 onwards. I have always assumed warp-synchronous code and behaviour, and never had a problem with it. How can coalesced reads/writes to global memory happen if the threads within a warp are not executed in lockstep? I don’t understand.
Take this code, for example:
global[threadId] = shared[threadId];
How does the write get coalesced if the threads of the warp are not synchronised?
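To make the question concrete, here is a minimal sketch of the kind of code I mean. The names copyKernel, copyThroughShared and kWarpSize are made up for illustration, and I am assuming a single block of 32 threads (one warp). Each thread only reads back the shared slot it wrote itself, so as far as I can tell the result does not depend on the threads staying in lockstep; what I can't see is how the final store still ends up coalesced.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kWarpSize = 32;  // assumption: one warp per block

// Hypothetical kernel: each thread stages one value in shared memory,
// then writes its own slot out to global memory.
__global__ void copyKernel(int *global)
{
    __shared__ int shared[kWarpSize];
    const unsigned threadId = threadIdx.x;

    shared[threadId] = 2 * static_cast<int>(threadId);

    // Each thread reads only the slot it wrote itself, so there is no
    // cross-thread dependency on shared memory at this point.
    global[threadId] = shared[threadId];
}

// Hypothetical host helper: launches a single warp and copies the result back.
// On allocation failure it returns the buffer still filled with -1.
std::vector<int> copyThroughShared()
{
    std::vector<int> host(kWarpSize, -1);
    int *device = nullptr;

    if (cudaMalloc(&device, kWarpSize * sizeof(int)) != cudaSuccess) {
        return host;
    }

    copyKernel<<<1, kWarpSize>>>(device);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host.data(), device, kWarpSize * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(device);
    return host;
}
```

Calling copyThroughShared() from main should give element i equal to 2 * i, assuming a CUDA-capable device is present.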