Hi,
I am looking for precise theoretical answers about the following memory access patterns, considering non-atomic instructions only:
- What happens when all or a few warps from the same block try to write to the same global memory location? What if the write is surrounded by __syncthreads() calls?
- What happens when all or a few warps from the same block try to write to the same shared memory location? What if the write is surrounded by __syncthreads() calls?
- What happens when threads from different blocks try to write to the same global memory location?
In each case, I want to know whether all the writes succeed and give the correct output sooner or later. Are they all serialized? I thought I more or less knew the answers, but certain behaviour in my code is really baffling me. I would appreciate any clarification on the above.
Thanks & regards,
Aditi