Does anyone know if a coalesced read or write is atomic? It seems like it should be, since it reduces to a single memory transaction, but can anyone confirm this? In addition, will this change between hardware versions?
Based on my findings here, I’m quite convinced they are. The address space is divided into 256-byte words (the maximum memory transaction size, reached when all 16 threads of a half-warp read or write a 16-byte value), with each word mapped to a single memory bank. The banks are 8 bytes wide, so 8-byte reads and writes would definitely be atomic. Based on my understanding of SDRAM, I’m guessing larger sizes use burst mode (e.g. 16 floats = 64 bytes => burst length 8 on an 8-byte-wide bank). A burst isn’t interrupted, so the access would be atomic.
How are you trying to use atomic memory operations within a warp? Is there a producer/consumer relationship between threads in the same warp?