I've been working with the software-emulated atomic operations on shared memory that appear in the 256-bin histogram example.
It says that when multiple threads in a warp write an int (32 bits) into the same place in shared memory, only one write gets executed and others are dropped.
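For context, the pattern I'm referring to looks roughly like the sketch below. It is reconstructed from memory of the SDK sample, so the names (`addData256`, `warpHistogram`, `TAG_MASK`) and the test input in `main` should be treated as assumptions rather than the sample's exact code. It also assumes the threads of a warp execute in lock-step, which the trick depends on; on hardware that schedules the threads of a warp independently, it should not be relied on.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define LOG2_WARP_SIZE 5U
#define WARP_SIZE      (1U << LOG2_WARP_SIZE)
#define BIN_COUNT      256
#define TAG_MASK       0x07FFFFFFU  // low 27 bits: count, top 5 bits: writer's tag

// Software "atomic" increment of one bin (assumes lock-step warp execution).
__device__ inline void addData256(volatile unsigned int *s_WarpHist,
                                  unsigned int data, unsigned int threadTag)
{
    unsigned int count;
    do {
        count = s_WarpHist[data] & TAG_MASK;  // strip the previous writer's tag
        count = threadTag | (count + 1);      // increment and stamp with own tag
        s_WarpHist[data] = count;             // on a collision, one write survives
    } while (s_WarpHist[data] != count);      // retry if another thread's write won
}

// One warp builds a 256-bin histogram of n bytes in shared memory.
__global__ void warpHistogram(const unsigned char *d_Data, unsigned int n,
                              unsigned int *d_Hist)
{
    __shared__ unsigned int s_Hist[BIN_COUNT];
    for (unsigned int i = threadIdx.x; i < BIN_COUNT; i += blockDim.x)
        s_Hist[i] = 0;
    __syncthreads();

    // Unique per-thread tag in the top 5 bits of the word.
    const unsigned int tag = threadIdx.x << (32 - LOG2_WARP_SIZE);
    for (unsigned int pos = threadIdx.x; pos < n; pos += blockDim.x)
        addData256(s_Hist, d_Data[pos], tag);
    __syncthreads();

    // Drop the tag bits before writing the counts out.
    for (unsigned int i = threadIdx.x; i < BIN_COUNT; i += blockDim.x)
        d_Hist[i] = s_Hist[i] & TAG_MASK;
}

int main()
{
    const unsigned int n = 1024;
    unsigned char h_Data[n];
    for (unsigned int i = 0; i < n; ++i)
        h_Data[i] = (unsigned char)(i % 4);   // few distinct values, to force collisions

    unsigned char *d_Data = nullptr;
    unsigned int  *d_Hist = nullptr;
    unsigned int   h_Hist[BIN_COUNT];
    cudaMalloc(&d_Data, n);
    cudaMalloc(&d_Hist, BIN_COUNT * sizeof(unsigned int));
    cudaMemcpy(d_Data, h_Data, n, cudaMemcpyHostToDevice);

    warpHistogram<<<1, WARP_SIZE>>>(d_Data, n, d_Hist);  // a single warp
    cudaMemcpy(h_Hist, d_Hist, sizeof(h_Hist), cudaMemcpyDeviceToHost);

    for (int i = 0; i < 4; ++i)
        printf("bin %d: %u\n", i, h_Hist[i]);

    cudaFree(d_Data);
    cudaFree(d_Hist);
    return 0;
}
```

The whole scheme rests on the guarantee that when several threads of a warp store a 32-bit word to the same shared-memory address in the same instruction, exactly one of those stores lands intact.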
I was wondering about the limits of this behavior, that is:
- What happens with larger/smaller write sizes, i.e.
(a) two threads writing bytes to different positions in the same 32-bit word
(b) two threads writing an int4, for example, or a structure
- What happens with a partial overlap, i.e. one thread writing to bytes 0-3 and a second thread writing to bytes 2-5
Is the behavior defined, or can't I count on what happens?
Thanks