Behaviour of overlapping writes into shared memory from a single warp (software atomic operations)

I’ve been working with the software atomic operations in shared memory that appear in the 256-bin histogram example.
It says that when multiple threads in a warp write an int (32 bits) to the same location in shared memory, only one write is executed and the others are dropped.
I was wondering about the limits of this behavior, specifically:

  1. What happens with larger or smaller write sizes, i.e.
    (a) two threads writing bytes to different positions in the same 32-bit word
    (b) two threads writing an int4, for example, or a structure
  2. What happens with a partial overlap, i.e. one thread writing to bytes 0-3 and a second thread writing to bytes 2-5

Is the behavior defined, or can’t I count on what happens?
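
For context, the "one write wins" trick can be sketched roughly as below. This is not the SDK histogram code, just a minimal sketch in the same spirit: each lane tags its value with its lane ID in the top 5 bits, writes it, and reads it back to see whether its write survived. The names (addTagged, warpHistogram, runHistogramCheck), the 27-bit count and the __syncwarp() calls are my own assumptions; the explicit syncs are there because newer GPUs don't guarantee lockstep execution within a warp. It also assumes a single full warp and an input length that is a multiple of 32.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BINS       256
#define WARP_SIZE  32
#define TAG_SHIFT  27                       // top 5 bits hold the lane tag (assumption)
#define COUNT_MASK ((1u << TAG_SHIFT) - 1)  // low 27 bits hold the count

// Hypothetical helper: all 32 lanes of the warp must call this together.
__device__ void addTagged(volatile unsigned int *hist, unsigned int bin, unsigned int tag)
{
    bool done = false;
    while (__any_sync(0xffffffffu, !done)) {   // loop until every lane has succeeded
        unsigned int val = 0;
        if (!done) val = tag | ((hist[bin] & COUNT_MASK) + 1);
        __syncwarp();                          // all reads finish before any write
        if (!done) hist[bin] = val;            // colliding lanes: one write survives
        __syncwarp();                          // all writes finish before read-back
        if (!done && hist[bin] == val) done = true;  // my tagged value survived
    }
}

__global__ void warpHistogram(const unsigned char *data, int n, unsigned int *out)
{
    __shared__ unsigned int hist[BINS];
    for (int i = threadIdx.x; i < BINS; i += WARP_SIZE) hist[i] = 0;
    __syncwarp();
    unsigned int tag = threadIdx.x << TAG_SHIFT;
    for (int i = threadIdx.x; i < n; i += WARP_SIZE)  // n must be a multiple of 32
        addTagged(hist, data[i], tag);
    __syncwarp();
    for (int i = threadIdx.x; i < BINS; i += WARP_SIZE) out[i] = hist[i] & COUNT_MASK;
}

// Returns true when the GPU histogram matches a CPU reference.
bool runHistogramCheck()
{
    const int n = 4096;
    unsigned char h_data[n];
    unsigned int ref[BINS] = {0}, h_out[BINS];
    for (int i = 0; i < n; ++i) {              // only 13 distinct bins, so many collisions
        h_data[i] = (unsigned char)((i * 7) % 13);
        ++ref[h_data[i]];
    }
    unsigned char *d_data;
    unsigned int *d_out;
    cudaMalloc(&d_data, n);
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);
    warpHistogram<<<1, WARP_SIZE>>>(d_data, n, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < BINS; ++i)
        if (h_out[i] != ref[i]) return false;
    return true;
}

int main()
{
    printf("%s\n", runHistogramCheck() ? "PASS" : "FAIL");
    return 0;
}
```

On hardware with native shared-memory atomics, atomicAdd on the bin is the simpler choice; the sketch is only meant to show the tag-and-read-back idea.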



I did some tests here that found that shared memory writes are atomic at byte granularity, meaning that if you’re writing to byte X, other threads writing to bytes X-1 and X+1 won’t affect your byte. Multiple threads writing to the same byte will of course race, and “one will succeed” with no ordering promises.

There are two posts in that thread: one is some code showing that byte-wise writes don’t interfere, and the other is a follow-up measuring speeds, showing that byte writes were (very surprisingly) about 10% faster than word writes.
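
A test along those lines might look something like the following. This is not the code from that thread, just a minimal sketch of the same idea: 32 lanes each store one distinct byte, so four lanes share every 32-bit word, and the host then checks that no byte was clobbered by a neighbour. The kernel and function names are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WARP_SIZE 32

// Each lane stores one byte; four lanes share every 32-bit word of shared memory.
__global__ void byteStores(unsigned int *out)
{
    __shared__ unsigned int words[WARP_SIZE / 4];
    unsigned char *bytes = reinterpret_cast<unsigned char *>(words);
    if (threadIdx.x < WARP_SIZE / 4) words[threadIdx.x] = 0;
    __syncthreads();
    bytes[threadIdx.x] = (unsigned char)(0xA0 + threadIdx.x);  // distinct byte per lane
    __syncthreads();
    if (threadIdx.x < WARP_SIZE / 4) out[threadIdx.x] = words[threadIdx.x];
}

// Returns true when every byte holds the value written by its own lane.
bool runByteStoreCheck()
{
    unsigned int h_out[WARP_SIZE / 4];
    unsigned int *d_out;
    cudaMalloc(&d_out, sizeof(h_out));
    byteStores<<<1, WARP_SIZE>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    // The copy preserves byte order, so byte i on the host is byte i on the device.
    const unsigned char *b = reinterpret_cast<const unsigned char *>(h_out);
    for (int i = 0; i < WARP_SIZE; ++i)
        if (b[i] != 0xA0 + i) return false;
    return true;
}

int main()
{
    printf("%s\n", runByteStoreCheck() ? "PASS" : "FAIL");
    return 0;
}
```

This only covers the non-overlapping case (1a); it says nothing about int4 writes or partially overlapping writes.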

So for both of your questions, your byte-wise addressing should probably work OK.

Okay, so 1 is defined: byte stores work. Overlapping anything doesn’t work.