Shared memory atomics vs. volatile: does volatile eliminate the need for atomics in shared memory?


Do I have this right? With the latest CUDA versions and cards, if shared memory is declared as volatile, is there no need for atomics?
In other words, in the following example:

volatile __shared__ float memSlot[256];

memSlot[7] = 3;

// thread 5
memSlot[7] += 1;

// thread 6, at the same time as thread 5
memSlot[7] += 1;

// Is memSlot[7] guaranteed to be 5 after these concurrent operations, without atomics?


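No, atomics are something else. If two different threads do atomic increments, each increment is an indivisible read-modify-write: it is as if each thread “checks out” the variable exclusively for the duration of its operation, so one update completes before the other begins and both increments take effect.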

With volatile alone, the two threads may both read the same old value, each increment its own copy, and then each write the result back. Only one of the writes (the “last one,” so to speak) determines the final value, so the end result can be just one increment: in your example, memSlot[7] may end up as 4 rather than 5.

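Atomics must be supported at the hardware level. volatile, on the other hand, is only a signal to the compiler that says, “do not keep this value in a register or other temporary copy; go to memory on every read and write.” It does nothing to make a read-modify-write sequence indivisible.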
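
To make the difference concrete, here is a minimal sketch built around your example. The kernel name incrementSlot, the helper runOnce, and the useAtomic flag are made up for illustration; only atomicAdd and the runtime calls are real CUDA API. It assumes a device that supports float atomicAdd on shared memory (compute capability 2.0 or later) and leaves out error checking for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: threads 5 and 6 each add 1 to memSlot[7],
// either through a plain volatile access or through atomicAdd.
__global__ void incrementSlot(float *result, bool useAtomic)
{
    __shared__ float memSlot[256];

    if (threadIdx.x == 0) memSlot[7] = 3.0f;
    __syncthreads();  // the initial value is now visible to the whole block

    if (threadIdx.x == 5 || threadIdx.x == 6) {
        if (useAtomic) {
            // Indivisible read-modify-write: both increments take effect.
            atomicAdd(&memSlot[7], 1.0f);
        } else {
            // volatile forces a real load and a real store, but the
            // load/add/store sequence can still interleave between threads,
            // so one of the two increments may be lost (a data race).
            volatile float *slot = memSlot;
            slot[7] = slot[7] + 1.0f;
        }
    }
    __syncthreads();  // both updates are finished before the read below

    if (threadIdx.x == 0) *result = memSlot[7];
}

// Hypothetical host helper: runs one block and returns the final memSlot[7].
float runOnce(bool useAtomic)
{
    float *dResult = nullptr;
    float hResult = 0.0f;
    cudaMalloc((void **)&dResult, sizeof(float));
    incrementSlot<<<1, 256>>>(dResult, useAtomic);
    // cudaMemcpy waits for the kernel on the default stream to finish.
    cudaMemcpy(&hResult, dResult, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dResult);
    return hResult;
}

int main()
{
    printf("volatile only: %.0f\n", runOnce(false));  // not guaranteed: may be 4 or 5
    printf("atomicAdd:     %.0f\n", runOnce(true));   // 5
    return 0;
}
```

Only the atomicAdd path is guaranteed to give 5. Since threads 5 and 6 sit in the same warp, the volatile-only path will often show the lost update, but what it prints depends on the hardware and compiler, so treat that result as illustrative only.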