Do I have this right? On the latest CUDA versions and cards, if shared memory is declared as volatile, is there no need for atomics?
In other words, in the following example:
volatile __shared__ float memSlot;
memSlot = 3;
//thread 6 at the same time as thread
// will memSlot be guaranteed to be 5 after these concurrent operations, without atomics?
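No, atomics are something else. If two different threads do atomic increments, the increments are applied one after the other: each thread's read-modify-write completes as a single indivisible operation, as if the thread “checked out” the variable and held it exclusively, so both increments take effect.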
With volatile, the two threads may both read the old value at the same time and then each write back its own incremented copy, but only one of the writes (the “last one,” so to speak) determines the final value, so the end result is just one increment.
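Atomics must be supported at the hardware level. volatile, on the other hand, is only a signal to the compiler that says, “do not keep this value in a register or other temporary cached copy; go to memory on every read and write.” It does nothing to make a read-modify-write indivisible.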
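To make the difference concrete, here is a minimal sketch that contrasts the two approaches. It is not taken from the original question: the kernel names, the `runOnce` helper, and the choice of one block with every thread adding 1.0f are all assumptions made for illustration. It also assumes a device that supports `atomicAdd` on `float` in shared memory (compute capability 2.0 or later).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (hypothetical name): every thread increments one shared
// slot through a plain volatile access. The load and the store are separate
// instructions, so concurrent threads can overwrite each other's result.
__global__ void incrementVolatile(float *out)
{
    volatile __shared__ float memSlot;
    if (threadIdx.x == 0) memSlot = 0.0f;
    __syncthreads();

    memSlot = memSlot + 1.0f;   // racy read-modify-write: increments may be lost
    __syncthreads();

    if (threadIdx.x == 0) *out = memSlot;
}

// Illustrative kernel (hypothetical name): the same increment done with
// atomicAdd, which performs the read-modify-write as one indivisible operation.
__global__ void incrementAtomic(float *out)
{
    __shared__ float memSlot;
    if (threadIdx.x == 0) memSlot = 0.0f;
    __syncthreads();

    atomicAdd(&memSlot, 1.0f);  // each thread's increment is applied exactly once
    __syncthreads();

    if (threadIdx.x == 0) *out = memSlot;
}

// Hypothetical host helper: launches a single block of `threads` threads and
// returns the final value of the shared slot. Error checking is omitted to
// keep the sketch short.
static float runOnce(bool useAtomic, int threads)
{
    float *dOut = nullptr;
    float hOut = -1.0f;
    cudaMalloc(&dOut, sizeof(float));
    if (useAtomic)
        incrementAtomic<<<1, threads>>>(dOut);
    else
        incrementVolatile<<<1, threads>>>(dOut);
    // cudaMemcpy waits for the kernel to finish before copying the result.
    cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hOut;
}
```

Calling `runOnce(true, 256)` from a `main` function should return 256, since every atomic increment is counted. `runOnce(false, 256)` has no guaranteed result; it will usually be smaller than the thread count, because threads that read the same old value all write back the same new one.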