Atomic? How does it guarantee correctness?

Atomics? How do they guarantee correctness and memory consistency?
First, sorry for this newbie and off-topic question…

As far as I know, an atomic instruction ensures that while it executes, no other thread can modify the data it operates on (much like a critical section).
Am I correct?

But how is this implemented in hardware? How does the hardware guarantee this? (Does it internally generate three micro-instructions: lock, modify, and unlock?)
What is the difference between using a mutex and using an atomic instruction?
Is the only difference the number of instructions (one instruction for an atomic, multiple instructions for a normal mutex)?

Perhaps this is not specifically related to GPU architecture, but a high-level answer would also be great.
Thanks

NVIDIA doesn’t discuss these details, AFAIK. But I would guess that atomicity is guaranteed in the memory controller. atomicAdd is a single instruction in PTX, separate from ordinary reads and writes, as are all the other atomic operations.