Atomics: how do they guarantee correctness and memory consistency?
First, sorry for this newbie and somewhat off-topic question.
As far as I know, an atomic instruction makes sure that, while it executes, no other thread can modify that data (just like a critical section).
Am I correct?
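Roughly, yes, with one caveat: the guarantee covers a single read-modify-write on a single memory location, not an arbitrary block of code the way a critical section does. A minimal sketch, assuming C++11 `std::atomic` (the helper name `count_with_atomic` is hypothetical), where several threads bump one shared counter without any lock and no increment is lost:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: increment a shared counter from several threads.
// Each fetch_add is one indivisible read-modify-write, so no update is lost
// even though no mutex is used.
int count_with_atomic(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomic only for this one operation on this one variable.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() makes all increments visible to this thread before the load.
    for (auto& w : workers) w.join();
    return counter.load();
}
```

With a plain `int` and `++counter` instead, the same program has a data race and can return less than `num_threads * increments_per_thread`.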
But how is this implemented in hardware? How does the hardware guarantee this? (Does it internally generate three micro-instructions: lock, modify, and unlock?)
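Hardware typically does not wrap the data in a software-style lock. Common approaches are holding the cache line exclusively for the duration of the read-modify-write (e.g. an x86 `lock`-prefixed instruction) or an optimistic load/store pair that fails and retries if another core wrote the line in between (load-linked/store-conditional, e.g. ARM's exclusive loads and stores). The sketch below, assuming C++11 `std::atomic`, shows that optimistic retry idea at the software level with a compare-and-swap loop; it is an illustration of the technique, not of what any particular chip does internally, and `atomic_multiply` / `multiply_concurrently` are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: atomically multiply `target` by `factor`.
// std::atomic<int> has no fetch_multiply, so the read-modify-write is built
// from compare_exchange_weak (typically `lock cmpxchg` on x86, or a
// load-exclusive/store-exclusive loop on ARM without the LSE extensions).
int atomic_multiply(std::atomic<int>& target, int factor) {
    int expected = target.load();
    // If another thread changed `target` between our load and our CAS, the CAS
    // fails, `expected` is refreshed with the current value, and we retry.
    // compare_exchange_weak may also fail spuriously; the loop absorbs that.
    while (!target.compare_exchange_weak(expected, expected * factor)) {
    }
    return expected;  // value observed just before our successful update
}

// Hypothetical driver: each thread doubles the shared value once.
int multiply_concurrently(int num_threads) {
    assert(num_threads >= 0 && num_threads < 31);  // keep 2^n within int
    std::atomic<int> value{1};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&value] { atomic_multiply(value, 2); });
    }
    for (auto& w : workers) w.join();
    return value.load();  // 2^num_threads if no update was lost
}
```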
What is the difference between just using a mutex and using an atomic instruction?
Is the only difference the number of instructions (one instruction for an atomic, multiple instructions for a normal mutex)?
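Instruction count is not the main difference. An atomic instruction makes one operation on one location indivisible, while a mutex (usually built on top of atomic instructions) makes an arbitrary sequence of operations mutually exclusive, and a thread that cannot get the mutex waits, possibly being put to sleep by the OS. A minimal sketch, assuming C++11 `std::mutex` (the `Accounts` struct and `run_transfers` are hypothetical), where two variables must change together; making each variable a separate `std::atomic<int>` would make each step atomic but would let another thread see the state between the two steps:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: two balances whose sum must stay constant.
struct Accounts {
    std::mutex m;
    int a = 1000;
    int b = 0;

    void transfer_a_to_b(int amount) {
        std::lock_guard<std::mutex> lock(m);  // other threads wait here
        a -= amount;                          // both steps happen inside
        b += amount;                          // one critical section
    }  // lock released when `lock` goes out of scope

    int total() {
        std::lock_guard<std::mutex> lock(m);
        return a + b;
    }
};

int run_transfers(int num_threads, int transfers_per_thread) {
    assert(num_threads > 0 && transfers_per_thread >= 0);
    Accounts acc;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&acc, transfers_per_thread] {
            for (int i = 0; i < transfers_per_thread; ++i) {
                acc.transfer_a_to_b(1);
                // Under the mutex, the invariant a + b == 1000 always holds.
                assert(acc.total() == 1000);
            }
        });
    }
    for (auto& w : workers) w.join();
    return acc.b;
}
```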
Perhaps this is not specifically related to GPU architecture, but a high-level answer would be great, too.
Thanks