Tell me about atomics with mapped/zero-copy host memory

According to this post ([post=“521996”][/post]), atomic operations do work on mapped host memory. I can also perform atomic operations from the CPU using e.g. InterlockedAdd() or atomic_add().
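For concreteness, here is a minimal sketch of the single-device case I mean: one GPU doing atomicAdd() on a counter that lives in mapped (zero-copy) host memory. The names `bump` and `run_mapped_atomic_demo` and the launch sizes are made up for illustration. It only covers threads of one device, so it says nothing about the host or a second GPU touching the same location at the same time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread atomically increments a counter that lives in mapped host memory.
__global__ void bump(int *counter)
{
    atomicAdd(counter, 1);
}

// Hypothetical helper: returns the final counter value, or -1 on a CUDA error.
int run_mapped_atomic_demo(int blocks, int threads)
{
    int *h_counter = nullptr;  // host view of the allocation
    int *d_counter = nullptr;  // device view of the same memory

    // Request host-memory mapping. Older toolkits need this before the context
    // exists; newer ones enable it by default, so the result is ignored here.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();  // clear any error left by the call above

    if (cudaHostAlloc((void **)&h_counter, sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) {
        return -1;
    }
    *h_counter = 0;

    if (cudaHostGetDevicePointer((void **)&d_counter, h_counter, 0)
            != cudaSuccess) {
        cudaFreeHost(h_counter);
        return -1;
    }

    bump<<<blocks, threads>>>(d_counter);
    cudaError_t err = cudaGetLastError();           // launch errors
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();              // wait before the host reads
    }

    int result = (err == cudaSuccess) ? *h_counter : -1;
    cudaFreeHost(h_counter);
    return result;
}

int main()
{
    const int blocks = 64, threads = 256;
    int total = run_mapped_atomic_demo(blocks, threads);
    printf("counter = %d (launched %d threads)\n", total, blocks * threads);
    return 0;
}
```

If the device-side atomics behave as that post describes, the printed counter should match the number of launched threads. The host only reads the value after cudaDeviceSynchronize(), so there is no concurrent host access in this sketch.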

Can anyone shed any light on what happens when multiple devices, or a device and the host, are performing atomic operations on the same memory location? For example, does the GPU lock the PCI bus? Is there some software in the driver that would synchronize memory accesses somehow?

I looked through the beta version of the programming guide and searched for forum posts but couldn’t find anything on this. Any clarification is greatly appreciated.

Edit: Fixed link

As far as I know, atomics are only guaranteed within a device. I haven’t played with it, though…