Critical Section with 1.0 Compute Capability

Hello everybody
I am using a Tesla C870 with compute capability 1.0.
I need different threads to update a shared data location atomically, something like a mutex. I know that devices with compute capability 1.1 provide atomic operations.
Does anybody have an idea how to implement a locking mechanism on compute capability 1.0?

Thanks

What you are asking for is difficult to get right, even with atomics. I believe working implementations exist using atomics, but I don’t know if anyone has gotten it right without them.
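For comparison, here is the general shape of a lock built on an atomic compare-and-swap. This is a host-side C++ sketch using std::atomic, not device code, and the names CasSpinLock and cas_counter are made up for illustration. On the GPU the corresponding primitives would be atomicCAS(int* address, int compare, int val) and atomicExch on global memory, which require compute capability 1.1 or later, so this route is not available on the C870.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Spinlock built on compare-and-swap: 0 = unlocked, 1 = locked.
class CasSpinLock {
public:
    void lock() {
        int expected = 0;
        // Keep trying to swap 0 -> 1. On failure, compare_exchange_weak
        // overwrites `expected` with the value it saw, so reset it.
        while (!flag_.compare_exchange_weak(expected, 1,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
            expected = 0;
            std::this_thread::yield();
        }
    }

    void unlock() {
        assert(flag_.load(std::memory_order_relaxed) == 1);  // caller must hold the lock
        flag_.store(0, std::memory_order_release);
    }

private:
    std::atomic<int> flag_{0};
};

// Hypothetical demo: `num_threads` threads each increment a plain counter
// `iters` times under the lock; returns the final count.
long cas_counter(int num_threads, int iters) {
    CasSpinLock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&lock, &counter, iters] {
            for (int k = 0; k < iters; ++k) {
                lock.lock();
                ++counter;  // critical section: non-atomic update
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Translating this loop directly into a kernel (spin on atomicCAS(lock, 0, 1) until it returns 0, run the critical section, then atomicExch(lock, 0)) is the pattern that can hang on this generation of hardware when two threads of the same warp contend for the lock, as described in point 1 below.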

Some things to consider:

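  1. Threads within a warp can deadlock each other because of the way branching behaves: the threads of a warp share one instruction stream, so when they diverge, the hardware runs one path at a time until they reconverge. If the path that spins waiting for the lock is the one being run, the thread holding the lock never gets to release it. Many algorithms presume that if at least one thread can proceed, it will, and will eventually release the lock; inside a warp that assumption does not hold.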
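  2. A straightforward implementation of the bakery algorithm assumes stricter memory consistency than CUDA devices provide: it relies on every thread seeing the other threads' writes to the shared ticket and flag arrays in the order they were made. This might be fixable with judicious use of volatile and __threadfence(), or it might not.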
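To make point 2 concrete, here is a host-side C++ sketch of Lamport's bakery algorithm. It needs no read-modify-write atomics, only loads and stores, which is why it comes up for compute capability 1.0; but it is only correct because every access to the shared arrays goes through std::atomic with the default sequentially consistent ordering. The names BakeryLock and bakery_counter are illustrative, and this is not device code: a CUDA port would have to obtain the same ordering from volatile and __threadfence(), which is the uncertain part, and would still be exposed to the intra-warp deadlock from point 1.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Lamport's bakery lock for a fixed number of participants N (ids 0..N-1).
// Only loads and stores are used (no compare-and-swap), but all of them are
// std::atomic with the default memory_order_seq_cst. That sequentially
// consistent ordering is what the algorithm relies on.
template <std::size_t N>
class BakeryLock {
public:
    BakeryLock() {
        for (std::size_t k = 0; k < N; ++k) {
            choosing_[k].store(false);
            number_[k].store(0);
        }
    }

    void lock(std::size_t i) {
        assert(i < N);
        // Doorway: take a ticket larger than every ticket currently visible.
        choosing_[i].store(true);
        int max_ticket = 0;
        for (std::size_t k = 0; k < N; ++k)
            max_ticket = std::max(max_ticket, number_[k].load());
        number_[i].store(max_ticket + 1);
        choosing_[i].store(false);

        // Wait for every thread whose (ticket, id) pair is smaller than ours.
        const int mine = number_[i].load();
        for (std::size_t j = 0; j < N; ++j) {
            if (j == i) continue;
            while (choosing_[j].load()) std::this_thread::yield();
            for (;;) {
                const int theirs = number_[j].load();
                if (theirs == 0 || theirs > mine || (theirs == mine && j > i)) break;
                std::this_thread::yield();
            }
        }
    }

    void unlock(std::size_t i) { number_[i].store(0); }

private:
    std::atomic<bool> choosing_[N];
    std::atomic<int> number_[N];
};

// Hypothetical demo: N threads each increment a plain counter `iters` times
// inside the bakery lock; returns the final count.
template <std::size_t N>
long bakery_counter(int iters) {
    BakeryLock<N> lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < N; ++i) {
        threads.emplace_back([&lock, &counter, i, iters] {
            for (int k = 0; k < iters; ++k) {
                lock.lock(i);
                ++counter;  // critical section: non-atomic update
                lock.unlock(i);
            }
        });
    }
    for (auto& t : threads) t.join();
    return counter;
}
```

With plain (non-atomic) int and bool arrays in place of std::atomic, the compiler and hardware are free to reorder or cache these accesses, and mutual exclusion is no longer guaranteed. Note also that each lock() scans all N slots, so the cost of acquiring the lock grows with the number of participating threads.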