So, I thought that it wasn’t possible to communicate between threads in different blocks,
but then I discovered atomic functions (particularly atomicCAS()), about which the docs just say:

“The operation is atomic in the sense that it is guaranteed to be performed without
interference from other threads.”

Which suggests that it operates across ALL threads irrespective of blocks. Is that correct?

Doesn’t that mean you can communicate between threads in different blocks?
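To make the question concrete, here is the kind of thing I have in mind (my own sketch, not from the docs; the names are mine):

```cuda
// Sketch: if atomics on global memory are device-wide, every thread in
// every block contends on the same counter, which is a (crude) form of
// inter-block communication.
__global__ void countAll(int *counter)
{
    // atomicAdd should serialise across ALL blocks, not just this one
    atomicAdd(counter, 1);
}

// Host side (error checking omitted):
//   int *d_counter;
//   cudaMalloc(&d_counter, sizeof(int));
//   cudaMemset(d_counter, 0, sizeof(int));
//   countAll<<<numBlocks, threadsPerBlock>>>(d_counter);
//   // copied back, the counter should equal numBlocks * threadsPerBlock
```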



Only if the execution of one block does not depend on the execution of another block. If there exists a scheduling where your kernel may deadlock, assume the hardware will do that.

Hello tmurray,

Would it be safe to say that atomicCAS would work to implement a correct lock that works with concurrent kernels.

I remember reading that a deadlock can occur if the number of blocks is greater than the number of stream processors.

So my other question is: if we’re launching multiple concurrent kernels that make use of atomicCAS and operate on shared data, would the SUM of the number of blocks from all kernels have to be less than the number of processors?


That is generally completely unsafe.

Thanks for the clarification. So, to be clear: if using atomicCAS() on a device int as an atomic lock,
do ALL threads in ALL blocks respect the atomic nature of the call, and will they block while waiting to acquire the lock?
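By “atomic lock” I mean something like the following sketch (0 = free, 1 = held; this is my own code, simplified):

```cuda
// A naive global spinlock built on atomicCAS / atomicExch.
__device__ int lock = 0;

__device__ void acquire()
{
    // spin until we successfully swap 0 -> 1
    while (atomicCAS(&lock, 0, 1) != 0)
        ;
}

__device__ void release()
{
    // hand the lock back
    atomicExch(&lock, 0);
}
```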

Another, related question: if I declare

__shared__ int count;
int X;

within a kernel and then increment it using

atomicAdd(&count, 1);
will it only cause threads WITHIN a block to serialise (with respect to each other) when performing the atomicAdd(), since
the variable being incremented is only visible to members of that block?
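In other words, something like this sketch (names are mine), where I’d expect each block to have its own private copy of count:

```cuda
// Each block counts its own threads in shared memory; the atomicAdd
// should only serialise threads within the same block, because
// __shared__ variables are per-block.
__global__ void blockCount(int *blockTotals)
{
    __shared__ int count;

    if (threadIdx.x == 0)
        count = 0;
    __syncthreads();

    // contends only with threads in THIS block
    atomicAdd(&count, 1);

    __syncthreads();
    if (threadIdx.x == 0)
        blockTotals[blockIdx.x] = count;  // == blockDim.x
}
```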


I think you’re missing my point. Global locking using atomicCAS is unsafe if a thread that fails to acquire the lock cannot complete.
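For concreteness, the classic failure looks something like this (a sketch; it assumes the warp executes in lockstep, which you cannot rule out):

```cuda
// Every thread in a warp tries to take the same lock. One thread wins,
// but on lockstep hardware the warp cannot advance past the loop until
// ALL of its threads leave it, so the winner never reaches the release
// below: the losers spinning prevent the winner from completing.
__global__ void mayHang(int *lock)
{
    while (atomicCAS(lock, 0, 1) != 0)
        ;                       // losers spin here forever...

    atomicExch(lock, 0);        // ...so this line is never reached
}
```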

Wouldn’t that be the fault of the algorithm and simply mean that the locking algorithm is not correct for the GPU architecture?

What if we use a lock under strict assumptions, such as:

- only one thread in a warp can lock / unlock

- no use of __syncthreads(), not even __threadfence()

- locking/unlocking done in the same order, etc.

So my question is can we assume that such usage is safe? If not why?

Is there more info on scheduling on the GPU and its effects on this situation?

The reason I’m interested in this is that I’m already using such a mutex in a Container class managed by the device, and it’s working quite reliably. The lock is fine-grained (on a per-Node basis), contention is low, and it doesn’t perform half-bad either. Knowing more about the issues above would help me understand whether I can extend the algorithms to perform correctly using asynchronous calls.

Thank you

You’re making assumptions about how the scheduler on the SM works; you cannot make any such assumption.

So are you saying that only one thread in a warp should attempt to acquire the lock?