questions about using atomicCAS as a lock

wfchiang · November 10, 2011, 9:43pm

Hello all,

My name is Wei-Fan. I am new to CUDA programming.
I am practicing using atomicCAS to implement a lock on global memory.
But there are two cases where I just can’t figure out what is wrong…

case 1:

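__global__ void adder (int *buff) {
    int idx = threadIdx.x;
    int lock = -2;   // sentinel value meaning "this element is locked"
    int old;
    do {
        old = buff[idx];
        if (old == lock) continue;
    } while (old != atomicCAS(&buff[idx], old, lock));
    buff[idx] = old + 1;   // store the incremented value, which also releases the lock
}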

In this case, I have a one-dimensional grid of one-dimensional blocks: the grid has 64 blocks and each block has 256 threads.
Since idx is threadIdx.x, each element of buff is incremented once per block, so ideally each element should end up as 64. However, the results show that there are races, so not every element is 64.

case 2:

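__global__ void adder (int *buff) {
    int idx = threadIdx.x;
    int lock = -2;   // sentinel value meaning "this element is locked"
    int old;
    do {
        do {
            old = buff[idx];   // spin until the element no longer holds the lock value
        } while (old == lock);
    } while (old != atomicCAS(&buff[idx], old, lock));
    buff[idx] = old + 1;   // store the incremented value, which also releases the lock
}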

Case 2 is basically just a modified version of case 1, but this one works even worse!
There seems to be a deadlock or something similar in this code: I waited a long time, but the program never terminated… Why?

I know that I can implement “adder” with atomicAdd or atomicInc, or even with

do {
    old = buff[idx];
} while (old != atomicCAS(&buff[idx], old, (old+1)));

(I tried this one, and it works well.)
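
For reference, here is a minimal, self-contained sketch of the two lock-free variants mentioned above, using the same launch configuration as in case 1 (64 blocks of 256 threads). The names adder_atomic_add, adder_cas, count_mismatches, kThreads and kBlocks are made up for this illustration, and it assumes buff is zero-initialized before the launch. Note that the CAS loop here is a slight variation of the one above: it carries the value returned by atomicCAS into the next attempt instead of re-reading buff[idx].

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block (hypothetical name)
constexpr int kBlocks  = 64;   // blocks per grid (hypothetical name)

// Variant 1: let the hardware do the read-modify-write in one atomic step.
__global__ void adder_atomic_add(int *buff) {
    int idx = threadIdx.x;
    atomicAdd(&buff[idx], 1);
}

// Variant 2: the same increment as a compare-and-swap retry loop.
__global__ void adder_cas(int *buff) {
    int idx = threadIdx.x;
    int old = buff[idx];
    int assumed;
    do {
        assumed = old;
        // Writes assumed + 1 only if buff[idx] still equals assumed;
        // always returns the value that was actually found there.
        old = atomicCAS(&buff[idx], assumed, assumed + 1);
    } while (old != assumed);
}

// Runs one variant on a zeroed buffer and returns how many elements
// differ from kBlocks, or -1 if a CUDA call failed.
int count_mismatches(bool use_cas) {
    int *d_buff = nullptr;
    const size_t bytes = kThreads * sizeof(int);
    if (cudaMalloc(&d_buff, bytes) != cudaSuccess) return -1;
    if (cudaMemset(d_buff, 0, bytes) != cudaSuccess) {
        cudaFree(d_buff);
        return -1;
    }

    if (use_cas) {
        adder_cas<<<kBlocks, kThreads>>>(d_buff);
    } else {
        adder_atomic_add<<<kBlocks, kThreads>>>(d_buff);
    }

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<int> host(kThreads);
    cudaError_t err = cudaMemcpy(host.data(), d_buff, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_buff);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int v : host) {
        if (v != kBlocks) ++bad;
    }
    return bad;
}

int main() {
    printf("atomicAdd mismatches: %d\n", count_mismatches(false));
    printf("atomicCAS mismatches: %d\n", count_mismatches(true));
    return 0;
}
```

Neither kernel ever stores the lock value into buff, so no thread has to wait for another thread to release anything; that is the main difference from case 1 and case 2.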

I just wonder: what did I do wrong in case 1 and case 2?

Thanks.

Regards,

Wei-Fan
