noob question about ATOMIC operations...

Is there a way to create a customized atomic operation in CUDA?

I am asking because I have a problem where I need to check this kernel code:

if (img[index] != clone[index])
atomicExch(cmp, index);

The main problem is that img, clone and cmp are all in device memory. HOWEVER, cmp is a SINGLE memory location, so every thread can see it, almost like a shared variable. With the atomicExch I guarantee that once some thread triggers it, no other thread can interrupt that write, so far so good. The remaining problem is the if, which can trigger race conditions… I would like both the if and the atomicExch to be executed atomically. Does anyone have an idea how to do that, or whether it is even possible?

If you are trying to do what I think you are, then how about making a flag in shared memory:

if (img[index] != clone[index])
    set flag

__syncthreads()
if (flag is set)

NB it is OK for several or even all threads in the block to set the flag, as the hardware will serialize the writes.
E.g. if you set the flag to tid, its final value will be one of the tids, but which one is undefined.
(NB tid itself is not a good value to use, because one thread has a tid of zero, which looks the same as the flag never being set.)
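
Something like this is what I mean (just a rough sketch, untested; the kernel name, the +1 trick and the one-write-per-block at the end are my own choices):

__global__ void compare_blocks(const unsigned char *img, const unsigned char *clone,
                               unsigned int *cmp, int N)
{
	// one flag per block, visible to every thread in the block
	__shared__ unsigned int flag;

	unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;

	if (threadIdx.x == 0)
		flag = 0;                  // one thread clears the flag
	__syncthreads();                   // everyone sees the cleared flag

	if (index < N && img[index] != clone[index])
		flag = index + 1;          // +1 so that index 0 still reads as "set"

	__syncthreads();                   // wait until all threads had a chance to set it

	if (threadIdx.x == 0 && flag != 0)
		atomicExch(cmp, flag - 1); // only one global write per block
}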

Hope that this helps

I wrote a long reply here about avoiding such complex atomic operations, and how to use an inefficient lock token as a last-resort answer to rare atomic issues anyway.

But I replaced that now because as I was writing up the code, I noticed that your example doesn’t have a problem that atomics or locks would solve anyway.
Since you don’t use the value stored in cmp, the atomicExch is effectively just an assignment.
Your test condition does not test cmp itself, so there’s no need to lock or exchange anything.
So just write “if (img[index] != clone[index]) *cmp=index;”

Now the final value of *cmp is subject to a race condition… if multiple threads have set it, ONE of them will have succeeded at the end, but it is undefined which one was the winner.
If you really need a consistent winner, then just use atomicMin(cmp, index) or atomicMax(cmp, index), and at the end of processing, cmp will hold the smallest or largest index respectively.
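
For example, a minimal sketch of the atomicMin variant (assuming cmp is a single unsigned int that the host initializes to 0xFFFFFFFF, e.g. with cudaMemset(cmp, 0xFF, sizeof(unsigned int)), so an untouched cmp is distinguishable from index 0):

__global__ void first_difference(const unsigned char *img, const unsigned char *clone,
                                 unsigned int *cmp, int N)
{
	unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;

	// every differing thread competes, but the winner is deterministic:
	// at the end *cmp holds the smallest differing index (or 0xFFFFFFFF if none differ)
	if (index < N && img[index] != clone[index])
		atomicMin(cmp, index);
}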

It seems odd, but these are the two options I am using:

kernel code here...
__global__ void comparacao_paralela(unsigned char *img, unsigned char *clone, unsigned int *cmp, int N, int width, int widthStep, int nChannels)
{
	unsigned int indice, index;
.
.
.
#ifdef no_race
if (img[indice] != clone[indice])
	atomicExch(&cmp[indice], indice);
#endif
		
#ifdef race
if (img[indice] != clone[indice])
	atomicExch(&cmp[0], 1); // I don't care about the indice value, I just need cmp[0] to be != 0
#endif
.
.
.

end of kernel code

host code:
.
.
.
cudaMemset(CUDAcmp, 0, sizeof(unsigned int) * N);
comparacao_paralelaCPP(nBlocks, blockSize, CUDAClone_data, CUDAClone2_data, CUDAcmp, N, width, widthStep, nChannels);  <== the kernel call
cudaMemcpy(j, CUDAcmp, sizeof(unsigned int) * N, cudaMemcpyDeviceToHost);
#ifdef no_race
		for(k=0 ; k < N ; k++)
			if (j[k] != 0)
			{
				printf("-%d- ",j[k]);
				break;  //breaks this _for_
			}

		if (k == N)
		{
			printf("finalizing!!!!!! \n");
			break;
		}
		else
			printf("still run!!!!!! \n");
#endif

#ifdef race
		if (j[0] != 0)
			printf("-%d- ",j[0]);
		else
		{
			printf("finalizing \n",j[0]);
			break;
		}
#endif

If I set the RACE define, the code SOMETIMES runs and SOMETIMES enters an infinite loop.
If I set the NO_RACE define, the code ALWAYS runs!

I REALLY don't know how the race is crushing me here :(