Good solution?

Kwyjibo2010 · June 11, 2010, 8:47am

Hi,

My device does not support atomic functions on shared memory, so I wrote a workaround:

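#define WARPSIZE			32
#define LOG2_WARPSIZE		5
// Upper 5 bits hold the thread tag, lower 27 bits hold the counter value
#define BITMASK_THREADID	(0xFFFFFFFFu << (WARPSIZE - LOG2_WARPSIZE))
#define BITMASK_VALUE		(0xFFFFFFFFu >> LOG2_WARPSIZE)

// volatile, so that the compiler cannot keep the value in a register
// and every read and write really goes to shared memory
__device__ void sharedAtomicIncrement(volatile unsigned int* s_address)
{
	// Tag that is unique within the warp: lane index in the upper 5 bits
	const unsigned int tag = (threadIdx.x & (WARPSIZE - 1)) << (WARPSIZE - LOG2_WARPSIZE);
	unsigned int count;
	do
	{
		//Read shared memory and increment count
		count = *s_address & BITMASK_VALUE;
		count++;
		//Write value together with this thread's tag
		*s_address = count | tag;
		//Retry until our own tag can be read back, i.e. our write won
	} while ((*s_address & BITMASK_THREADID) != tag);
}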

The basic idea is that every thread reads the value at s_address, increments it, and writes it back together with an ID that is unique within the warp, stored in the 5 most significant bits. The thread repeats the write until it can read its own ID back from memory.
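To make the bit layout concrete, here is a minimal sketch of how one 32-bit word is split into a 5-bit tag and a 27-bit count. The helper names (packTagged, tagOf, countOf) and the constants TAG_SHIFT and COUNT_MASK are made up for this illustration and are not part of the code above.

```cuda
// Hypothetical helpers that only illustrate the word layout; they do not
// perform the retry loop and are not atomic in any way.
#define TAG_BITS   5u                        // log2 of the warp size
#define TAG_SHIFT  (32u - TAG_BITS)          // tag lives in bits 31..27
#define COUNT_MASK (0xFFFFFFFFu >> TAG_BITS) // bits 26..0 hold the count

// Combine a lane index (0..31) and a counter value into one word.
__host__ __device__ inline unsigned int packTagged(unsigned int lane, unsigned int count)
{
    return (count & COUNT_MASK) | (lane << TAG_SHIFT);
}

// Extract the 5-bit tag (the lane index of the last writer).
__host__ __device__ inline unsigned int tagOf(unsigned int word)
{
    return word >> TAG_SHIFT;
}

// Extract the 27-bit counter value.
__host__ __device__ inline unsigned int countOf(unsigned int word)
{
    return word & COUNT_MASK;
}
```

For example, packTagged(19, 1000) gives 0x980003E8, from which tagOf returns 19 and countOf returns 1000. One consequence of this layout is that the counter itself can only go up to 2^27 - 1 before it would spill into the tag bits.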

I know that busy-waiting is not an elegant solution, but I could not think of any other. At least I can guarantee a worst case of 32 write attempts.

Do you have any better ideas?

Regards,

Kwyjibo

jjp · June 11, 2010, 2:00pm

You could replicate the array n times, let n threads do the counting, each in its own copy, and sum up the replicated entries at the end. Doing the counting with just a single thread might also be feasible if there is a sufficient amount of parallel work besides counting.
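A rough sketch of the replication idea, under some assumptions of mine: a single block of 32 threads, 8 counters, and keys that are reduced to a counter index with a modulo. NUM_BINS, NUM_THREADS, countReplicated and countOnDevice are names invented for this example.

```cuda
#include <cuda_runtime.h>

#define NUM_BINS    8    // hypothetical number of counters
#define NUM_THREADS 32   // one private replica of the counters per thread

// Launch with exactly one block of NUM_THREADS threads.
__global__ void countReplicated(const unsigned int* keys, int n, unsigned int* bins)
{
    // Each thread owns one row, so no two threads ever write the same word.
    __shared__ unsigned int s_bins[NUM_THREADS][NUM_BINS];

    // Clear this thread's private replica.
    for (int b = 0; b < NUM_BINS; ++b)
        s_bins[threadIdx.x][b] = 0;

    // Count into the private replica; no atomics needed.
    for (int i = threadIdx.x; i < n; i += NUM_THREADS)
        s_bins[threadIdx.x][keys[i] % NUM_BINS]++;

    // Wait until every replica is complete before reading other rows.
    __syncthreads();

    // The first NUM_BINS threads each sum one counter across all replicas.
    if (threadIdx.x < NUM_BINS) {
        unsigned int sum = 0;
        for (int t = 0; t < NUM_THREADS; ++t)
            sum += s_bins[t][threadIdx.x];
        bins[threadIdx.x] = sum;
    }
}

// Host wrapper: copies n keys to the device, runs the kernel, and copies
// the NUM_BINS totals back into h_bins. Returns false on a CUDA error.
bool countOnDevice(const unsigned int* h_keys, int n, unsigned int* h_bins)
{
    unsigned int* d_keys = 0;
    unsigned int* d_bins = 0;
    if (cudaMalloc(&d_keys, n * sizeof(unsigned int)) != cudaSuccess)
        return false;
    if (cudaMalloc(&d_bins, NUM_BINS * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_keys);
        return false;
    }
    cudaMemcpy(d_keys, h_keys, n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    countReplicated<<<1, NUM_THREADS>>>(d_keys, n, d_bins);
    // This copy waits for the kernel and also reports launch failures.
    cudaError_t err = cudaMemcpy(h_bins, d_bins, NUM_BINS * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_keys);
    cudaFree(d_bins);
    return err == cudaSuccess;
}
```

The price is shared memory: the replicas take NUM_THREADS * NUM_BINS words, so this only fits when the number of counters is small. With more counters, one replica per warp instead of per thread would be a possible compromise, but then threads within a warp would again need some conflict handling.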
