Block-wide voting using shared memory Unexpected results


I was wondering whether anyone has been having problems using shared memory for performing a block-wide vote. Basically I’m trying to do something like this:

unsigned s =  ...;

	unsigned e =  ...;	


	__shared__ volatile unsigned flag;


		if(threadIdx.x == 0) {

			flag = 0;



		if(s < e){

			flag = 1;

                        /* Extra calculations here*/

			atomicAdd(&intcount, 1);




	} while(flag);

I would expect that if at least one thread writes to ‘flag’ the loop will continue for the whole block. However, it appears that sometimes the threads randomly leave the loop early. This in turn causes ‘intcount’ to have different values on separate executions of the kernel for the same input data. Is there a more reliable way to perform block-wide votes?


Off the top of my head :)

syncthreads is per block but flag can be seen by all blocks?

shared variables are per block, not per multiprocessor.

I vaguely remember having had or seen a similar problem, and it was solved by making the initial [font=“Courier New”]flag = 0[/font] unconditional, i.e., doing it in all threads and not just thread 0.

I can’t really remember the details, but it might have been that the compiler miscompiles it, inserting the joining point in the wrong place.




You need an extra __syncthreads() just inside the start of your do loop. This is very subtle! But you’re making an assumption that all threads are synced at the start of the loop, and they’re not… some may still be waiting at the end of the previous loop and they haven’t tested the flag yet, and thread 0 is going to change it on them!
So just put an initial syncthreads at the start to make sure all have finished their test before thread 0 touches the flag.
Yes, it means you need three!

SPWorley, you are right! I shouldn’t answer posts at 4am. Sat there, remembering vaguely that something was wrong with it, and just couldn’t see what.

Well observed!

Many thanks!