Solutions for counting successors generated by threads?


I have a rather simple problem: each thread expands a parent state and generates successor states. The number of
successors varies between 0 and … (integer size).
The host needs to know the total number of generated successors. So what I do is:

Allocate space for one integer on the GPU, call it succ_gpu, and pass the kernel a pointer to it.
__shared__ int succs[THREADS];
succs[threadIdx.x] = 0;

if (add) succs[threadIdx.x]++;
__syncthreads();  // wait until every thread has written its count

if (threadIdx.x == 0) {
    for (int i = 0; i < THREADS; i++) *succ_gpu += succs[i];
}

Copy succ_gpu from the GPU to the host and check it.

The problem:
If I have more than one group, this does not always work correctly.

Perhaps someone here has a better way to do it? Or can tell me why it does not work with multiple groups?


By “groups”, do you mean blocks? In that case, you would have all blocks writing to succ_gpu with many race conditions.
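
One minimal way to remove that race while keeping the structure of the original kernel is to let thread 0 of each block add its block total to succ_gpu with atomicAdd. The following is only a sketch under assumptions: the names expand, count_kernel and count_successors are hypothetical, THREADS = 128 is an assumed block size, and expand is a placeholder rule (one successor for every even thread index) standing in for the real state expansion. atomicAdd on a global int needs a device of compute capability 1.1 or higher.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define THREADS 128  // threads per block (assumed value)

// Hypothetical stand-in for the real expansion: returns how many
// successors thread `gid` generated. Here: one for every even index.
__device__ int expand(int gid) { return (gid % 2 == 0) ? 1 : 0; }

__global__ void count_kernel(int *succ_gpu, int n)
{
    __shared__ int succs[THREADS];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    succs[threadIdx.x] = (gid < n) ? expand(gid) : 0;
    __syncthreads();  // all per-thread counts must be visible to thread 0

    if (threadIdx.x == 0) {
        int block_total = 0;
        for (int i = 0; i < blockDim.x; i++) block_total += succs[i];
        atomicAdd(succ_gpu, block_total);  // one atomic update per block
    }
}

// Hypothetical host helper: launches enough blocks of THREADS threads
// to cover n parent states and returns the total successor count.
int count_successors(int n)
{
    int *succ_gpu = nullptr;
    int total = 0;

    cudaError_t err = cudaMalloc(&succ_gpu, sizeof(int));
    assert(err == cudaSuccess);
    err = cudaMemset(succ_gpu, 0, sizeof(int));  // counter must start at 0
    assert(err == cudaSuccess);

    int blocks = (n + THREADS - 1) / THREADS;
    if (blocks > 0) count_kernel<<<blocks, THREADS>>>(succ_gpu, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    err = cudaMemcpy(&total, succ_gpu, sizeof(int), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(succ_gpu);
    return total;
}
```

With this layout the number of atomic operations equals the number of blocks, not the number of threads, so contention on succ_gpu stays low.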

Have a look at the reduction example in the SDK. It is exactly what you want (and more efficient than having only thread 0 do the work).
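
For orientation, the idea behind a block-level reduction can be sketched as follows. This is not the SDK code itself, only a simplified illustration: the names reduce_counts and sum_counts are hypothetical, THREADS is assumed to be a power of two, and the per-thread successor counts are assumed to already sit in a device array (in a real kernel each thread would write its own count straight into shared memory). Each block halves the number of active threads per step and writes one partial sum; the host adds the partial sums.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define THREADS 128  // threads per block; must be a power of two here

// Each block sums up to THREADS entries of `counts` into block_sums[blockIdx.x].
__global__ void reduce_counts(const int *counts, int *block_sums, int n)
{
    __shared__ int s[THREADS];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    s[threadIdx.x] = (gid < n) ? counts[gid] : 0;
    __syncthreads();

    // Tree reduction: the active half adds in the values of the other half.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
        __syncthreads();  // outside the if, so every thread reaches it
    }

    if (threadIdx.x == 0) block_sums[blockIdx.x] = s[0];
}

// Hypothetical host helper: returns the sum of all per-thread counts.
int sum_counts(const std::vector<int> &counts)
{
    int n = static_cast<int>(counts.size());
    if (n == 0) return 0;
    int blocks = (n + THREADS - 1) / THREADS;

    int *d_counts = nullptr, *d_sums = nullptr;
    cudaError_t err = cudaMalloc(&d_counts, n * sizeof(int));
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_sums, blocks * sizeof(int));
    assert(err == cudaSuccess);
    err = cudaMemcpy(d_counts, counts.data(), n * sizeof(int),
                     cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    reduce_counts<<<blocks, THREADS>>>(d_counts, d_sums, n);

    std::vector<int> sums(blocks);
    err = cudaMemcpy(sums.data(), d_sums, blocks * sizeof(int),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    int total = 0;
    for (int v : sums) total += v;  // final pass over the partial sums on the host

    cudaFree(d_counts);
    cudaFree(d_sums);
    return total;
}
```

Because every block writes to its own slot in block_sums, no two blocks ever touch the same address, so no atomics are needed.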

Yes, these are blocks. I thought blocks were executed sequentially, but now I have learned that this is wrong. Thanks!

That was exactly what I was searching for. Many thanks.