Threads within the same block can communicate through shared memory, but threads in different blocks have no straightforward way to communicate*.
How do I efficiently retrieve the results of computations from several blocks?
With just 1 block, the following works fine:
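    __shared__ int matches;
    if (threadIdx.x == 0) matches = 0;    // initialize the counter once per block
    __syncthreads();
    // have each thread work through some data
    // ...
    if (found) atomicAdd(&matches, 1);    // a plain matches++ would race between threads
    __syncthreads();
    // make result available for host application
    if (threadIdx.x == 0) d_matches = matches;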
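Obviously, this doesn't work for more than 1 block: each block has its own copy of matches, and the blocks would overwrite each other's result in d_matches.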
While declaring matches as a __device__ variable (i.e. in global memory) would be one solution, there may be better ways of achieving the same result.
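To make the device-variable idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under my own assumptions: the kernel name countMatches, the host helper countOnDevice, the "element equals target" test standing in for found, and the block size of 256 are all made up for this example, and it assumes hardware that supports atomicAdd on shared and global memory. Each block counts in shared memory first and then adds its partial count to the global counter once.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Counter in global device memory, visible to every block.
__device__ int d_matches;

// Hypothetical kernel: counts the elements of data that equal target.
__global__ void countMatches(const int *data, int n, int target)
{
    __shared__ int matches;                 // per-block partial count
    if (threadIdx.x == 0) matches = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] == target)
        atomicAdd(&matches, 1);             // combine within the block
    __syncthreads();

    if (threadIdx.x == 0)
        atomicAdd(&d_matches, matches);     // one global atomic per block
}

// Hypothetical host helper: runs the kernel and returns the total count.
int countOnDevice(const int *h_data, int n, int target)
{
    int *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);

    int zero = 0;
    cudaMemcpyToSymbol(d_matches, &zero, sizeof(int));   // reset the counter

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    countMatches<<<blocks, threads>>>(d_data, n, target);

    int result = 0;
    // Blocking copy on the default stream, so it runs after the kernel.
    cudaMemcpyFromSymbol(&result, d_matches, sizeof(int));
    cudaFree(d_data);
    return result;
}

int main()
{
    const int n = 1000;
    int h_data[n];
    int expected = 0;                       // CPU reference count
    for (int i = 0; i < n; ++i) {
        h_data[i] = i % 10;
        if (h_data[i] == 3) ++expected;
    }
    int result = countOnDevice(h_data, n, 3);
    assert(result == expected);
    printf("matches: %d\n", result);
    return 0;
}
```

The per-block shared counter is there only to keep the number of global atomics down to one per block; having every thread call atomicAdd on d_matches directly should also give the right count, just with more contention.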
Thanks for any help on this,
*By having a block in kernel1 write its results to global memory and a block in kernel2 read them from there, even blocks can communicate (not very handy, but needed in some cases).
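A minimal sketch of that two-kernel pattern, under assumptions of my own: the buffer blockCounts, the host helper countTwoKernels, and the "element equals target" test are invented for illustration, and only the names kernel1 and kernel2 come from the footnote. It relies on kernels launched into the same stream running in order, so kernel2 sees everything kernel1 wrote to global memory.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// kernel1: each block writes its own match count to blockCounts[blockIdx.x].
__global__ void kernel1(const int *data, int n, int target, int *blockCounts)
{
    __shared__ int matches;
    if (threadIdx.x == 0) matches = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] == target)
        atomicAdd(&matches, 1);
    __syncthreads();

    if (threadIdx.x == 0)
        blockCounts[blockIdx.x] = matches;  // result of this block, in global memory
}

// kernel2: a single thread reads the per-block results and sums them.
__global__ void kernel2(const int *blockCounts, int numBlocks, int *total)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int sum = 0;
        for (int b = 0; b < numBlocks; ++b)
            sum += blockCounts[b];
        *total = sum;
    }
}

// Hypothetical host helper: launches both kernels and returns the total.
int countTwoKernels(const int *h_data, int n, int target)
{
    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    int *d_data = nullptr, *d_blockCounts = nullptr, *d_total = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_blockCounts, blocks * sizeof(int));
    cudaMalloc(&d_total, sizeof(int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);

    // Same (default) stream: kernel2 starts only after kernel1 has finished.
    kernel1<<<blocks, threads>>>(d_data, n, target, d_blockCounts);
    kernel2<<<1, 1>>>(d_blockCounts, blocks, d_total);

    int result = 0;
    cudaMemcpy(&result, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_blockCounts);
    cudaFree(d_total);
    return result;
}

int main()
{
    const int n = 1000;
    int h_data[n];
    int expected = 0;                       // CPU reference count
    for (int i = 0; i < n; ++i) {
        h_data[i] = i % 10;
        if (h_data[i] == 3) ++expected;
    }
    int result = countTwoKernels(h_data, n, 3);
    assert(result == expected);
    printf("matches: %d\n", result);
    return 0;
}
```

Summing in a single thread keeps the sketch short; with many blocks, kernel2 would more likely do a parallel reduction, or the per-block counts could simply be copied back and summed on the host.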