Hello,
Is there a way to make all threads wait at a barrier instruction?
What I need to do is perform some calculations, wait for all threads to finish
(and be sure they have finished), and then start another set of calculations in the same kernel. Something like this:
for (int i = 1; i <= arbitrary_number; i++) {
    // do stuff
    // wait for threads to finish
}
Basically, all threads should synchronize after every iteration of the loop. Is that possible?
Am I right in thinking that __syncthreads() only works for threads within a single block?
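To make the question concrete, here is a minimal sketch of the single-block case as I understand it. The kernel name iterate_in_block, the helper run_steps, the block size of 64 and the "take your right neighbour's value" work are all made up for illustration; only __syncthreads() and the CUDA runtime calls are real. It assumes exactly one block is launched.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_THREADS 64  // hypothetical block size; the sketch launches a single block

// Hypothetical kernel: in every iteration each thread takes over the value
// currently held by its right neighbour (wrapping around at the end).
__global__ void iterate_in_block(int *data, int steps)
{
    __shared__ int buf[N_THREADS];
    int tid = threadIdx.x;

    buf[tid] = data[tid];
    __syncthreads();  // all threads of the block have filled shared memory

    for (int i = 1; i <= steps; i++) {
        int next = buf[(tid + 1) % N_THREADS];  // "do stuff": read a neighbour's result
        __syncthreads();  // every thread has read before anyone overwrites
        buf[tid] = next;
        __syncthreads();  // every thread has written before the next iteration
    }

    data[tid] = buf[tid];
}

// Hypothetical host helper: initialises element i to i, runs the kernel with
// one block, and returns the value that thread `tid` ends up holding.
int run_steps(int steps, int tid)
{
    int host[N_THREADS];
    for (int i = 0; i < N_THREADS; i++) host[i] = i;

    int *dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    iterate_in_block<<<1, N_THREADS>>>(dev, steps);  // one block only

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host[tid];
}

int main()
{
    printf("thread 0 after 5 steps holds %d\n", run_steps(5, 0));
    return 0;
}
```

This seems fine as long as everything fits in one block. What I am unsure about is how to get the same "everyone has finished this iteration" guarantee once the grid has more than one block.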
Crap, wrong forum, this should be in "CUDA Programming and Development", sorry. If a moderator sees this, could you move the topic?