How are CUDA threads synchronized, and how can I synchronize threads across blocks?
I understand that threads within a block can be synchronized with __syncthreads(), but if I launch more than one block and want to synchronize ALL threads across all blocks:
- Is the only way to do that to split the work into multiple kernels, relying on the guarantee that all threads have finished by the time a kernel ends?
- Do threads synchronize at the end of device functions?
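
To make the first option concrete, here is a minimal sketch of the multi-kernel approach I have in mind. The names phase1, phase2, and run_two_phase are made up for illustration, and the sketch assumes both kernels are launched into the same (default) stream, where, as far as I understand, the second kernel does not start until the first has completed. Error checking is kept to a minimum.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Phase 1 (hypothetical): every thread writes one element.
__global__ void phase1(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Phase 2 (hypothetical): every thread reads an element that may have been
// written by a thread in a different block during phase 1.
__global__ void phase2(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[n - 1 - i];
}

// Runs both phases and returns true if phase 2 saw all of phase 1's writes.
bool run_two_phase(int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    int *a = nullptr, *b = nullptr;
    if (cudaMalloc(&a, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&b, n * sizeof(int)) != cudaSuccess) { cudaFree(a); return false; }

    // Same stream, so the kernel boundary acts as the grid-wide sync point:
    // phase2 begins only after every block of phase1 has finished.
    phase1<<<blocks, threads>>>(a, n);
    phase2<<<blocks, threads>>>(a, b, n);

    // A blocking copy on the default stream waits for both kernels.
    std::vector<int> host(n);
    cudaError_t err = cudaMemcpy(host.data(), b, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (host[i] != n - 1 - i) return false;
    }
    return true;
}
```

Calling run_two_phase(1024) from main would launch 4 blocks of 256 threads per phase. My question is whether a kernel boundary like this is the only way to get a barrier across all blocks, or whether something equivalent exists inside a single kernel.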