How are CUDA threads synchronized, and how can I synchronize them?

I understand that threads in a block can be synchronized with __syncthreads(), but if I have more than one block and wish to synchronize ALL threads:

  1. Is the only way to do that to split the work across several kernels, given that all threads are guaranteed to have finished at the end of a kernel?
  2. Do threads synchronize at the end of device functions?


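No, threads do not synchronize at the end of a device function. Device functions are normally inlined into the kernel that calls them, so returning from one is not a synchronization point.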
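
As for the first question, a kernel boundary does act as a barrier across all blocks: kernels launched into the same stream run in order, so the second kernel starts only after every block of the first has finished. Here is a minimal sketch of that idea. The names square, fill, reverse_copy and run_two_phase are made up for illustration, and the block size of 256 is an arbitrary choice.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: an ordinary __device__ function.
// Returning from it does NOT synchronize any threads.
__device__ int square(int x) { return x * x; }

// Phase 1: each thread writes one element.
__global__ void fill(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square(i % 100);
}

// Phase 2: each thread reads an element that was usually written by a
// thread in a different block, so all of phase 1 must be complete first.
__global__ void reverse_copy(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[n - 1 - i];
}

// Runs both phases and returns the number of wrong elements,
// or -1 if a CUDA call fails.
int run_two_phase(int n) {
    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    int threads = 256;                          // arbitrary block size
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n

    // Both launches go to the default stream, so reverse_copy does not
    // start until every block of fill has finished.
    fill<<<blocks, threads>>>(d_in, n);
    reverse_copy<<<blocks, threads>>>(d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernels above.
    int *h_out = new int[n];
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);

    int mismatches = 0;
    if (err != cudaSuccess) {
        mismatches = -1;
    } else {
        for (int i = 0; i < n; ++i) {
            int j = n - 1 - i;
            int expected = (j % 100) * (j % 100);
            if (h_out[i] != expected) ++mismatches;
        }
    }

    delete[] h_out;
    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches;
}
```

Call run_two_phase from your own main with whatever size you like. Note that no __syncthreads() is needed between the two phases, and the call to square inside fill adds no synchronization of its own. The ordering comes entirely from launching two kernels into the same stream.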