How do bar.sync and __syncthreads interact?

  1. What barrier resource do __syncthreads and the other CUDA runtime-level synchronization functions use? Is it fixed to 0?

  2. If bar.sync uses the same barrier resource as __syncthreads, how do the two interact?

__syncthreads() used to translate to a bar.sync instruction on barrier #0. Whether that is still the case with CUDA 10.0 and compute capability 7.x I am not sure, but it would be a simple experiment to compile a kernel to PTX and check for yourself.
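
The experiment suggested above could look something like the sketch below. The file name probe.cu and the kernel name reverse_block are made up for illustration, and the -arch value is an assumption that should be changed to match your GPU and toolkit. The kernel only exists to give the compiler a __syncthreads() call it has to keep.

```cuda
// probe.cu (hypothetical file name, used only for this experiment)
//
// Generate PTX:
//   nvcc -arch=sm_70 -ptx probe.cu -o probe.ptx
// Then search probe.ptx for "bar" to see which barrier instruction and
// which barrier number the compiler emitted for __syncthreads().
//
// To look at the final machine code (SASS) as well:
//   nvcc -arch=sm_70 -cubin probe.cu -o probe.cubin
//   cuobjdump -sass probe.cubin

// extern "C" keeps the kernel name unmangled so it is easy to find in the PTX.
// Assumes the kernel is launched with at most 256 threads per block.
extern "C" __global__ void reverse_block(int *data)
{
    __shared__ int tile[256];

    unsigned int t    = threadIdx.x;
    unsigned int base = blockIdx.x * blockDim.x;

    tile[t] = data[base + t];

    // The call whose translation we want to inspect.
    __syncthreads();

    // Read an element written by a different thread, so the barrier is needed.
    data[base + t] = tile[blockDim.x - 1 - t];
}
```

Whatever the output shows is only what that particular toolkit version does for that particular target, which is the caveat raised in the next reply.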

Since there is no specification of this that I am aware of, I view it as an implementation detail, and therefore a hazard to depend on any particular behavior, from a code correctness point of view.

Any time you have to ask for unpublished information, or disassemble code to inspect compiler behavior, a flag should be raised in your thought process that indicates that what you are observing may not be dependable behavior for code correctness.

To tera,

Oh, I’m such a noob. Of course, I should have tried to look at the assembly.

Thanks.

Do you have any tips for reading assembly?

To Robert_Crovella,

Thanks, I will keep that in mind. A safe approach would be to use only bar.sync, and never __syncthreads, in any kernel where I need bar.sync at least once.
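
A minimal sketch of that bar.sync-only approach, using inline PTX, might look like the following. The helper name named_barrier_sync, the kernel, and the constants are all hypothetical. Barrier ID 1 is an arbitrary pick from the sixteen barriers (0 to 15) that PTX provides per thread block, chosen only to stay away from barrier 0. Note that this covers only the barriers you write yourself; any library device code you call may still use __syncthreads().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr unsigned int kBlockSize = 256;  // assumption: a multiple of the warp size (32)
constexpr unsigned int kBarrierId = 1;    // assumption: any ID in 1..15, avoiding barrier 0

// Hypothetical helper: synchronize num_threads threads of the block on an
// explicitly chosen barrier. With the two-operand form of bar.sync,
// num_threads must be a multiple of the warp size.
__device__ __forceinline__ void named_barrier_sync(unsigned int barrier_id,
                                                   unsigned int num_threads)
{
    // The "memory" clobber keeps the compiler from moving memory accesses
    // across the barrier.
    asm volatile("bar.sync %0, %1;" : : "r"(barrier_id), "r"(num_threads) : "memory");
}

// Reverses kBlockSize integers in place; must be launched as <<<1, kBlockSize>>>.
__global__ void reverse_block(int *data)
{
    __shared__ int tile[kBlockSize];
    unsigned int t = threadIdx.x;

    tile[t] = data[t];

    // Every thread of the block participates, so this plays the role that
    // __syncthreads() would otherwise play here.
    named_barrier_sync(kBarrierId, kBlockSize);

    data[t] = tile[kBlockSize - 1 - t];
}

int main()
{
    int host[kBlockSize];
    for (unsigned int i = 0; i < kBlockSize; ++i) host[i] = (int)i;

    int *dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    reverse_block<<<1, kBlockSize>>>(dev);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Check that element i now holds the value that started at the mirrored index.
    bool ok = true;
    for (unsigned int i = 0; i < kBlockSize; ++i)
        ok = ok && (host[i] == (int)(kBlockSize - 1 - i));

    printf("%s\n", ok ? "reversed correctly" : "mismatch");
    return ok ? 0 : 1;
}
```

Keeping the barrier ID and thread count in one place makes it easier to audit that no two independent synchronization points in the kernel share a barrier by accident.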