Can __syncthreads() appear in multiple control paths?
The CUDA Programming Guide says “__syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects”.
To me, the last clause seems to leave some room for using __syncthreads() in multiple control paths.
Suppose the execution path diverges like a tree but merges back at some later point. If __syncthreads() calls are distributed among the diverged paths such that, no matter which path a thread takes, it executes the same number of __syncthreads() calls before reaching the merge point, can this program run correctly?
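To make the scenario concrete, here is a minimal sketch of the pattern I am asking about. The kernel name `divergent_sync`, the `data` parameter, and the even/odd branch condition are my own illustrative assumptions, not taken from the guide, and I assume a single-block launch with one array element per thread. I am not claiming this is valid; whether it is valid is exactly the question.

```cuda
// Hypothetical kernel illustrating the question; not known to be valid CUDA.
// Assumes one thread block and that data has at least blockDim.x elements.
__global__ void divergent_sync(int *data)
{
    int tid = threadIdx.x;

    if (tid % 2 == 0) {
        // Path A: taken by even-numbered threads.
        data[tid] += 1;
        __syncthreads();   // the only barrier on path A
    } else {
        // Path B: taken by odd-numbered threads.
        data[tid] += 2;
        __syncthreads();   // the only barrier on path B (a different call site)
    }

    // Merge point: every thread has executed exactly one __syncthreads(),
    // but the branch condition did not evaluate identically across the block.
    data[tid] *= 2;
}
```

Here the branch condition differs between threads of the same block, so the quoted rule appears to forbid it, even though each thread passes through the same number of barriers.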