A question on __syncthreads()

Hi, all!

Suppose we have a kernel that doesn’t use __syncthreads() and works correctly. If I deliberately insert __syncthreads() somewhere in the kernel, should we generally expect the kernel to produce the same result?

Thank you!

If your kernel is actually correct (note that a kernel that “works correctly” could still be incorrect if it has a race condition that just isn’t manifesting), adding __syncthreads() will not affect correctness as long as every thread in the block reaches the same __syncthreads() call. This means you cannot put __syncthreads() inside an if statement unless all threads in the block take the same branch; otherwise the kernel can hang or behave unpredictably. I’m not sure whether __syncthreads() is safe when some threads return early, so I would avoid that as well.
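
To make the branch rule concrete, here is a minimal sketch, not taken from the original poster's kernel. The names reverse_block, run_reverse and the block size N are hypothetical, chosen only for illustration. The per-thread work is guarded by a condition, but the barrier sits outside the guard so that every thread in the block reaches it:

```cuda
#include <cuda_runtime.h>

constexpr int N = 256;  // hypothetical block size, also the shared tile size

// Reverses the first n elements of `in` into `out` using one block.
// Assumes 0 < n <= N and a launch with exactly N threads in one block.
__global__ void reverse_block(const int *in, int *out, int n)
{
    __shared__ int tile[N];
    int t = threadIdx.x;

    if (t < n) {
        tile[t] = in[t];   // divergent work inside the branch is fine
        // A __syncthreads() placed HERE would be wrong whenever n < N:
        // threads with t >= n would never reach it.
    }

    __syncthreads();       // outside the branch: reached by all N threads

    if (t < n) {
        out[t] = tile[n - 1 - t];
    }
}

// Host helper: runs the kernel and checks the result on the CPU.
bool run_reverse(int n)
{
    if (n <= 0 || n > N) return false;

    int h_in[N], h_out[N];
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    reverse_block<<<1, N>>>(d_in, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (h_out[i] != n - 1 - i) return false;
    }
    return true;
}
```

Calling run_reverse(200) from main launches 256 threads, of which only 200 do any work, yet all 256 arrive at the barrier. Adding further __syncthreads() calls at that same top level would cost some time but should not change the output.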