Hi, all!
Just suppose we have a kernel which doesn’t have __syncthreads() and works correctly. If I deliberately insert __syncthreads() somewhere in the kernel, then we should generally expect the kernel to produce the same result, right?
Thank you!
If you have a kernel that is correct (note that a kernel that “works correctly” could still be incorrect if there is a race condition that just isn’t manifesting), adding __syncthreads() will not affect its correctness as long as all threads in the block reach the same __syncthreads() call. This means you cannot put __syncthreads() inside an if statement unless all threads in the block will take the same branch (the CUDA Programming Guide only allows __syncthreads() in conditional code when the condition evaluates identically across the whole thread block). I’m not sure whether you can use __syncthreads() if some threads can return early, but I would avoid that as well.
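Here is a minimal sketch of keeping the barrier reachable by every thread, assuming a single block of a hypothetical size `kBlock`; the names `reverseInBlock` and `reverseOnGpu` are made up for illustration. The kernel reverses an array through shared memory. Without the barrier it is exactly the kind of kernel that can appear to work while racing, because each thread reads a slot written by a different thread. Instead of wrapping the barrier in the bounds check (or returning early), the check is split around it so all threads in the block hit the same __syncthreads(), even when n is smaller than the block size.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size for this sketch; one block handles the whole array.
constexpr int kBlock = 256;

// Reverses the first n elements of data (n <= kBlock) via shared memory.
__global__ void reverseInBlock(int* data, int n) {
    __shared__ int s[kBlock];
    int t = threadIdx.x;

    if (t < n) {
        s[t] = data[t];          // phase 1: each thread writes its own slot
    }

    // Every thread in the block reaches this barrier, including threads with
    // t >= n, so the condition guarding it is uniform (there is none).
    __syncthreads();             // all writes to s[] are visible after this

    if (t < n) {
        data[t] = s[n - 1 - t];  // phase 2: read a slot written by another thread
    }

    // Do NOT move the barrier inside `if (t < n)`: when n < blockDim.x,
    // threads with t >= n would skip it and the block would diverge at a barrier.
}

// Hypothetical host helper: copies `in` to the device, reverses it, copies it back.
// Assumes in.size() <= kBlock; CUDA error checking is omitted for brevity.
std::vector<int> reverseOnGpu(const std::vector<int>& in) {
    int n = static_cast<int>(in.size());
    int* d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reverseInBlock<<<1, kBlock>>>(d, n);  // launch one full block of kBlock threads

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

Calling `reverseOnGpu({1, 2, 3, 4})` should give back `{4, 3, 2, 1}`. Splitting the guard keeps the barrier unconditional, which sidesteps both the divergent-if case and the early-return question.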