__syncthreads() is ignored by threads

Hi all,
I’ve run into a strange situation: while some of the threads are working, others race ahead past multiple __syncthreads() statements. The code includes fragments where only a handful of threads are busy (based on a threadIdx.x condition) and the others should wait.
It took me several days of painful debugging to reach this conclusion, but that’s exactly the situation. I’m still working on simplifying the code to get a small reproducible piece to post, but at this point the question is: Has anyone run into this issue before? Are there situations when __syncthreads() can be ignored? Or am I dealing with a bug?
The problem shows up on both a Tesla 2050 and a GTX 480.

Thanks in advance for any advice.

Best,

Sasha

I’ve never seen this, and I don’t recall anyone ever reporting such a situation before. It is still possible there is another explanation, but hopefully your test case will make it clear what is going on.

One extra question: Are all threads in the block guaranteed to reach the same __syncthreads() call in each case?

The behavior of __syncthreads() is undefined if only some threads reach a particular __syncthreads(), with other threads skipping the barrier due to termination or branching. I believe it is also unsupported for some threads to reach one __syncthreads() line while other threads reach a different __syncthreads() line, but that is an ambiguous case not mentioned in the high-level documentation. The PTX manual probably explains the expected behavior more clearly.
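
To make the "all threads reach the same barrier" requirement concrete, here is a minimal sketch of the pattern I would expect to be safe: the work is restricted by a threadIdx.x condition, but the __syncthreads() call sits outside the branch. The kernel name, the helper count_correct_threads(), and the constants are all made up for illustration and have nothing to do with your actual code.

```cuda
#include <cuda_runtime.h>

// Hypothetical sizes, chosen only for this illustration.
constexpr int kThreads = 128;  // threads in the single block we launch
constexpr int kWorkers = 4;    // threads that actually do the "busy" part

__global__ void conditional_work(int *out)
{
    __shared__ int partial[kWorkers];

    if (threadIdx.x < kWorkers) {
        // Only a handful of threads work here; there is no barrier in the branch.
        partial[threadIdx.x] = (threadIdx.x + 1) * 10;
    }

    // Unconditional barrier: every thread of the block executes this same call.
    __syncthreads();

    // After the barrier, all threads can read what the workers wrote.
    int sum = 0;
    for (int i = 0; i < kWorkers; ++i)
        sum += partial[i];
    out[threadIdx.x] = sum;
}

// Hypothetical host helper: launches one block and returns how many threads
// saw the full sum (10 + 20 + 30 + 40), or -1 if a CUDA call failed.
int count_correct_threads()
{
    int *d_out = nullptr;
    int h_out[kThreads];
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;

    conditional_work<<<1, kThreads>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int ok = 0;
    for (int i = 0; i < kThreads; ++i)
        if (h_out[i] == 100) ++ok;
    return ok;
}
```

If the __syncthreads() were moved inside the if block, only kWorkers threads would reach it, which is exactly the undefined case described above.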

Seibert, I’m checking the possibility of branching right now. It shouldn’t really be happening, since I’ve been trying to keep the __syncthreads() statements out of any conditional expressions. However, the application is fairly complex, and I need to make sure. Once I get a definitive answer (and/or reproducible test case), I’ll post it.
Thanks

Sasha

__syncthreads() works as advertised. I found the cause of the issue: a __syncthreads() statement buried in a function that was called by only a subset of threads. Instead of hanging, the threads just continued to work, but they were synchronizing at different __syncthreads() statements. I guess it falls under Nvidia’s definition of “unpredictable side effects”.
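
For anyone who lands here with the same symptom, this is roughly the shape of the problem and one way to restructure it. It is a simplified sketch, not the real application code: the function names, the buffer, and the host helper count_expected_values() are invented for illustration. The idea is to keep barriers out of any helper that might be called under a thread condition and to place them at kernel level instead.

```cuda
#include <cuda_runtime.h>

constexpr int kThreads = 64;  // hypothetical block size for this sketch

// Problematic shape (shown only for contrast, never called here): the barrier
// is hidden inside a helper, so calling it under a threadIdx.x condition
// means only part of the block reaches this __syncthreads().
__device__ void stage_with_hidden_barrier(int *buf)
{
    buf[threadIdx.x] = threadIdx.x;
    __syncthreads();
}

// Safer shape: the helper does the work and contains no barrier.
__device__ void stage_no_barrier(int *buf)
{
    buf[threadIdx.x] = threadIdx.x;
}

__global__ void hoisted_barrier(int *out)
{
    __shared__ int buf[kThreads];
    buf[threadIdx.x] = -1;            // each thread initializes its own slot

    if (threadIdx.x < kThreads / 2) {
        stage_no_barrier(buf);        // only a subset of threads does this work
    }
    __syncthreads();                  // kernel-level barrier, reached by all threads

    // Read a neighbour's slot to show the writes are visible after the barrier.
    out[threadIdx.x] = buf[(threadIdx.x + 1) % kThreads];
}

// Hypothetical host helper: returns how many output entries match the values
// expected from the kernel above, or -1 if a CUDA call failed.
int count_expected_values()
{
    int *d_out = nullptr;
    int h_out[kThreads];
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;

    hoisted_barrier<<<1, kThreads>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int ok = 0;
    for (int i = 0; i < kThreads; ++i) {
        int j = (i + 1) % kThreads;               // slot that thread i read
        int expected = (j < kThreads / 2) ? j : -1;
        if (h_out[i] == expected) ++ok;
    }
    return ok;
}
```

In a large code base it is probably worth searching every device function for __syncthreads() and checking that each call site is reached by the whole block.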

Thanks

Sasha