__syncthreads() in divergent branches by giving an argument to bar.sync in PTX

We have been told that using __syncthreads() in divergent code is dangerous because not every thread reaches the corresponding barrier sync point, potentially leading to deadlocks.

The PTX ISA manual V1.2 states that the bar.sync instruction has an argument, specifying some sort of identifier of the barrier synchronization.

Can I use this to achieve reliable barrier synchronization in if/else statements?
Assuming I have control over the ID passed to the bar.sync command in the PTX, would the following pseudo-code work?

Sync to Barrier #0

if ( some condition )
    Do something
    Sync to Barrier #1
    Do some more
else
    Do something else
    Sync to Barrier #1
    Do more of this

Sync to Barrier #2

My point is that I would use the same ID in both divergent code branches, allowing all threads to eventually sync on the same barrier ID - no matter which branch they are in. If this works in principle, couldn’t the compiler try to emit matching IDs on its own?
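
For reference, here is a minimal sketch of how a barrier ID can be passed to bar.sync from CUDA C through inline PTX. It only shows the mechanics: the barrier sits in uniform control flow, where every thread of the block reaches it, and nothing here shows that it is safe inside divergent branches. The names named_barrier_sync, reverse_block and run_named_barrier_demo are hypothetical, and the register operand assumes a target of sm_20 or newer.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical wrapper: all threads of the block synchronize on barrier `id`.
// __syncthreads() corresponds to barrier 0; valid IDs are 0 to 15.
__device__ __forceinline__ void named_barrier_sync(int id)
{
    asm volatile("bar.sync %0;" : : "r"(id) : "memory");
}

// Reverses one block's worth of data through shared memory.
// The barrier is in uniform control flow: every thread executes it.
__global__ void reverse_block(int* data)
{
    extern __shared__ int buf[];
    const int tid = threadIdx.x;
    buf[tid] = data[tid];
    named_barrier_sync(1);                  // barrier #1 instead of #0
    data[tid] = buf[blockDim.x - 1 - tid];
}

// Host helper: launches a single block and checks the reversal on the CPU.
bool run_named_barrier_demo(int threads = 128)
{
    std::vector<int> h(threads);
    for (int i = 0; i < threads; ++i) h[i] = i;

    int* d = nullptr;
    if (cudaMalloc(&d, threads * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(d, h.data(), threads * sizeof(int), cudaMemcpyHostToDevice);
    reverse_block<<<1, threads, threads * sizeof(int)>>>(d);
    cudaError_t err = cudaMemcpy(h.data(), d, threads * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < threads; ++i)
        if (h[i] != threads - 1 - i) return false;
    return true;
}
```

Calling run_named_barrier_demo() from main should return true on a working device. Whether the same call behaves well when only part of a warp reaches it is exactly the open question above.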


Interesting, though unless you’re incredibly concerned about efficiency, you could just cache the result of the if condition in some shared variable, end the branch before the intermediate __syncthreads(), and repeat the if/else after it. I think this is easier to read as well.
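
A rough sketch of that idea, under some assumptions: the kernel name two_phase_branch, the helper run_two_phase_demo, the condition ("input is even") and the arithmetic in each branch are all made up for illustration. Because the condition here is per-thread, a local variable is enough to cache it; a shared variable would only be needed for a block-wide condition.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread evaluates its condition once and keeps it in a register.
// The if/else is split in two so the barrier sits in uniform control flow.
__global__ void two_phase_branch(const int* in, int* out, int n)
{
    extern __shared__ int tile[];           // one int per thread
    const int tid = threadIdx.x;
    const int gid = blockIdx.x * blockDim.x + tid;

    const int value = (gid < n) ? in[gid] : 0;
    const bool cond = (value % 2 == 0);     // cached branch condition

    // First half of the branch.
    if (cond) tile[tid] = value * 2;        // "do something"
    else      tile[tid] = value + 1;        // "do something else"

    // Outside the branch: every thread of the block reaches this barrier.
    __syncthreads();

    // Second half: repeat the branch with the cached condition.
    const int neighbor = tile[(tid + 1) % blockDim.x];
    const int result = cond ? neighbor + 10     // "do some more"
                            : neighbor - 10;    // "do more of this"
    if (gid < n) out[gid] = result;
}

// Host helper: runs the kernel and compares against a CPU reference.
bool run_two_phase_demo(int n = 1000, int threads = 256)
{
    std::vector<int> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = i * 3 + 1;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const int blocks = (n + threads - 1) / threads;
    two_phase_branch<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        const int base = (i / threads) * threads;        // first index of the block
        const int nb = base + (i - base + 1) % threads;  // neighbor within the block
        const int nb_val = (nb < n) ? h_in[nb] : 0;      // padding threads hold 0
        const int nb_tile = (nb_val % 2 == 0) ? nb_val * 2 : nb_val + 1;
        const int expect = (h_in[i] % 2 == 0) ? nb_tile + 10 : nb_tile - 10;
        if (h_out[i] != expect) return false;
    }
    return true;
}
```

The cost is one extra register for the cached condition and a second evaluation of the branch, which is usually small next to the barrier itself.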

We had big discussions and some experiments as well on this point… Finally, we decided that the “argument” cannot be used for a conditional __syncthreads…

Search the forum if you are interested.