We have been told that using __syncthreads() in divergent code is dangerous because not every thread reaches the corresponding barrier, which can lead to deadlock.
The PTX ISA manual (version 1.2) states that the bar.sync instruction takes an argument that identifies which barrier to synchronize on.
Can I use this to achieve reliable barrier synchronization in if/else statements?
Assuming I have control over the ID passed to the bar.sync command in the PTX, would the following pseudo-code work?
Sync to Barrier #0
if (some condition)
    Sync to Barrier #1
    Do some more
else
    Do something else
    Sync to Barrier #1
    Do more of this
Sync to Barrier #2
My point is that I would use the same ID in both divergent code branches, so that all threads eventually sync on the same barrier ID, no matter which branch they are in. If this works in principle, couldn't the compiler try to emit matching IDs on its own?
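To make the pseudo-code concrete, here is a minimal CUDA sketch of what I have in mind, using inline PTX to pick the barrier ID. Several things here are my own assumptions and not taken from the manual: the names named_barrier_sync, branch_barrier_kernel and run_branch_barrier_demo are hypothetical, the block size is fixed at 128 threads, and the branch condition is deliberately chosen to be uniform within each warp (even warps take one branch, odd warps the other). The PTX documentation describes barriers as executing on a per-warp basis on older hardware, so divergence inside a single warp is not covered by this sketch.

```cuda
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // assumed block size: 4 warps of 32 threads

// Hypothetical helper: block-wide sync on the barrier with the given ID.
// The ID is passed as a register operand to bar.sync.
__device__ __forceinline__ void named_barrier_sync(int id) {
    asm volatile("bar.sync %0;" : : "r"(id) : "memory");
}

__global__ void branch_barrier_kernel(int *out) {
    __shared__ int buf[kThreads];
    const int tid  = threadIdx.x;
    const int warp = tid / 32;
    const int src  = (tid + 32) % kThreads;  // slot written by the next warp

    named_barrier_sync(0);                   // Sync to Barrier #0

    if (warp % 2 == 0) {                     // warp-uniform condition (assumption)
        buf[tid] = tid;                      // even warps store +tid
        named_barrier_sync(1);               // Sync to Barrier #1 (first branch)
        out[tid] = buf[src];                 // read a value from the other branch
    } else {
        buf[tid] = -tid;                     // odd warps store -tid
        named_barrier_sync(1);               // Sync to Barrier #1 (second branch)
        out[tid] = buf[src];
    }

    named_barrier_sync(2);                   // Sync to Barrier #2
}

// Host-side check: returns the number of mismatching elements,
// or -1 if any CUDA call fails.
int run_branch_barrier_demo() {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, kThreads * sizeof(int)) != cudaSuccess) return -1;

    branch_barrier_kernel<<<1, kThreads>>>(d_out);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_out); return -1; }

    int h_out[kThreads];
    cudaError_t err = cudaMemcpy(h_out, d_out, kThreads * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < kThreads; ++i) {
        const int src = (i + 32) % kThreads;
        const int expected = ((src / 32) % 2 == 0) ? src : -src;
        if (h_out[i] != expected) ++mismatches;
    }
    return mismatches;
}
```

Each thread reads a shared-memory slot that was written by a warp in the other branch, so a result of 0 from run_branch_barrier_demo() would indicate that the two bar.sync instructions with ID 1 really did act as one common barrier for the whole block in this warp-uniform case.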