What I was wondering is: how does the hardware handle ‘if (…) goto …;’ constructs when the if causes divergence within a warp? When are the threads synchronized again? For if/else constructs and loops it is fairly obvious. With if/else, first one branch is executed, then the other, and then the threads continue together. With loops, threads that exit the loop wait until all other threads have exited too, and then they continue together. But for goto it is not clear to me.
As hyqneuron wrote, you can find out yourself using cuobjdump. Look for the SSY instruction; its argument is the address of the synchronization point.
Okay, now I understand. The synchronization point is selected during the PTX->CUBIN step. In my example, the synchronization point seems to be set at label b. Does anyone know how the compiler determines the synchronization point (SP)?
I do not understand why you would think that there’s any ambiguity with the code. It seems to me that threads could only be in sync again after they all reach c.
I believe that, in most cases, the 3 different versions of your code would not generate SSY. The original version should generate the code that I’ve already given. The 2 interpretations should generate PBK and BRK, with PBK setting the address at c.
Well, the threads that do the small loop (the first five lines, from a: to goto a) and the ones that do the large loop (which jump to b: and then back to a:) could be executed in 2 different ways:
Only the small loop is executed until all threads have jumped to b:. Then the section from b: to goto a is executed, and the small loop starts again.
Only one iteration of the small loop is executed (in case of divergence to b:), up to the goto a. Then, for the threads that jumped to b:, the section from b: to goto a is executed. Now all threads are at a: again and continue together in the small loop.
Or do I have the wrong picture of when synchronization can happen? I also wonder what happens with multiple divergence, for example if some threads additionally exit to c:. Can a subset of the threads (say, the ones still looping) synchronize among themselves and ignore the exiting ones until later, or is synchronization all-or-nothing for the whole warp?
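For anyone who wants to try this themselves, here is a minimal sketch of the kind of control flow being discussed. The original kernel is not reproduced in the posts above, so only the labels a, b and c follow the description; the function name walk, the kernel name goto_kernel, the HD macro and the branch conditions are my own assumptions. Whether the compiler emits SSY or PBK/BRK for it likely depends on the toolkit version and target architecture, so treat it as something to inspect rather than a statement of what you will see.

```cpp
// HD lets the same function build as device code under nvcc and as plain
// host C++ otherwise (hypothetical helper macro, not from the thread).
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for the goto structure described above.
// The conditions are made up so that different inputs take different paths.
HD int walk(int x, int steps)
{
    int acc = 0;
a:                                // start of the small loop
    if (steps-- <= 0) goto c;     // exit path: leave both loops
    x = x * 5 + 1;
    if (x & 4) goto b;            // data-dependent jump into the large loop
    acc += 1;                     // work done only on the small-loop path
    goto a;                       // back edge of the small loop
b:                                // extra section of the large loop
    acc += 10;                    // work done only on the large-loop path
    goto a;                       // back edge of the large loop
c:                                // all paths end up here
    return acc;
}

#ifdef __CUDACC__
// Each thread starts from its own index, so threads of one warp can take
// different paths through a, b and c.
__global__ void goto_kernel(int *out, int steps)
{
    out[threadIdx.x] = walk(threadIdx.x, steps);
}
#endif
```

Compiling this with nvcc -cubin and running cuobjdump -sass on the resulting file shows which control-flow instructions the compiler actually chose and where their target addresses point. The HD macro is only there so that walk can also be built and checked as ordinary host code.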