Handling thread divergence on Volta and Turing

I am not sure I am correctly interpreting how thread divergence is handled on Volta and later, and I would greatly appreciate some hand-holding.

I’m looking at the Volta white paper https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf, see Figs. 22 and 23.

Assume that I have an if-else statement, with the if branch executing the instructions A1, A2, A3, etc., while the else branch sees instructions B1, B2, etc.

Assume that in a warp, 16 threads take the if, and the other 16 take the else branch.
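To make the scenario concrete, here is a minimal sketch of such a kernel. The kernel name `divergent_branch`, the host helper `run_divergent_branch`, and the arithmetic standing in for the "A" and "B" instructions are all made up for illustration; only the 16/16 split within one warp comes from the setup above.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: lanes 0-15 of each warp take the "if" path,
// lanes 16-31 take the "else" path.
__global__ void divergent_branch(const float *in, float *out)
{
    int lane = threadIdx.x % warpSize;   // lane index within the warp
    float x = in[threadIdx.x];
    if (lane < 16) {
        x = x * 2.0f;   // stands in for "A1"
        x = x + 1.0f;   // stands in for "A2"
    } else {
        x = x - 1.0f;   // stands in for "B1"
        x = x * x;      // stands in for "B2"
    }
    out[threadIdx.x] = x;
}

// Hypothetical host helper: launches a single warp and returns the 32
// results, or an empty vector if a CUDA call fails (e.g. no device).
std::vector<float> run_divergent_branch()
{
    const int n = 32;                       // one warp
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return {};
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return {}; }

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    divergent_branch<<<1, n>>>(d_in, d_out);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return {};
    return h_out;
}
```

The result in `out` is the same whatever order the hardware issues the two paths in, because neither path reads anything the other one writes.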

Figure 23 seems to imply that A1 is executed, after which B1 is executed, after which A2 is executed, then B2, etc. That is, instructions get interleaved.

Is this really the case? For instance, if A2 hits a global memory read, will the 16 threads about to execute B2 wait there for A2 to get executed? Also, why A1 then B1, and then A2 then B2, etc.? Why not B1 then A1, and then B2 then A2, etc.?

I would expect the scheduler to issue whichever instruction has all of its operands available, no matter whether it’s an “A” family instruction or a “B” family instruction. This would also scale nicely if one has, for instance, four-way divergence, where there are A, B, C, and D-type instructions.

I apologize if this was answered before. I don’t quite know how to search whether a specific question like this has been answered (other than reading the manual, which does a good but not perfect job of explaining things).

Thanks for your time.

There isn’t any low-level description or specification of instruction issue order when the SM schedulers have multiple options (which they would, in the case of Volta and beyond, in the presence of conditional code). If your code depends on a particular scheduling order, and you have taken no steps to make that happen explicitly, your code is broken.

By extension of the above statements, then, there is no statement that instructions from separate conditional execution paths get interleaved.
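As an illustration of taking such steps explicitly, here is a minimal sketch in which lanes from the two branches exchange data through shared memory. The kernel name `exchange_across_branches`, the host helper `run_exchange`, and the values written are invented for this example. The one real API it relies on is `__syncwarp()` (CUDA 9 and later), which makes the participating lanes wait for each other and orders their memory accesses, so the read after it does not depend on how the two branches were scheduled.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel, intended to be launched with exactly one warp.
__global__ void exchange_across_branches(int *out)
{
    __shared__ int slot[32];
    int lane = threadIdx.x % warpSize;

    if (lane < 16) {
        slot[lane] = 100 + lane;   // written on the "A" path
    } else {
        slot[lane] = 200 + lane;   // written on the "B" path
    }

    // Explicit warp-level synchronization: all 32 lanes arrive here and
    // their earlier shared-memory writes become visible to each other.
    // Without this, the read below would assume a scheduling order that
    // is not specified anywhere.
    __syncwarp();

    // Each lane reads the slot written by a lane from the other branch.
    out[threadIdx.x] = slot[(lane + 16) % warpSize];
}

// Hypothetical host helper: launches one warp and returns the 32 results,
// or an empty vector if a CUDA call fails (e.g. no device).
std::vector<int> run_exchange()
{
    const int n = 32;                     // one warp
    const size_t bytes = n * sizeof(int);
    std::vector<int> h_out(n);

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return {};

    exchange_across_branches<<<1, n>>>(d_out);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    if (err != cudaSuccess) return {};
    return h_out;
}
```

With the `__syncwarp()` in place, lane 0 reads the value written by lane 16 and vice versa, whichever branch the scheduler happens to issue first. Removing it would leave a race between the write in one branch and the read by a lane from the other.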

Thanks, Robert - that clarifies it.

The documentation might benefit from an explanation like yours, perhaps added right after this blurb from the doc: “Statements from the if and else branches in the program can now be interleaved in time as shown in Figure 22”. Without your explanation, my mental image of what goes on was different and inaccurate.