Predication & do ... while loops: what happens with this sample loop ...


Apologies if this question has an obvious answer, but how does the CUDA compiler/GPU handle the following loop:



do {
  if (condA) {
    instr1;
    instr2;
  } else {
    instr3;
    instr4;
    instr5;
  }
} while (condW);

Here condA is a boolean variable, and each instrN is simple (such as x = constant; x += arrayElement[y]; or arrayElement[y] = constant;). condW may differ between the threads of a warp and is data dependent (it does not depend on the number of times through the loop). The array accesses go to an array in shared memory that the threads use to communicate.

Is predication likely to be used? If not, is there any way to force it to be used? I would like to enforce that threads with condA false idle while instr1 and instr2 execute, and likewise that threads with condA true idle while instr3, instr4 and instr5 execute.

I assume a __syncthreads() call cannot be used inside the loop as not all threads execute the loop the same number of times.

Finally, can I write “inline assembler” statements to explicitly use predication?



Well, it is just C, so if it is valid C it works as expected.
I believe it is not possible to use inline PTX, but you can generate PTX and then optimize the PTX by hand if you want.

I’ve never seen the compiler generate predicated instructions. Generate the PTX and see for yourself.
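To check this for yourself, something like the kernel below could be compiled to PTX and inspected. It is only a sketch of the loop from the question: the kernel name, the parameters and the concrete choices for instr1..instr5 and condW are made up for illustration, and it assumes a single block of at most 32 threads.

```cuda
// Hypothetical stand-in for the loop in the question (all names are illustrative).
// Inspect the generated code with, for example:
//   nvcc -ptx loop.cu -o loop.ptx                      (PTX)
//   nvcc -cubin loop.cu -o loop.cubin
//   cuobjdump -sass loop.cubin                         (final machine code)
#include <cuda_runtime.h>

__global__ void divergentLoop(const int *condAIn, const int *limitIn, int *out)
{
    __shared__ int arrayElement[32];      // assumes blockDim.x <= 32

    const int tid = threadIdx.x;
    const bool condA = condAIn[tid] != 0; // differs per thread
    const int limit = limitIn[tid];

    arrayElement[tid] = tid + 1;          // each thread only touches its own slot
    int x = 0;
    bool condW;

    do {
        if (condA) {
            x += arrayElement[tid];       // instr1
            arrayElement[tid] = 1;        // instr2
        } else {
            x += 2;                       // instr3
            x += arrayElement[tid];       // instr4
            arrayElement[tid] = 3;        // instr5
        }
        condW = x < limit;                // data dependent, differs per thread
    } while (condW);

    out[tid] = x;
}
```

Whether the if/else comes out as predicated instructions or as a branch is up to the compiler and can change with the toolkit version, optimization level and target architecture, so look at the output for your own setup rather than relying on this sketch. The PTX is also not the last word, since ptxas can still transform it, which is why the SASS dump is worth a look as well.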

Write PTX by hand.
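For what it is worth, CUDA toolkits that accept inline PTX through asm() let you write a predicated instruction without editing the whole PTX file. The following is a minimal sketch under that assumption; addIfPredicated and predicatedAdd are hypothetical names, not part of CUDA.

```cuda
// Sketch only: requires a toolkit with inline PTX support via asm().
#include <cuda_runtime.h>

// Hypothetical helper: returns x + value where cond != 0, otherwise x unchanged.
__device__ int addIfPredicated(int x, int cond, int value)
{
    asm volatile(
        "{\n\t"
        " .reg .pred %p;\n\t"            // scratch predicate register
        " setp.ne.s32 %p, %1, 0;\n\t"    // p = (cond != 0)
        " @%p add.s32 %0, %0, %2;\n\t"   // add only in threads where p is true
        "}"
        : "+r"(x)
        : "r"(cond), "r"(value));
    return x;
}

// Hypothetical kernel that applies the helper to n elements.
__global__ void predicatedAdd(const int *cond, int *data, int value, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = addIfPredicated(data[i], cond[i], value);
    }
}
```

This fixes the predication at the PTX level only. ptxas is still free to choose how it maps that onto machine instructions, so treat it as a strong hint rather than a guarantee about the hardware code.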

I’m not quite sure I understand what you want here: something different from the semantics of the if in your code?

Correct. To use __syncthreads(), condW must evaluate to the same value for every thread in the block.
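One way to get a block-uniform loop condition, sketched here under assumptions, is to keep the per-thread condition in a flag and let every thread stay in the loop until no thread is active any more. The sketch uses __syncthreads_or(), which needs compute capability 2.0 or higher; the kernel name, the parameters and the loop body are invented for illustration.

```cuda
// Sketch: every thread runs the same number of iterations, so barriers are legal.
#include <cuda_runtime.h>

__global__ void uniformLoop(const int *limitIn, int *out)
{
    __shared__ int arrayElement[64];     // assumes a single block, blockDim.x <= 64

    const int tid = threadIdx.x;
    const int limit = limitIn[tid];
    int x = 0;
    bool active = true;                  // per-thread stand-in for condW
    int anyActive;

    arrayElement[tid] = 1;
    __syncthreads();                     // initial values visible to all threads

    do {
        if (active) {
            x += arrayElement[(tid + 1) % blockDim.x];  // read the neighbour's slot
        }
        __syncthreads();                 // all reads finish before any write

        if (active) {
            arrayElement[tid] = x;       // publish the new value
            active = x < limit;          // data-dependent, differs per thread
        }
        // Barrier plus block-wide OR: the same result in every thread,
        // so the loop exit is uniform across the block.
        anyActive = __syncthreads_or(active);
    } while (anyActive);

    out[tid] = x;
}
```

Threads whose own condition has gone false keep looping but skip the work, so they still reach every barrier. The cost is that the whole block runs as long as its slowest thread.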