Hi,
Apologies if this question has an obvious answer, but how does the CUDA compiler/GPU handle the following loop:
    do
    {
        if (condA)
        {
            instr1;
            instr2;
        }
        else
        {
            instr3;
            instr4;
            instr5;
        }
    } while (condW);
where condA is a boolean variable and each instrN is a simple statement (like x = constant; or x += arrayElement[y]; or arrayElement[y] = constant;). condW is data dependent (it does not depend on the number of times through the loop) and may differ between the threads in a warp. The array accesses are to an array in shared memory, used to communicate between the threads.
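To make the question concrete, here is a minimal sketch of the kind of kernel I mean. All names (loopKernel, limits, out, N_THREADS) are made up for illustration, and the particular choices of condA, condW and the instrN statements are assumptions standing in for the real ones. To keep the sketch free of races, each thread only touches its own shared-memory slot, which is simpler than the real inter-thread communication.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_THREADS 32  // a single warp, for illustration only

// Hypothetical kernel: the statements below are placeholders for instr1..instr5.
__global__ void loopKernel(const int *limits, int *out)
{
    __shared__ int arrayElement[N_THREADS];

    int tid = threadIdx.x;
    arrayElement[tid] = 0;
    __syncthreads();              // fine here: every thread reaches this point

    bool condA = (tid & 1) == 0;  // assumption: differs between threads in the warp
    int x = 0;
    int y = tid;                  // assumption: each thread uses its own slot
    int limit = limits[tid];      // data-dependent exit condition

    bool condW;
    do
    {
        if (condA)
        {
            x += 1;                // instr1
            arrayElement[y] = x;   // instr2
        }
        else
        {
            x += 2;                // instr3
            x += arrayElement[y];  // instr4
            arrayElement[y] = x;   // instr5
        }
        condW = (x < limit);       // may differ per thread
    } while (condW);

    out[tid] = x;
}

int main()
{
    int hLimits[N_THREADS], hOut[N_THREADS];
    for (int i = 0; i < N_THREADS; ++i)
        hLimits[i] = 1 + (i % 5);  // arbitrary per-thread limits

    int *dLimits = nullptr, *dOut = nullptr;
    cudaMalloc(&dLimits, sizeof(hLimits));
    cudaMalloc(&dOut, sizeof(hOut));
    cudaMemcpy(dLimits, hLimits, sizeof(hLimits), cudaMemcpyHostToDevice);

    loopKernel<<<1, N_THREADS>>>(dLimits, dOut);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);

    for (int i = 0; i < N_THREADS; ++i)
        printf("thread %d: x = %d\n", i, hOut[i]);

    cudaFree(dLimits);
    cudaFree(dOut);
    return 0;
}
```

In this sketch the threads leave the loop after different numbers of iterations, which is the situation I am asking about.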
Is predication likely to be used? If not, is there any way to force it to be used? I would like to ensure that threads with condA false idle while instr1 and instr2 execute, and likewise that threads with condA true idle while instr3, instr4 and instr5 execute.
I assume a __syncthreads() call cannot be used inside the loop as not all threads execute the loop the same number of times.
Finally, can I write “inline assembler” statements to explicitly use predication?
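To illustrate what I have in mind, here is a rough sketch, assuming the toolchain accepts inline PTX through an asm() statement. The names predicatedAdd and predKernel are hypothetical. The add is guarded by a PTX predicate register, so only threads with a non-zero cond perform it. I do not know whether predication written at the PTX level is preserved in the final machine code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_THREADS 32  // a single warp, for illustration only

// Hypothetical helper: adds v to x only when cond is non-zero,
// using an explicit PTX predicate instead of a C-level branch.
__device__ int predicatedAdd(int x, int cond, int v)
{
    asm("{\n\t"
        " .reg .pred p;\n\t"            // declare a predicate register
        " setp.ne.s32 p, %1, 0;\n\t"    // p = (cond != 0)
        " @p add.s32 %0, %0, %2;\n\t"   // x += v, executed only where p is true
        "}"
        : "+r"(x)
        : "r"(cond), "r"(v));
    return x;
}

__global__ void predKernel(int *out)
{
    int tid = threadIdx.x;
    int cond = tid & 1;                 // assumption: odd threads take the add
    out[tid] = predicatedAdd(10, cond, 5);
}

int main()
{
    int hOut[N_THREADS];
    int *dOut = nullptr;
    cudaMalloc(&dOut, sizeof(hOut));

    predKernel<<<1, N_THREADS>>>(dOut);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);

    for (int i = 0; i < N_THREADS; ++i)
        printf("thread %d: %d\n", i, hOut[i]);

    cudaFree(dOut);
    return 0;
}
```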
Cheers,
James