Evaluation of complex conditions: do threads diverge?

Here is the code:

char* p = array[threadIdx.x];

if (threadIdx.x < CONST || *p != 0)



where array is in global device memory. My question is: is the order of conditions important?

The dilemma is whether threads start to diverge because some of them satisfy the first condition while others satisfy only the second. All of these threads take the same branch, but if the second condition is not evaluated once the first one succeeds (short-circuit evaluation of ||), there is an instruction shift among those threads.

I tested this with the profiler, but the count of divergent branches remained roughly the same (I ran it several times on random data, so I could only compare average counts). This would imply that either all the conditions are always evaluated in all threads, or the end of a complex condition is some sort of synchronization point, in which case the order of conditions is not important.
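To make the two possibilities above concrete, here is a minimal sketch (not the original code) that puts a short-circuit `||` next to a version where every thread evaluates both conditions and combines them with a bitwise `|`. Everything in it is an assumption made for illustration: a flat `data` buffer instead of the array of pointers, `CONST` set to 8, the kernel names `short_circuit` and `eager`, the host helper `forms_agree`, and a single block of 1 to 1024 threads. The eager form is only valid when the memory read is safe for every thread.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define CONST 8  // hypothetical value, stands in for the CONST of the question

// Short-circuit form: data[i] is only read by threads that fail the first test.
__global__ void short_circuit(const char* data, int* flag, int n)
{
    int i = threadIdx.x;
    if (i >= n) return;
    flag[i] = (i < CONST || data[i] != 0) ? 1 : 0;
}

// Eager form: every thread evaluates both tests, then combines them with |.
__global__ void eager(const char* data, int* flag, int n)
{
    int i = threadIdx.x;
    if (i >= n) return;
    bool first  = (i < CONST);
    bool second = (data[i] != 0);  // always read, so the address must be valid
    flag[i] = (first | second) ? 1 : 0;
}

// Hypothetical host helper: runs both kernels on the same input and reports
// whether they produced identical flags. Assumes 1 <= size <= 1024.
bool forms_agree(const std::vector<char>& host_data)
{
    const int n = static_cast<int>(host_data.size());
    char* d_data = nullptr;
    int*  d_a    = nullptr;
    int*  d_b    = nullptr;
    cudaMalloc(&d_data, n * sizeof(char));
    cudaMalloc(&d_a, n * sizeof(int));
    cudaMalloc(&d_b, n * sizeof(int));
    cudaMemcpy(d_data, host_data.data(), n * sizeof(char), cudaMemcpyHostToDevice);

    short_circuit<<<1, n>>>(d_data, d_a, n);
    eager<<<1, n>>>(d_data, d_b, n);

    std::vector<int> a(n), b(n);
    // cudaMemcpy blocks until the preceding kernels have finished.
    cudaError_t e1 = cudaMemcpy(a.data(), d_a, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(b.data(), d_b, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_a);
    cudaFree(d_b);
    return e1 == cudaSuccess && e2 == cudaSuccess && a == b;
}
```

Both kernels compute the same flags whenever the read is safe, so the choice between them is about which instructions each thread issues, not about the result. Whether the compiler emits a real branch for the `||` form is not something this sketch can show; that has to be checked in the generated code.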

Does anyone have better knowledge of this?

When using branch predication none of the instructions whose execution depends
on the controlling condition gets skipped. Instead, each of them is associated with a
per-thread condition code or predicate that is set to true or false based on the
controlling condition and although each of these instructions gets scheduled for
execution, only the instructions with a true predicate are actually executed.
Instructions with a false predicate do not write results, and also do not evaluate
addresses or read operands.
The compiler replaces a branch instruction with predicated instructions only if the
number of instructions controlled by the branch condition is less or equal to a
certain threshold: If the compiler determines that the condition is likely to produce
many divergent warps, this threshold is 7, otherwise it is 4.

CUDA Programming Guide 1.1, page 48
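As a rough illustration of the kind of branch the quoted passage is talking about, here is a sketch with a one-instruction body under a condition, next to the same computation written as a select. The kernel names `branchy` and `select_form` and the host helper `run_abs` are made up for this example, and the thresholds of 4 and 7 come from the 1.1 guide, so they may not hold for other compiler versions.

```cuda
#include <cuda_runtime.h>
#include <vector>

// A very short body controlled by a condition: a plausible predication candidate.
__global__ void branchy(const int* in, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int v = in[i];
    if (v < 0)
        v = -v;  // the only instruction controlled by the condition
    out[i] = v;
}

// The same computation written as a select, with no if statement in the source.
__global__ void select_form(const int* in, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int v = in[i];
    out[i] = (v < 0) ? -v : v;
}

// Hypothetical host helper: runs one of the two kernels and returns its output.
// Assumes a non-empty input; returns an empty vector if the copy back fails.
std::vector<int> run_abs(const std::vector<int>& in, bool use_select)
{
    const int n = static_cast<int>(in.size());
    int* d_in  = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks  = (n + threads - 1) / threads;
    if (use_select)
        select_form<<<blocks, threads>>>(d_in, d_out, n);
    else
        branchy<<<blocks, threads>>>(d_in, d_out, n);

    std::vector<int> out(n);
    // cudaMemcpy blocks until the kernel has finished.
    cudaError_t err = cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) out.clear();
    return out;
}
```

To see what the compiler actually did with the `if`, one option is to compile with `nvcc -ptx` and look for instructions guarded by a predicate rather than a `bra`, or to dump the final machine code with `cuobjdump -sass`. For a body this small the compiler may fold the whole thing into a single instruction, so treat the sketch as a starting point for experiments rather than a guaranteed demonstration of predication.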