Here is the code:
    char* p = array[threadIdx.x];
    if (threadIdx.x < CONST || *p != 0)
        doSomething();
    ...
where array is in global device memory. My question is: is the order of conditions important?
What I am unsure about is whether threads start to diverge because some of them satisfy the first condition while others satisfy only the second. All of these threads take the same branch, but if the second condition is not evaluated once the first one already holds (short-circuit evaluation of ||), the threads that skip it execute a different instruction sequence from the threads that still have to evaluate it.
I tested this using the profiler, but the count of divergent branches remained roughly the same (I ran it several times on random data, so I could only compare average counts). This would imply that either all the conditions are always evaluated in all threads, or the end of a compound condition acts as some sort of synchronization point. In either case the order of the conditions would not matter.
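To make the two possibilities concrete, here is a minimal sketch of how the short-circuit form could be compared with a form that evaluates both conditions in every thread. It is an illustration under assumptions, not my actual code: the value of CONST, the kernel names, the runVariant helper, and the atomicAdd that stands in for doSomething() are all hypothetical. The eager kernel also assumes every pointer in array is valid, because it dereferences p unconditionally.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CONST 8  // hypothetical threshold; the real value is not shown in the question

// Short-circuit form, as in the question: *p is read only when the first test is false.
__global__ void shortCircuit(char* const* array, int* hits)
{
    const char* p = array[threadIdx.x];
    if (threadIdx.x < CONST || *p != 0)
        atomicAdd(hits, 1);  // stand-in for doSomething()
}

// Eager form: every thread evaluates both tests, then combines them with bitwise |.
// Assumes every pointer in array is valid, since *p is always read.
__global__ void eager(char* const* array, int* hits)
{
    const char* p = array[threadIdx.x];
    const bool first  = threadIdx.x < CONST;
    const bool second = *p != 0;
    if (first | second)
        atomicAdd(hits, 1);  // stand-in for doSomething()
}

// Launches one block of n threads (n <= 1024) and returns how many threads
// entered the branch. Error checking is omitted to keep the sketch short.
int runVariant(bool useEager, const char* hostData, int n)
{
    char*  dData = nullptr;
    char** dPtrs = nullptr;
    int*   dHits = nullptr;
    cudaMalloc(&dData, n);
    cudaMalloc(&dPtrs, n * sizeof(char*));
    cudaMalloc(&dHits, sizeof(int));
    cudaMemcpy(dData, hostData, n, cudaMemcpyHostToDevice);
    cudaMemset(dHits, 0, sizeof(int));

    // Build the pointer table on the host: entry i points at device byte i.
    char** hPtrs = (char**)malloc(n * sizeof(char*));
    for (int i = 0; i < n; ++i)
        hPtrs[i] = dData + i;
    cudaMemcpy(dPtrs, hPtrs, n * sizeof(char*), cudaMemcpyHostToDevice);
    free(hPtrs);

    if (useEager)
        eager<<<1, n>>>(dPtrs, dHits);
    else
        shortCircuit<<<1, n>>>(dPtrs, dHits);

    int hits = 0;
    cudaMemcpy(&hits, dHits, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dData);
    cudaFree(dPtrs);
    cudaFree(dHits);
    return hits;
}

int main()
{
    const int n = 64;  // two warps
    char data[n];
    for (int i = 0; i < n; ++i)
        data[i] = (char)(rand() % 2);  // random flags, as in my test runs

    const int a = runVariant(false, data, n);
    const int b = runVariant(true, data, n);
    assert(a == b);  // both forms must take the branch in the same threads
    printf("short-circuit: %d, eager: %d\n", a, b);
    return 0;
}
```

Both kernels take the branch in exactly the same threads, so any difference in the profiler's divergent branch count between them would come only from how the condition itself is evaluated. Looking at the generated machine code for each kernel (for example with cuobjdump -sass) should also show whether the compiler emits a real branch for the || or evaluates both tests with predicated instructions.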
Does anyone know more about this?