As stated in the programming manual, branching in a kernel incurs a penalty: both paths of the
branch may be executed, but only the results of the correct path are kept, so the execution time
is the sum of both paths.
But I wonder what happens if all threads of a warp (or block) take the same path when branching.
Assume something like:
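__global__ void myKernel(const int *nPerThread)  // parameter is hypothetical, for illustration
{
    // ... do some stuff
    // evaluate n (hypothetical: each thread reads its own value)
    int n = nPerThread[blockIdx.x * blockDim.x + threadIdx.x];
    if (n > 0)
    {
        // ... compute function A
    }
    else
    {
        // ... compute function B
    }
    // ... do some stuff
}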
Now suppose that, within one warp, every thread evaluates n to 1, so all threads take the same
path. Are both paths, function A and function B, still executed?
The programming manual addresses this situation as well. Both paths are executed only when the compiler chooses predicated execution. In that case there is no branch at all: the instructions of both paths are issued with complementary predicates, and only the threads whose predicate is true write results. With an actual branch, there is no penalty as long as every thread in the warp evaluates the condition the same way, because the warp then does not diverge. Divergence is a per-warp property, so different warps of the same block can take different paths without any penalty. Take a look at Section 5.1.1.2 of the programming manual; it is clearly explained there :)