It's a simple question, but one that I don't see answered anywhere in the programming guide.
The standard CPU implementation seems to be:
(b<a) ? a : b;
which is clearly divergent, but I’d like to know if CUDA does anything clever to get around it.
Also when doing something like
a = max(a,0);
will the compiler reduce that to
a *= (a > 0);
to prevent divergence (assuming that max is divergent in the first place)?
EDIT: I particularly care about when a and b are floats, but a more general answer may be helpful for others. I hope there is someone who knows!