Hello,
I’m currently implementing the classic parallel reduction kernel to find the maximum of an array along with its index.
I use fmaxf() to compute the maximum at each step but, since I also want the associated index, I need to perform another test.
I am wondering what the best way to do this is without introducing branch divergence.
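To make the setup concrete, here is a minimal host-side sketch of the reduction pattern I mean. It is plain C++ that emulates the tree-shaped passes sequentially, not my actual kernel. `MaxIdx` and `argmax_tree` are hypothetical names, and padding with -INFINITY is just one assumed way to handle sizes that are not a power of two.

```cpp
#include <cassert>
#include <math.h>   // fmaxf and INFINITY in the global namespace
#include <vector>

// Hypothetical result type: a maximum value together with its index.
struct MaxIdx { float value; int index; };

// Host-side emulation of a tree-shaped argmax reduction (illustration only).
MaxIdx argmax_tree(const std::vector<float>& data) {
    assert(!data.empty());

    // Assumption: pad up to the next power of two with -infinity / index -1.
    std::size_t n = 1;
    while (n < data.size()) n *= 2;
    std::vector<float> val(n, -INFINITY);
    std::vector<int> idx(n, -1);
    for (std::size_t i = 0; i < data.size(); ++i) {
        val[i] = data[i];
        idx[i] = static_cast<int>(i);
    }

    // Each pass halves the active range; the inner loop plays the role
    // of the threads that would run in parallel in a kernel.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t t = 0; t < stride; ++t) {
            float v1 = val[t], v2 = val[t + stride];
            int idx1 = idx[t], idx2 = idx[t + stride];
            val[t] = fmaxf(v1, v2);
            idx[t] = v1 > v2 ? idx1 : idx2;  // on ties, the upper-half index wins
        }
    }
    return {val[0], idx[0]};
}
```

In a real kernel the inner loop would be the threads of a block working on shared memory, with a synchronization between passes.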
If I write:
vmax = fmaxf(v1, v2);
idxmax = v1 > v2 ? idx1 : idx2;
is it equivalent to:
vmax = fmaxf(v1, v2);
if (v1 > v2)
    idxmax = idx1;
else
    idxmax = idx2;
?
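To be explicit about what I mean by "equivalent", here is a minimal host-side sketch that wraps both forms in functions and compares their results. The names `MaxIdx`, `combine_ternary`, `combine_if_else`, `same_result` and `check_equivalence` are made up for illustration. This only checks that the two forms return the same value and index; it says nothing about the instructions the compiler actually generates for a kernel.

```cpp
#include <cassert>
#include <math.h>   // fmaxf in the global namespace, as in the snippets

// Hypothetical result type, for illustration only.
struct MaxIdx { float vmax; int idxmax; };

// Form 1: fmaxf for the value, conditional operator for the index.
MaxIdx combine_ternary(float v1, int idx1, float v2, int idx2) {
    MaxIdx r;
    r.vmax = fmaxf(v1, v2);
    r.idxmax = v1 > v2 ? idx1 : idx2;
    return r;
}

// Form 2: fmaxf for the value, if/else for the index.
MaxIdx combine_if_else(float v1, int idx1, float v2, int idx2) {
    MaxIdx r;
    r.vmax = fmaxf(v1, v2);
    if (v1 > v2)
        r.idxmax = idx1;
    else
        r.idxmax = idx2;
    return r;
}

// True when both forms agree on value and index for one input pair.
bool same_result(float v1, float v2) {
    MaxIdx a = combine_ternary(v1, 1, v2, 2);
    MaxIdx b = combine_if_else(v1, 1, v2, 2);
    return a.vmax == b.vmax && a.idxmax == b.idxmax;
}

// Spot check over a few non-NaN pairs, including a tie.
void check_equivalence() {
    assert(same_result(3.0f, 1.0f));
    assert(same_result(1.0f, 3.0f));
    assert(same_result(2.0f, 2.0f));
    assert(same_result(-1.0f, -5.0f));
}
```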
I’ve read a few things about “branch predication”, but nothing precise. If the previous code is predicated, why not write (without using fmaxf anymore):
if (v1 > v2) {
    idxmax = idx1;
    vmax = v1;
}
else {
    idxmax = idx2;
    vmax = v2;
}
If it is predicated, it seems that it would be faster than the version with fmaxf.
What is the “limit” of branch predication?
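For what it is worth, here is a minimal host-side sketch of the two variants side by side (hypothetical names again, plain C++ rather than kernel code). The one behavioural difference I am aware of concerns NaN inputs: fmaxf returns the non-NaN operand, while a comparison with NaN is always false. So, as far as I can tell, the fmaxf variant can return a value and an index that do not belong together, whereas the single-comparison variant always keeps them paired.

```cpp
#include <cassert>
#include <math.h>   // fmaxf and NAN in the global namespace

// Hypothetical result type, for illustration only.
struct MaxIdx { float vmax; int idxmax; };

// Variant with fmaxf: value and index are chosen by two separate tests.
MaxIdx combine_fmaxf(float v1, int idx1, float v2, int idx2) {
    MaxIdx r;
    r.vmax = fmaxf(v1, v2);
    r.idxmax = v1 > v2 ? idx1 : idx2;
    return r;
}

// Variant without fmaxf: one comparison selects both value and index.
MaxIdx combine_single_compare(float v1, int idx1, float v2, int idx2) {
    MaxIdx r;
    if (v1 > v2) {
        r.idxmax = idx1;
        r.vmax = v1;
    } else {
        r.idxmax = idx2;
        r.vmax = v2;
    }
    return r;
}

// NaN is the only float value that compares unequal to itself.
bool is_nan(float x) { return x != x; }

void check_nan_difference() {
    // Ordinary inputs: both variants agree.
    assert(combine_fmaxf(3.0f, 10, 1.0f, 20).idxmax == 10);
    assert(combine_single_compare(3.0f, 10, 1.0f, 20).idxmax == 10);

    // v2 is NaN: fmaxf keeps 2.0f but the index test picks idx2.
    MaxIdx a = combine_fmaxf(2.0f, 10, NAN, 20);
    assert(a.vmax == 2.0f && a.idxmax == 20);

    // The single comparison keeps value and index paired (both from v2).
    MaxIdx b = combine_single_compare(2.0f, 10, NAN, 20);
    assert(is_nan(b.vmax) && b.idxmax == 20);
}
```

If the input can never contain NaN, the two variants should give the same results, including on ties, where both pick idx2.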
What would you advise me to do?
Thank you!