Hello,

I’m currently implementing the classic parallel reduction kernel to find the maximum of an array along with its index.

I use fmaxf() to compute the maximum at each step, but since I also want the associated index, I need to perform an additional comparison.

I’m wondering what the best way is to do this without introducing branch divergence.

If I write:

```
vmax = fmaxf(v1, v2);
idx = v1 > v2 ? idx1 : idx2;
```

is it equivalent to the following?

```
vmax = fmaxf(v1, v2);
if (v1 > v2)
    idx = idx1;
else
    idx = idx2;
```

I’ve read a little about “branch prediction”, but nothing precise. If the previous code is “branch predicted”, why not drop fmaxf entirely and write:

```
if (v1 > v2) {
    idx = idx1;
    vmax = v1;
}
else {
    idx = idx2;
    vmax = v2;
}
```
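A sketch of the single-comparison variant, written so that one predicate drives both selections. On NVIDIA GPUs the relevant mechanism is usually called branch predication rather than branch prediction: for a short if/else, the compiler may emit predicated instructions instead of a jump, so the threads of a warp do not diverge (the CUDA C++ Best Practices Guide covers this under control flow). The `HD` macro, the function name, and the lower-index-wins tie rule are assumptions added for illustration.

```c
/* Compiles as plain C on the host and as __host__ __device__ under nvcc.
   HD is a hypothetical helper macro, not part of CUDA. */
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

/* Selects (vmax, idx) from two candidates using a single predicate, so the
   value and the index always come from the same element.
   Ties go to the lower index (an assumed "first occurrence" convention).
   The bitwise | and & evaluate both sides, so there is no short-circuit branch.
   If either value is NaN, both comparisons are false and the second candidate
   is taken, so value and index still agree. */
static HD void combine_select(float v1, int idx1, float v2, int idx2,
                              float *vmax, int *idx)
{
    int take1 = (v1 > v2) | ((v1 == v2) & (idx1 < idx2));
    *vmax = take1 ? v1 : v2;
    *idx  = take1 ? idx1 : idx2;
}
```

Inside the reduction loop, a call like this would replace both the `fmaxf` call and the separate index test.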

If it is “branch predicted”, it seems it would be faster than the version with fmaxf.

What are the limits of branch prediction?

What would you advise me to do?

Thank you!