I’m optimizing some code, and one of my current bottlenecks is finding the maximum absolute element of a vector. In this case, the vector length is 24, and all values are held in registers. Currently, I just do the following (where my vector is stored as six float4 variables):
[codebox]
// Pairwise max of absolute values: 24 elements -> 12 partial maxima (c0..c11).
float c0 = fmaxf(fabsf((a0).x), fabsf((a0).y));
float c1 = fmaxf(fabsf((a0).z), fabsf((a0).w));
float c2 = fmaxf(fabsf((a1).x), fabsf((a1).y));
float c3 = fmaxf(fabsf((a1).z), fabsf((a1).w));
float c4 = fmaxf(fabsf((a2).x), fabsf((a2).y));
float c5 = fmaxf(fabsf((a2).z), fabsf((a2).w));
float c6 = fmaxf(fabsf((a3).x), fabsf((a3).y));
float c7 = fmaxf(fabsf((a3).z), fabsf((a3).w));
float c8 = fmaxf(fabsf((a4).x), fabsf((a4).y));
float c9 = fmaxf(fabsf((a4).z), fabsf((a4).w));
float c10 = fmaxf(fabsf((a5).x), fabsf((a5).y));
float c11 = fmaxf(fabsf((a5).z), fabsf((a5).w));
// Tree reduction of the 12 partial maxima: 12 -> 6
c0 = fmaxf(c0, c1); c1 = fmaxf(c2, c3); c2 = fmaxf(c4, c5);
c3 = fmaxf(c6, c7); c4 = fmaxf(c8, c9); c5 = fmaxf(c10, c11);
// 6 -> 3
c0 = fmaxf(c0, c1); c1 = fmaxf(c2, c3); c2 = fmaxf(c4, c5);
// 3 -> 1; the result is left in c0
c0 = fmaxf(c0, c1); c0 = fmaxf(c0, c2);
[/codebox]
Does anyone have a smarter way of doing this? I see in the programming guide that the integer max and min functions are one-cycle operations, but I don’t think this is true for fmaxf. I don’t believe there is a faster solution, but if anyone has one, I’d love to hear it.
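One idea related to the integer max remark, which I have not benchmarked: for non-negative IEEE-754 floats, the bit patterns order the same way as the values, so the sign bit could be masked off and the reduction done with integer max. The sketch below is plain C++ so it can be checked on the host. `Vec4`, `abs_bits`, `max_abs_bits4` and `max_abs24` are hypothetical names of mine, with `Vec4` standing in for float4. In device code, the `memcpy` bit casts would presumably become `__float_as_int` and `__int_as_float`. Whether this is actually faster than the fmaxf version is an open question.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for CUDA's float4 so the sketch compiles on the host.
struct Vec4 { float x, y, z, w; };

// Bit pattern of |v|: reinterpret the float as an integer and clear the sign bit.
// (In device code this would presumably be __float_as_int(v) & 0x7fffffff.)
static inline std::int32_t abs_bits(float v) {
    std::int32_t b;
    std::memcpy(&b, &v, sizeof b);
    return b & 0x7fffffff;
}

// Integer max over the absolute-value bit patterns of one vector's components.
static inline std::int32_t max_abs_bits4(const Vec4& a) {
    return std::max(std::max(abs_bits(a.x), abs_bits(a.y)),
                    std::max(abs_bits(a.z), abs_bits(a.w)));
}

// Maximum absolute element of the 24 values held in six Vec4 variables.
// Caveat: a NaN has a large bit pattern and would win here, whereas
// fmaxf returns the non-NaN operand, so the two versions differ on NaN input.
static inline float max_abs24(const Vec4& a0, const Vec4& a1, const Vec4& a2,
                              const Vec4& a3, const Vec4& a4, const Vec4& a5) {
    std::int32_t m01 = std::max(max_abs_bits4(a0), max_abs_bits4(a1));
    std::int32_t m23 = std::max(max_abs_bits4(a2), max_abs_bits4(a3));
    std::int32_t m45 = std::max(max_abs_bits4(a4), max_abs_bits4(a5));
    std::int32_t m = std::max(std::max(m01, m23), m45);

    // Reinterpret the winning bit pattern back as a float.
    float r;
    std::memcpy(&r, &m, sizeof r);
    return r;
}
```

This replaces the 24 fabsf calls with 24 bitwise ANDs and the 23 fmaxf calls with 23 integer max operations, so any gain depends on how the hardware issues those instructions.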
Cheers.