I’m optimizing some code, and one of my current bottlenecks is finding the maximum absolute element of a vector. The vector has length 24, and all of its elements are held in registers. Currently I just do the following, where the vector is stored as 6 float4 variables (a0 through a5):

[codebox]
// Stage 1: pairwise max of absolute values, 24 elements -> 12
float c0 = fmaxf(fabsf((a0).x), fabsf((a0).y));
float c1 = fmaxf(fabsf((a0).z), fabsf((a0).w));
float c2 = fmaxf(fabsf((a1).x), fabsf((a1).y));
float c3 = fmaxf(fabsf((a1).z), fabsf((a1).w));
float c4 = fmaxf(fabsf((a2).x), fabsf((a2).y));
float c5 = fmaxf(fabsf((a2).z), fabsf((a2).w));
float c6 = fmaxf(fabsf((a3).x), fabsf((a3).y));
float c7 = fmaxf(fabsf((a3).z), fabsf((a3).w));
float c8 = fmaxf(fabsf((a4).x), fabsf((a4).y));
float c9 = fmaxf(fabsf((a4).z), fabsf((a4).w));
float c10 = fmaxf(fabsf((a5).x), fabsf((a5).y));
float c11 = fmaxf(fabsf((a5).z), fabsf((a5).w));

// Stage 2: 12 -> 6
c0 = fmaxf(c0, c1); c1 = fmaxf(c2, c3); c2 = fmaxf(c4, c5);
c3 = fmaxf(c6, c7); c4 = fmaxf(c8, c9); c5 = fmaxf(c10, c11);

// Stage 3: 6 -> 3
c0 = fmaxf(c0, c1); c1 = fmaxf(c2, c3); c2 = fmaxf(c4, c5);

// Stage 4: 3 -> 1, result in c0
c0 = fmaxf(c0, c1); c0 = fmaxf(c0, c2);
[/codebox]

Does anyone have a smarter way of doing this? The programming guide says that the integer max and min functions are one-cycle operations, but I don’t think that is true for fmaxf. I doubt there is a faster solution, but if anyone has one, I’d love to hear it.
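One candidate that follows from the integer max remark, given here only as an untested sketch: for IEEE 754 single-precision values, clearing the sign bit yields |x|, and non-negative floats sort in the same order as their bit patterns read as integers. The reduction could therefore be done with integer max on the masked bits. The helper names `abs_bits` and `max_abs_bits` are hypothetical, and the code is plain host C so it can be checked without a GPU. In device code, the `__float_as_int` and `__int_as_float` intrinsics would presumably replace the `memcpy` calls, and the loop would be unrolled over the 24 register values. Two caveats: NaN inputs behave differently from fmaxf (a NaN bit pattern compares larger than infinity, so it would win the reduction instead of being ignored), and whether this is actually faster than fmaxf/fabsf on a given GPU has not been measured here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern of |x|: reinterpret the float as a 32-bit integer and clear
   the sign bit. Assumes IEEE 754 binary32 floats. */
static uint32_t abs_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u & 0x7fffffffu;
}

/* Hypothetical helper: largest |v[i]| over n >= 1 floats, computed with
   integer comparisons only. */
static float max_abs_bits(const float *v, int n) {
    assert(n >= 1);
    uint32_t m = abs_bits(v[0]);
    for (int i = 1; i < n; ++i) {
        uint32_t b = abs_bits(v[i]);
        if (b > m) m = b;   /* integer max on the masked bit patterns */
    }
    float r;
    memcpy(&r, &m, sizeof r);   /* convert the winning pattern back to float */
    return r;
}
```

Calling `max_abs_bits(v, 24)` on the 24 values should give the same result as the fmaxf/fabsf tree for any input that contains no NaN.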

Cheers.