 # __powf(): wrong behavior

__powf() is not returning the correct values here. Consider the following code:

```c
float x = 3.5f;
float xy = 1.f;
int y = 12;

for (int i = 0; i < y; ++i)
    xy *= x;
```

This code computes xy = x^y, but xy = __powf(x, y) returns a different result. I can't tell what the exact values are, but what seems to happen is that __powf() only works correctly for values of x in the [0,1] interval; I'm not sure, though. I say this because I'm raytracing implicit surfaces, and when I render a sphere I see only its lower-right part, and only part of that... but when I use for loops to compute the powers (the exponents are always integers) I get the correct rendering.

I don't know how much of the error this accounts for, but it should be noted that in CUDA, powf() has low precision. In my program its results differ noticeably from the same computation in Fortran.

I can confirm this. You see, if x is in the range [0,1], then whatever the power is, the result stays within [0,1] too. This allows the use of approximation formulas.

If, however, x is greater than 1, it's a completely different matter.

Here's what I suggest trying. Mathematically, for any reasonable x and y:

x^y = exp(y*ln(x)) [see the Wikipedia article on exponentiation]

so this may work better. If it behaves identically to __powf(x, y), though, that would suggest this is how NVIDIA actually implemented it, and in that case the precision drop is in either the exp or the ln function.

Does x=-1 and y=3 count as reasonable? ;)

CUDA provides the powf(x, y) function (no underscores), which is accurate to 8 units in the last place for any valid set of inputs, including negative x and natural y. So it is not correctly rounded either, but it is fairly accurate and competitive with CPU implementations.

In your example 3.5^12, the exact result 13841287201 / 4096 is not representable in single precision, and neither are the intermediate results of your algorithm, so you should not expect the results to match exactly.

The __powf function is implemented as exp2f(y * __log2f(x)). Error bounds are given in the CUDA manual, section C.2.1. Note that the relative error of __log2f(x) is unbounded for x between 0.5 and 2.

Anyway, if y is an integer, you don’t need such a complex approximation. Just using the binary exponentiation algorithm will be enough, especially if y is constant, in which case it can be completely unrolled to a few multiplications.

Thanks for the replies. I think the CUDA documentation on these math functions is very limited (I don't understand why it's in the Programming Guide)... but now I understand why this happens. And those exponentiation algorithms are very interesting. I'll give them a try.

Thanks

Maybe because you need that info when you are programming???

Well, not only while programming :rolleyes: I think there should be documentation that describes the specifics of each function in more detail, in a list or something, and not spread around a programming guide.

I agree with xissburg.
Something that always strikes me is that the performance of the math functions is described in detail over almost a whole chapter of the programming guide, but if you want to know their behavior and accuracy, you have to look them up in Appendix C, which does not even appear in the TOC of the PDF...

Does that mean GPU folks worry more about having their app be fast than having it return meaningful results? ;)

I actually find that having it all together in the Appendix makes it easier to look up. Things not in the appendix are the ones that have me searching all over the place once in a while ;)