pow(2.0, 3.0) called from a kernel returns 7, and ceil of the result gives 8

The device code

int x = pow(2.0f,3.0f); sets x to 7.
int x = ceil(pow(2.0f,3.0f)); sets x to 8.

What is the reason for this error?

Is this running on the GPU or the CPU? There are probably some rounding errors.

This is not an error, but (somewhat) expected behaviour of floating-point values. The computed result might actually be slightly less than 8 (7.9999…), in which case the conversion to int truncates it to 7, while ceil() makes it 8. For this reason you should always be careful when converting floats to integers.
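To see the effect without depending on any particular pow() implementation, here is a minimal sketch that takes the float one ulp below 8 and converts it both ways. The helper names and the HD macro are made up for illustration; HD only lets the same code build as plain C++ or under nvcc.

```cpp
#include <math.h>

// Hypothetical macro: CUDA qualifiers under nvcc, nothing in plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Largest float strictly below 8.0f (about 7.9999995). It stands in for an
// inexact pow(2.0f, 3.0f) result; this is an assumption for illustration.
HD inline float just_below_eight() { return nextafterf(8.0f, 0.0f); }

// Float-to-int conversion discards the fractional part (rounds toward zero).
HD inline int truncate_to_int(float v) { return static_cast<int>(v); }

// ceilf() rounds up to the next whole number before the conversion.
HD inline int ceil_to_int(float v) { return static_cast<int>(ceilf(v)); }
```

With these helpers, truncate_to_int(just_below_eight()) yields 7 and ceil_to_int(just_below_eight()) yields 8, which matches the behaviour reported in the question.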

But pow(2.0,2.0) is 4.

A power with an even exponent always seems to be OK, and one with an odd exponent is wrong. Is there anything specific about that? If I want an integer power, how can I achieve it in CUDA? Do I need to write it on my own, or are there any built-in functions? Or can I use ceil(pow(x,y))?

Thanks a lot,

Amal P

What is wrong with good old 1<<x?

int x = pow(3.0f, 2.0f); // this gives 9
int x = pow(2.0f, 3.0f); // this gives 7 instead of 8
int x = pow(2.0f, 2.0f); // this gives 4
int x = pow(4.0, 3.0);   // this gives 64
int x = pow(2.0, 5.0);   // this gives 31 instead of 32

What is special about odd powers of 2?

Try adding 0.5f before truncating :)

A ceil will also do. But why does such a difference arise? I am curious to know the reason. What is special about 2 to the power of an odd number?

Looking at the PTX generated from the pow() function, there’s quite a lot of code generated to transform the arguments. Somewhere in that arithmetic there is some roundoff error. You should note that the Programming Guide says the powf() (same as pow() on current hardware) has a maximum error of 16 ulp, which is the worst of the standard library functions.

You can do much better for the special case of 2**x using the exp2f() function, which only has 2 ulp of error. And, as wumpus points out, if you are really just exponentiating 2 to some integer power and storing the result in an integer, you should use the << operator. It’s way faster, and always exact.
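For the shift-based approach, a minimal sketch could look like this. The name pow2_int and the HD macro are made up for illustration, and the range check assumes a 32-bit signed int.

```cpp
#include <assert.h>

// Hypothetical macro: CUDA qualifiers under nvcc, nothing in plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// 2**n for 0 <= n <= 30, computed exactly with a left shift.
// No floating-point arithmetic is involved, so there is no roundoff.
HD inline int pow2_int(int n) {
    assert(n >= 0 && n <= 30);  // 1 << 31 would overflow a 32-bit signed int
    return 1 << n;
}
```

Here pow2_int(3) is 8 and pow2_int(5) is 32, the two cases that came out one too small with pow().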

Your sample size is too small… this is almost certainly not only a problem of 2**odd, but rather a general problem which looks like it has a pattern in your small sampling.

Maybe someone (from NVIDIA) with intimate knowledge of how the pow is implemented can comment on whether taking the ceil of the result is a general solution to the problem?

If not, you can easily code your own pow(float/int, int) function. On CPUs this tends to be much faster anyway, but I’m not sure about how it would compare speed-wise on GPUs.
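One possible shape for such a hand-written function is exponentiation by squaring on integers. This is only a sketch: the name ipow and the HD macro are made up for illustration, negative exponents are rejected, and overflow of the result is not checked.

```cpp
#include <assert.h>

// Hypothetical macro: CUDA qualifiers under nvcc, nothing in plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Integer power by repeated squaring: O(log exp) multiplications,
// all in integer arithmetic, so the result is exact when it fits in an int.
HD inline int ipow(int base, int exp) {
    assert(exp >= 0);  // negative exponents are out of scope for this sketch
    int result = 1;
    while (exp > 0) {
        if (exp & 1) {
            result *= base;  // this bit of the exponent is set
        }
        exp >>= 1;
        if (exp > 0) {
            base *= base;    // square only if another bit remains
        }
    }
    return result;
}
```

For the values from this thread, ipow(2, 3) is 8, ipow(3, 2) is 9, ipow(4, 3) is 64 and ipow(2, 5) is 32.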

Actually, I also just found by experimentation that if you cast the second argument to an int, you will get the correct answer, so writing your own version that takes an int is not necessary.

It’s not a problem, it’s a property of floating-point math, be it single or double precision. :) You can almost never expect zero result difference relative to any preassigned value, and the difference might change even with different compiler settings, let alone different devices. So your code has to have reasonable FP error tolerance.

ceil() / floor() are bouncy with “almost” integer argument values: a couple of ULPs up / down, and the output jumps / falls by 1.0. So, as was already recommended, in such cases you need to add 0.5f (or any other “big enough” constant between 0 and 1) before doing floor() or truncating, or subtract it before doing ceil().
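The following sketch shows why rounding to nearest is more robust than a bare ceil(): it tolerates a result that is off by an ulp in either direction, whereas ceil() only repairs a result that is slightly too small. The helper name round_to_int and the HD macro are made up for illustration, and the floorf(v + 0.5f) form assumes non-negative values well inside the int range; lroundf() is a standard alternative.

```cpp
#include <math.h>

// Hypothetical macro: CUDA qualifiers under nvcc, nothing in plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Round to the nearest integer by shifting half a unit up, then flooring.
// Assumes v >= 0 and small enough that v + 0.5f still fits in an int.
HD inline int round_to_int(float v) {
    return static_cast<int>(floorf(v + 0.5f));
}

// One float below and one float above 8.0f, standing in for an inexact
// result that misses 8 in either direction (an assumption for illustration).
HD inline float eight_minus_ulp() { return nextafterf(8.0f, 0.0f); }
HD inline float eight_plus_ulp()  { return nextafterf(8.0f, 9.0f); }
```

Both round_to_int(eight_minus_ulp()) and round_to_int(eight_plus_ulp()) give 8, while ceilf(eight_plus_ulp()) gives 9.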

CUDA math library code is open for everyone to inspect. pow is an overloaded function, with different versions for (float,float) and (float,int) arguments.

As Victor pointed out, pow(2.f, 3.f) result is just under 8.0f (within the stated ulps).

Paulius