Approximation Problem on multiplications - my fault or cuda problem?

I’m trying to develop a solver for a function (weierstrass problem), and substantially the function is F(x)=A(x)-B , where B<0, |B|>=|A(x)| and |A(0)|=|B| .
B is a static constant value, and


with j int,and every other variables float.

For certain values near x=0, approximation and that 5 multiplication straight in line, can magnify the approximation error and make |A|>|B| , so i need to find a way to avoid the approximation and to raise the precision (hw limited to float)… any ideas?! i’m literally gettin mad!


You are using the fast approximate math functions for cos and pow which could be introducing the error. I would recommend first using the accurate versions (get rid of the underscores). If that isn’t enough, then try double precision.

Is it possible to replace __powf(float,int) with a number of muliplications? Would probably improve precision(for reasonably small ints).

Try powf(float,int) as that will map to multiplication-based computation, and should give you much better accuracy. For very small powers, it would be best to simply use straight multiplication, e.g. xx, xx*x. The intrinsic __powf(float,float) maps to exp2f(b * __log2f(a)), meaning it is fast but not necessarily very accurate.

In general I would recommend using the standard math functions when developing new code and ensuring the code returns correct results, before replacing individual functions calls with fast but less accurate intrinsics, or deploying the -use_fast_math compiler flag.

I’ve found where the problem is:

I’ve set every float to double, calculate every “pow” as multiplications… and i’ve found that cosf produce wrong values… i mean: cos(Pi*aGreatOddInteger) = cos(Pi) = -1 … and i get -0.79.

I think that’s because cosf calculate the cos empirically… How can i solve this? any ideas?

You could try using x=fmod(x,2.0*PI) before cos(x). (Background: Programming guide states smaller maximum ups error for fmod() than cos())
Also verfiy that “-use_fast_math” is disabled.

Use [font=“Courier New”]cosf()[/font], not [font=“Courier New”]__cosf()[/font]. [font=“Courier New”]__cosf()[/font] is only accurate (to about 21 bits) in the interval [-pi, pi].

EDIT: Ideally, you’d use [font=“Courier New”]cospi(aGreatOddInteger)[/font] instead of [font=“Courier New”]cos(Pi*aGreatOddInteger)[/font] or [font=“Courier New”]__cosf(…)[/font].

The “Pi” in your code has limited precision, and is thus different from mathematical pi. cos() and cosf() use mathematical pi for the argument reduction to achieve full accuracy across the entire input range. Therefore your results will differ more and more from what you expect as the magnitude of aGreatOddInteger increases, as the numerical error (the deviation from a multiple of mathematical pi) in the input to cos() or cosf() increases.

The solution, as tera already pointed out, is to avoid explicit multiplication with a necessarily inaccurate machine approximation to pi, and instead use cospif() or cospi() which implicitly multiplies the input argument by mathematical pi. See also section 5.1.4 of the Best Practices Guide.

I fixed the “cos issue”, but now I have another one… i need to calculate powf(weierstrassB,j), the max value is 3^20… but it isn’t correct!! instead of 3486784401, i get 3486784520 with a multiplication function, and 3486784720 with powf.

3^15 is correct, 3^16 not… overflow?

I think i’ve exceeded the number of precision digits of the double format…

edit: i’ve already tried pow, no work at all

If both base and exponent are integers, you could use a function like this one:

__device__ long long ipow(int base, int exp)


    long long result = 1;

    long long bl=base;

    while (exp)


        if (exp & 1)

            result *= bl;

        exp >>= 1;

        bl *= bl;


return result;


powf() computes a single precision result. Single precision (the float data type) does not have enough bits to represent 3**20 accurately, that is, without rounding. The closest representable single precision number is 3486784512. powf(float,int) will get you very close, within one ulp in fact when I tried it with CUDA 4.0: powf(3.0f,20) = 3486784768 (0x4f4fd41d) versus a perfectly rounded result of 3486784512 (0x4f4fd41c).

If you need a completely accurate representation, you need to either call pow(double,int), or build your own integer based function, as mfatica suggests.