Floats and floats... difference between CPU and GPU?

I’m running code on the GPU and on the CPU that does exactly the same thing, and then using assert to compare the results, just to make sure my CUDA code is doing what I think it should. Anyway, I’m getting occasional small differences in the resulting floating-point numbers between the CPU and GPU. It’s not on every result, but the ones that differ never differ before the 8th decimal place.

This seems very strange to me. Am I missing something, or is there some reason for this?

Just in case it’s relevant: my code calculates a matrix-vector multiplication (both floats), stores the result in shared memory (also float), applies a sigmoid 1/(1+exp(-n)), and then copies the result back.
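For reference, the kernel is along these lines (a rough sketch with hypothetical names, assuming a square n-by-n matrix; not my actual code):

__global__ void mv_sigmoid(const float *M, const float *v, float *out, int n)
{
    // each thread handles one row of the matrix-vector product
    extern __shared__ float row_result[];
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j)
            acc += M[row * n + j] * v[j];
        row_result[threadIdx.x] = acc;                               // result staged in shared memory
        out[row] = 1.0f / (1.0f + expf(-row_result[threadIdx.x]));   // sigmoid, then written back to global memory
    }
}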

The CPU’s x87 floating-point unit uses 80 bits of precision internally. SSE code rounds each operation to the IEEE 754 width of the type (32 bits for float, 64 for double).

So, try compiling your CPU code with the x87-based floating-point ops disabled (for example, gcc’s -mfpmath=sse -msse2 forces SSE arithmetic).

Or just change 1/(1+exp(-n)) to 1/(exp(-n)+1) and see your results change, even on just the CPU. Floating-point results depend on the order of operations. See [url=“http://docs.sun.com/source/806-3568/ncg_goldberg.html”]http://docs.sun.com/source/806-3568/ncg_goldberg.html[/url] for lots of good info.

8 significant digits of agreement between your two different computations is excellent agreement. Single-precision floating point is usually only good to about 6 or 7.

Hmm, it is as I expected. However, this prevents me from using assert to check my results… unless I round first…

You can assert any expression, so there is no reason you can’t do:

assert( fabs(gpu - cpu) < 1e-6); // absolute error

or

assert( fabs( (gpu - cpu)/cpu ) < 1e-6); // relative error

You could even roll these up into a convenient preprocessor macro.
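For example (hypothetical macro names, just a sketch):

#include <assert.h>
#include <math.h>

// wrap the absolute and relative checks above in macros
#define ASSERT_ABS_CLOSE(a, b, tol)  assert( fabs((a) - (b)) < (tol) )
#define ASSERT_REL_CLOSE(a, b, tol)  assert( fabs(((a) - (b)) / (b)) < (tol) )

// usage: ASSERT_ABS_CLOSE(result_from_host, result_from_device, 1e-6);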

I used the following in the end…

assert( int( result_from_host * 10000) == int( result_from_device * 10000) );

I don’t think you understand how floats work.

You can’t ever directly compare floats that are the results of computations (i.e. by ==), not even when you stick to the CPU. Here, read this for example: [url=“Comparing Floating Point Numbers”]http://www.cygnus-software.com/papers/comp...aringfloats.htm[/url]

Thanks BigMac, I had suspected the architecture difference was the cause of this issue, as has now been confirmed. Using (fabs(a - b) < error) is fine, but maybe you didn’t follow what I was doing: by multiplying the float by 10000 I’m shifting it 4 decimal places and then converting to an int, i.e. dropping the rest, so my version simply compares the first 4 decimal places and ignores everything after them. Both methods would seem to do the job, and this is only a temporary check to make sure the program is working properly.

I appreciate that floats introduce differences between architectures etc., but to say you can’t ever compare floats directly on the same CPU is going a bit far, no?

If you have 1.0004 and 1.0005 as the results of the two computations, which is very good convergence as far as single-precision floats go, then when you multiply them by 10k and cast to int you get 10004 and 10005, so your assert fails even though the agreement is good. More or less, because there could also be errors introduced by the multiplication by 10000.0f.

(fabs(A - B) < error) isn’t that great either, actually: if A and B are big, the error bound should be big too. Remember, floats have a fixed number of significant digits they can reliably represent, and the decimal point, you know, floats. So, assuming a float can store 5 digits and the error shows up on the last one, the value can either be 1.0005 (error in the range of 0.0001) or 10005 (error in the range of unity!). So you have to scale the error bound depending on the order of magnitude you expect your output to be.

It might be enough for a temporary quick check, but don’t ever use this in production code. In fact, completely avoid checking floats for equality in production code; there’s no fully robust way of doing it.
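For the temporary check, a scale-aware comparison along the lines described above might look like this (a sketch only; the helper name and tolerances are made up for illustration):

#include <math.h>

// true if a and b agree to within abs_tol, or to within rel_tol scaled by the
// larger magnitude, so the allowed error grows with the size of the numbers
int nearly_equal(float a, float b, float rel_tol, float abs_tol)
{
    float diff  = fabsf(a - b);
    float scale = fmaxf(fabsf(a), fabsf(b));
    return diff <= fmaxf(abs_tol, rel_tol * scale);
}

// e.g. assert( nearly_equal(result_from_host, result_from_device, 1e-5f, 1e-7f) );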

Even on the same CPU you can’t compare floats directly. Floats are really tricky beasts.

exp(log(1.0f)) doesn’t necessarily evaluate back to 1.0f, and exp(log(0.1f)) most certainly won’t, because you can’t even represent 0.1 exactly with a float (the stored value is roughly 0.100000001). Even 0.1 + 0.2 won’t equal 0.3: it will be close, but not equal. You get different error/precision with various mathematical operations and orders of operations; 2.1*(1.0+5.0), 2.1*1.0 + 2.1*5.0 and 12.6 don’t all have to come out equal.

(A/B)*B doesn’t have to equal A, because division is tricky and generates a bigger error than multiplication. And there can be a difference between (A/B)*C and (A*C)/B even though mathematically they are equivalent. So even on a single CPU, if you have two functions that implement the same algorithm slightly differently, you can’t check for equivalence with ==. Each floating-point operation introduces a small error and perhaps some rounding, some operations propagate those errors very fast (like division), and the final result depends on the order in which those errors accumulate.
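To see a couple of these effects for yourself, a tiny test program is enough (this one uses doubles, where the classic 0.1 + 0.2 case shows up):

#include <stdio.h>

int main(void)
{
    double a = 0.1, b = 0.2, c = 0.3;   // none of these are exactly representable in binary
    printf("0.1 + 0.2 == 0.3 -> %d\n", a + b == c);                                       // prints 0
    printf("(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3) -> %d\n", (a + b) + c == a + (b + c)); // prints 0: addition isn't associative
    printf("0.1 + 0.2 = %.17g\n", a + b);                                                 // shows the rounding error directly
    return 0;
}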

It takes knowledge of how floats are represented, how math operations are carried out and even how certain functions (like sin or exp) are implemented to really appreciate how nasty floats are. I’m not even getting near NaNs and infinities or the hilarity that ensues when you divide a huge number by a tiny one.

By the way, there is one special case with floats where A == A returns false…
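(Hint: it involves a NaN, which compares unequal to everything, including itself. A quick check, using sqrtf of a negative number to produce one:)

#include <math.h>
#include <stdio.h>

int main(void)
{
    float x = sqrtf(-1.0f);              // square root of a negative number yields a NaN
    printf("x == x -> %d\n", x == x);    // prints 0: a NaN is not equal even to itself
    return 0;
}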

Hmm, I’m beginning to realise just how strange floats can be. I knew about the double-precision thing for physics simulations, but not the full extent of the issue. Thanks BigMac, I’ll avoid float comparisons where possible.

For checking my code I usually print out a few results to see that they look correct, but as I’ve been learning CUDA I’ve had some strange things happen, for example one of my blocks not working while the others did. Just checking a few outputs doesn’t always reveal that, hence the CPU-GPU comparison with assert.

Unless, of course, your compiler ‘helpfully’ optimises the test away :D
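(With the standard C assert, that happens when NDEBUG is defined, as release builds typically do; a minimal illustration:)

#define NDEBUG             // typically defined in release builds
#include <assert.h>

int main(void)
{
    assert(0);             // with NDEBUG, assert() expands to nothing, so this failing assert is silently skipped
    return 0;
}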

Of course, what the compiler should do in such cases is emit a warning “There’s a function for that!” and pipe it through ‘banner’ ;)

There’s nothing wrong with checking absolute error, but you are right that you have to be smart about it. The error bound needs to be intelligently selected for your specific problem, not just 1e-6 for everything. :) Relative error probably applies to more situations, but again, you have to understand the problem to know what you want.