Floats and floats... difference between CPU and GPU?

I’m running code on the GPU and on the CPU that does exactly the same thing, and then using assert to compare the results, just to make sure my CUDA code is doing what I think it should. Anyway, I’m getting occasional small differences in the resulting floating-point numbers between the CPU and GPU. It’s not on every result, but the ones that differ never differ before the 8th decimal place.

This seems very strange to me. Am I missing something, or is there some reason for this?

Just in case it’s relevant: my code calculates a matrix-vector multiplication (both floats), stores the result in shared memory (also float), applies a sigmoid 1/(1+exp(-n)), and then copies the result back.
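For reference, the kernel is along these lines (a rough sketch with hypothetical names, assuming a square n-by-n matrix; not my actual code):

__global__ void mv_sigmoid(const float *M, const float *v, float *out, int n)
{
    // each thread handles one row of the matrix-vector product
    extern __shared__ float row_result[];
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j)
            acc += M[row * n + j] * v[j];
        row_result[threadIdx.x] = acc;                               // result staged in shared memory
        out[row] = 1.0f / (1.0f + expf(-row_result[threadIdx.x]));   // sigmoid, then written back to global memory
    }
}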

The CPU’s x87 floating-point unit uses 80 bits of precision internally. SSE code rounds each operation to the IEEE 754 width of the type (32 bits for float, 64 for double).

So, try compiling your CPU code with the x87-based floating-point ops disabled (for example, gcc’s -mfpmath=sse -msse2 forces SSE arithmetic).

Or just change 1/(1+exp(-n)) to 1/(exp(-n)+1) and see your results change, even on just the CPU. Floating-point results depend on the order of operations. See [url=“http://docs.sun.com/source/806-3568/ncg_goldberg.html”]http://docs.sun.com/source/806-3568/ncg_goldberg.html[/url] for lots of good info.

8 significant digits of agreement between your two different computations is excellent agreement. Single-precision floating point is usually only good to about 6 or 7.

Hmm, it is as I expected. However, this prevents me from using assert to check my results… unless I round first…

You can assert any expression, so there is no reason you can’t do:

assert( fabs(gpu - cpu) < 1e-6); // absolute error

or

assert( fabs( (gpu - cpu)/cpu ) < 1e-6); // relative error

You could even roll these up into a convenient preprocessor macro.
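For example (hypothetical macro names, just a sketch):

#include <assert.h>
#include <math.h>

// wrap the absolute and relative checks above in macros
#define ASSERT_ABS_CLOSE(a, b, tol)  assert( fabs((a) - (b)) < (tol) )
#define ASSERT_REL_CLOSE(a, b, tol)  assert( fabs(((a) - (b)) / (b)) < (tol) )

// usage: ASSERT_ABS_CLOSE(result_from_host, result_from_device, 1e-6);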

I used the following in the end…

assert( int( result_from_host * 10000) == int( result_from_device * 10000) );

I don’t think you understand how floats work.

You can’t ever directly compare floats that are the results of computations (i.e. by ==), not even when you stick to the CPU. Here, read this for example: [url=“Comparing Floating Point Numbers”]http://www.cygnus-software.com/papers/comp...aringfloats.htm[/url]

Thanks BigMac, I had suspected the architecture difference was the cause of this issue, as has now been confirmed. Using (fabs(a - b) < error) is fine, but maybe you didn’t follow what I was doing: by multiplying the float by 10000 I’m shifting it 4 decimal places and then converting to an int, i.e. dropping the rest, so my version simply compares the first 4 decimal places and ignores everything after them. Both methods would seem to do the job, and this is only a temporary check to make sure the program is working properly.

I appreciate that floats introduce differences between architectures etc., but to say you can’t ever compare floats directly on the same CPU is going a bit far, no?

If you have 1.0004 and 1.0005 as the results of the two computations, which is very good convergence as far as single-precision floats go, then when you multiply them by 10k and cast to int you get 10004 and 10005, so your assert fails even though the agreement is good. More or less, because there could also be errors introduced by the multiplication by 10000.0f.

(fabs(A - B) < error) isn’t that great either, actually: if A and B are big, the error bound should be big too. Remember, floats have a fixed number of significant digits they can reliably represent, and the decimal point, you know, floats. So, assuming a float can store 5 digits and the error shows up on the last one, the value can either be 1.0005 (error in the range of 0.0001) or 10005 (error in the range of unity!). So you have to scale the error bound depending on the order of magnitude you expect your output to be.

It might be enough for a temporary quick check, but don’t ever use this in production code. In fact, completely avoid checking floats for equality in production code; there’s no fully robust way of doing it.
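For the temporary check, a scale-aware comparison along the lines described above might look like this (a sketch only; the helper name and tolerances are made up for illustration):

#include <math.h>

// true if a and b agree to within abs_tol, or to within rel_tol scaled by the
// larger magnitude, so the allowed error grows with the size of the numbers
int nearly_equal(float a, float b, float rel_tol, float abs_tol)
{
    float diff  = fabsf(a - b);
    float scale = fmaxf(fabsf(a), fabsf(b));
    return diff <= fmaxf(abs_tol, rel_tol * scale);
}

// e.g. assert( nearly_equal(result_from_host, result_from_device, 1e-5f, 1e-7f) );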

Even on the same CPU you can’t compare floats directly. Floats are really tricky beasts.

exp(log(1.0f)) doesn’t necessarily evaluate back to 1.0f, and exp(log(0.1f)) most certainly won’t, because you can’t even represent 0.1 exactly with a float (the stored value is roughly 0.100000001). Even 0.1 + 0.2 won’t equal 0.3: it will be close, but not equal. You get different error/precision with various mathematical operations and orders of operations; 2.1*(1.0+5.0), 2.1*1.0 + 2.1*5.0 and 12.6 don’t all have to come out equal.

(A/B)*B doesn’t have to equal A, because division is tricky and generates a bigger error than multiplication. And there can be a difference between (A/B)*C and (A*C)/B even though mathematically they are equivalent. So even on a single CPU, if you have two functions that implement the same algorithm slightly differently, you can’t check for equivalence with ==. Each floating-point operation introduces a small error and perhaps some rounding, some operations propagate those errors very fast (like division), and the final result depends on the order in which those errors accumulate.
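To see a couple of these effects for yourself, a tiny test program is enough (this one uses doubles, where the classic 0.1 + 0.2 case shows up):

#include <stdio.h>

int main(void)
{
    double a = 0.1, b = 0.2, c = 0.3;   // none of these are exactly representable in binary
    printf("0.1 + 0.2 == 0.3 -> %d\n", a + b == c);                                       // prints 0
    printf("(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3) -> %d\n", (a + b) + c == a + (b + c)); // prints 0: addition isn't associative
    printf("0.1 + 0.2 = %.17g\n", a + b);                                                 // shows the rounding error directly
    return 0;
}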

It takes knowledge of how floats are represented, how math operations are carried out and even how certain functions (like sin or exp) are implemented to really appreciate how nasty floats are. I’m not even getting near NaNs and infinities or the hilarity that ensues when you divide a huge number by a tiny one.

By the way, there is one special case with floats where A == A returns false…
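(Hint: it involves a NaN, which compares unequal to everything, including itself. A quick check, using sqrtf of a negative number to produce one:)

#include <math.h>
#include <stdio.h>

int main(void)
{
    float x = sqrtf(-1.0f);              // square root of a negative number yields a NaN
    printf("x == x -> %d\n", x == x);    // prints 0: a NaN is not equal even to itself
    return 0;
}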

Hmm, I’m beginning to realise just how strange floats can be. I knew about the double-precision thing for physics simulations, but not the full extent of the issue. Thanks BigMac, I’ll avoid float comparisons where possible.

For checking my code I usually print out a few results to see that they look correct, but as I’ve been learning CUDA I’ve had some strange things happen, for example one of my blocks not working while the others did. Just checking a few outputs doesn’t always reveal that, hence the CPU-GPU comparison with assert.

Unless, of course, your compiler ‘helpfully’ optimises the test away :D
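(With the standard C assert, that happens when NDEBUG is defined, as release builds typically do; a minimal illustration:)

#define NDEBUG             // typically defined in release builds
#include <assert.h>

int main(void)
{
    assert(0);             // with NDEBUG, assert() expands to nothing, so this failing assert is silently skipped
    return 0;
}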

Of course, what the compiler should do in such cases is emit a warning “There’s a function for that!” and pipe it through ‘banner’ ;)

There’s nothing wrong with checking absolute error, but you are right that you have to be smart about it. The error bound needs to be intelligently selected for your specific problem, not just 1e-6 for everything. :) Relative error probably applies to more situations, but again, you have to understand the problem to know what you want.