Floating points

Hello.

What are the differences between the floating point implementations on the CPU and the GPU? Is there a way to make them match?

john.

The short answer is “no”. It is non-trivial to get two different floating point implementations to match (CPU to GPU, or CPU to CPU):

  • Most 32-bit x86 compilers use the x87 stack registers for floating point, which means intermediate values are kept as 80-bit floats, regardless of the data type in your C code. (This can be overcome with compiler options that force intermediate floating point values to be flushed to memory, at the expense of speed.) The first sketch after this list shows the kind of surprise this causes.

  • If you get the compiler to generate SSE instructions for floating point (the default on most 64-bit compilers), then you will probably get the appropriate single or double precision behavior, as long as you don’t accidentally promote a single precision value to double precision in an expression. Common examples are multiplying by a floating point constant not suffixed with “f” and calling sin() or cos(), which are double precision functions; the second sketch below demonstrates both.

  • Even if you do all of that, you will still see differences, because floating point addition and multiplication are not associative: the order in which you operate on values changes the result. An obvious case is the difference between summing the elements of an array with a loop on the CPU (which goes in index order in the simplest case) and performing a parallel reduction on the GPU (which does pairwise sums in a tree-like fashion; the third sketch below compares the two orders). Trying to force the operations to happen in the same order will either make your CPU code unreadable or your GPU code very slow. And the compiler might defeat you anyway. :)
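
Here is a minimal sketch of the x87 excess precision surprise from the first bullet. Whether it actually fires depends on the compiler, target, and flags (with gcc, something like -m32 -mfpmath=387 -O0 exercises the x87 path, while -mfpmath=sse or -ffloat-store makes it go away), so treat it as an illustration rather than a guaranteed repro:

    #include <stdio.h>

    /* Under x87 code generation the multiply happens in an 80-bit
       stack register, and the result comes back in st(0). */
    double square(double x) { return x * x; }

    int main(void)
    {
        double stored = square(0.1);  /* spilled to a 64-bit memory slot */

        /* The freshly returned value can still carry the extra
           precision, so this comparison may fail under x87 codegen. */
        if (square(0.1) != stored)
            printf("excess precision changed the result\n");
        else
            printf("values agree on this build\n");
        return 0;
    }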
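
The promotion traps from the second bullet are easy to demonstrate in plain C (the same expressions behave the same way inside a kernel). Compile with something like gcc demo.c -lm:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        float x = 0.1f;

        /* 0.3 is a double constant, so x is promoted, the multiply
           happens in double precision, and the result is rounded
           back to float. 0.3f keeps the whole expression single
           precision. */
        float a = x * 0.3;
        float b = x * 0.3f;

        /* sin() is the double precision function; sinf() stays in
           single precision end to end. */
        float c = (float)sin(x);
        float d = sinf(x);

        /* Each pair often differs in the last bits. */
        printf("%.9g %.9g\n", a, b);
        printf("%.9g %.9g\n", c, d);
        return 0;
    }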
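
And a host-only sketch of the ordering effect from the last bullet: sum_pairwise() below imitates the order a tree-style reduction produces, no GPU required. A real reduction kernel will split the work differently, but the point stands, since any reordering can change the low bits:

    #include <stdio.h>
    #include <stdlib.h>

    /* Index-order accumulation, the way a simple CPU loop sums. */
    float sum_linear(const float *v, int n)
    {
        float s = 0.0f;
        for (int i = 0; i < n; ++i)
            s += v[i];
        return s;
    }

    /* Pairwise accumulation, imitating a tree-style reduction. */
    float sum_pairwise(const float *v, int n)
    {
        if (n == 1)
            return v[0];
        int half = n / 2;
        return sum_pairwise(v, half) + sum_pairwise(v + half, n - half);
    }

    int main(void)
    {
        enum { N = 1 << 20 };
        float *v = malloc(N * sizeof *v);
        srand(42);
        for (int i = 0; i < N; ++i)
            v[i] = (float)rand() / (float)RAND_MAX;

        /* Same values, two summation orders: the results usually
           disagree in the low bits. */
        printf("linear:   %.9g\n", sum_linear(v, N));
        printf("pairwise: %.9g\n", sum_pairwise(v, N));

        free(v);
        return 0;
    }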
