Floating points

Hello.

What are the differences between the floating point implementations on the CPU and the GPU? Is there a way to make them match?

john.

The short answer is “no”. It is non-trivial to get two different floating point implementations to match (CPU to GPU, or CPU to CPU):

  • Most 32-bit x86 compilers use the x87 stack registers for floating point, which means intermediate values are kept as 80-bit floats, regardless of the data type in your C code. (This can be overcome with compiler options that force intermediate floating point values to be flushed to memory, at the expense of speed.) The first sketch after this list shows the kind of surprise this causes.

  • If you get the compiler to generate SSE instructions for floating point (the default on most 64-bit compilers), then you will probably get the appropriate single or double precision behavior, as long as you don’t accidentally promote a single precision value to double precision in an expression. Common examples are multiplying by a floating point constant not suffixed with “f” and calling sin() or cos(), which are double precision functions; the second sketch below demonstrates both.

  • Even if you do all of that, you will still see differences, because floating point addition and multiplication are not associative: the order in which you operate on values changes the result. An obvious case is the difference between summing the elements of an array with a loop on the CPU (which goes in index order in the simplest case) and performing a parallel reduction on the GPU (which does pairwise sums in a tree-like fashion; the third sketch below compares the two orders). Trying to force the operations to happen in the same order will either make your CPU code unreadable or your GPU code very slow. And the compiler might defeat you anyway. :)
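
Here is a minimal sketch of the x87 excess precision surprise from the first bullet. Whether it actually fires depends on the compiler, target, and flags (with gcc, something like -m32 -mfpmath=387 -O0 exercises the x87 path, while -mfpmath=sse or -ffloat-store makes it go away), so treat it as an illustration rather than a guaranteed repro:

    #include <stdio.h>

    /* Under x87 code generation the multiply happens in an 80-bit
       stack register, and the result comes back in st(0). */
    double square(double x) { return x * x; }

    int main(void)
    {
        double stored = square(0.1);  /* spilled to a 64-bit memory slot */

        /* The freshly returned value can still carry the extra
           precision, so this comparison may fail under x87 codegen. */
        if (square(0.1) != stored)
            printf("excess precision changed the result\n");
        else
            printf("values agree on this build\n");
        return 0;
    }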
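
The promotion traps from the second bullet are easy to demonstrate in plain C (the same expressions behave the same way inside a kernel). Compile with something like gcc demo.c -lm:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        float x = 0.1f;

        /* 0.3 is a double constant, so x is promoted, the multiply
           happens in double precision, and the result is rounded
           back to float. 0.3f keeps the whole expression single
           precision. */
        float a = x * 0.3;
        float b = x * 0.3f;

        /* sin() is the double precision function; sinf() stays in
           single precision end to end. */
        float c = (float)sin(x);
        float d = sinf(x);

        /* Each pair often differs in the last bits. */
        printf("%.9g %.9g\n", a, b);
        printf("%.9g %.9g\n", c, d);
        return 0;
    }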
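
And a host-only sketch of the ordering effect from the last bullet: sum_pairwise() below imitates the order a tree-style reduction produces, no GPU required. A real reduction kernel will split the work differently, but the point stands, since any reordering can change the low bits:

    #include <stdio.h>
    #include <stdlib.h>

    /* Index-order accumulation, the way a simple CPU loop sums. */
    float sum_linear(const float *v, int n)
    {
        float s = 0.0f;
        for (int i = 0; i < n; ++i)
            s += v[i];
        return s;
    }

    /* Pairwise accumulation, imitating a tree-style reduction. */
    float sum_pairwise(const float *v, int n)
    {
        if (n == 1)
            return v[0];
        int half = n / 2;
        return sum_pairwise(v, half) + sum_pairwise(v + half, n - half);
    }

    int main(void)
    {
        enum { N = 1 << 20 };
        float *v = malloc(N * sizeof *v);
        srand(42);
        for (int i = 0; i < N; ++i)
            v[i] = (float)rand() / (float)RAND_MAX;

        /* Same values, two summation orders: the results usually
           disagree in the low bits. */
        printf("linear:   %.9g\n", sum_linear(v, N));
        printf("pairwise: %.9g\n", sum_pairwise(v, N));

        free(v);
        return 0;
    }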
