I need an algorithm running on the host to produce the same results as one running on the GPU, but I'm getting odd results with the following. The code is identical on the CPU and the GPU; the segments and results are:

## GPU & CPU results identical:

```c
float x, y;
…
x *= y;
x += .01;
```

## GPU & CPU results differ:

```c
float x, y;
x *= y;
x += .01f;
```

Really, though, I want the additive term to be a variable rather than a constant. So I try:

## GPU & CPU results differ:

```c
float x, y, z;
z = .01;
x *= y;
x += z;
```

## GPU & CPU results identical:

```c
float x, y, z;
z = .01;
x = fmaf(x, y, z);
```

So, why not just use fmaf()? Well, it kills the performance on the GPU by more than 25% overall.

So, how can I simulate the built-in FMAD instruction on the host? Any other options I haven’t considered? Thanks!