Emulating FMAD on the host?

I need an algorithm running on the host to generate the same results as one running on the GPU, and I’m getting odd results. With identical code running on the CPU and the GPU, the segments and results are:

GPU & CPU results identical:
float x, y;

x *= y;
x += .01;

GPU & CPU results differ:
float x, y;
x *= y;
x += .01f;

Really, though, I want the additive portion to not be a constant, but a variable. So, I try:

GPU & CPU results differ:
float x, y, z;
z = .01;
x *= y;
x += z;

GPU & CPU results identical:
float x, y, z;
z = .01;
x = fmaf(x, y, z);

So, why not just use fmaf()? Well, it kills the performance on the GPU by more than 25% overall.

So, how can I simulate the built-in FMAD instruction on the host? Any other options I haven’t considered? Thanks!

I’m not sure what compiler you are using, but the -ffloat-store option in gcc might help here. It forces intermediate floating-point values to be written back to memory in the normal float representation. The x87 floating-point registers on x86 chips are 80 bits wide, so the CPU can “cheat” compared to the GPU and get a more accurate answer by not chopping off the extra bits between operations.

If you use Microsoft Visual Studio, try playing with the “floating point consistency” compiler option.

Thanks for the tips! They were helpful.