CUDA "emulating emulation mode": what changes in the FP control register?


I would like to test my CUDA code and especially compare it to a host equivalent in single precision.

I would like to get as close as possible to what the emulator is doing in terms of “acting like the GPU”.

I have read about setting the precision-control field of the floating-point control register to “24-bit computation”.

But I know there are other places where the GPU diverges from the IEEE 754 standard, notably its handling of denormals.

Does anyone have a list of the CPU settings I would need to get behavior as close as possible to the GPU’s?

Thank you.

PS: I am on Windows, hence using the _controlfp(…, …) function.
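For the “24-bit computation” setting, here is a minimal sketch. It assumes the setting meant is the precision-control (PC) field of the x87 control word. With MSVC this is _controlfp(_PC_24, _MCW_PC) from <float.h>, but MSVC supports precision control only on 32-bit x86; it documents it as unsupported on x64, where float math already goes through SSE. The glibc branch (_FPU_GETCW / _FPU_SETCW from <fpu_control.h>) is only there so the sketch also runs on Linux. Both helper names are hypothetical. PC only shortens the significand: the x87 registers keep their wider exponent range and still produce denormals, so this alone is not GPU-like.

```c
#include <assert.h>
#include <float.h>

/* Pick a way to reach the x87 precision-control (PC) field, if any. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define X87_PC_API 1            /* MSVC, 32-bit x86: _controlfp */
#elif defined(__GNUC__) && defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
#  include <fpu_control.h>
#  define X87_PC_API 2            /* glibc on x86: _FPU_GETCW / _FPU_SETCW */
#else
#  define X87_PC_API 0            /* no x87 precision control available */
#endif

/* Hypothetical helper: make x87 results round to a 24-bit significand.
   Returns 1 if the control word was changed, 0 if unsupported here. */
static int set_x87_precision_24(void)
{
#if X87_PC_API == 1
    _controlfp(_PC_24, _MCW_PC);
    return 1;
#elif X87_PC_API == 2
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw = (cw & ~_FPU_EXTENDED) | _FPU_SINGLE;  /* _FPU_EXTENDED is also the PC mask */
    _FPU_SETCW(cw);
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: current x87 significand width (24, 53 or 64), or -1. */
static int x87_precision_bits(void)
{
#if X87_PC_API == 1
    switch (_controlfp(0, 0) & _MCW_PC) {   /* mask 0 = read without changing */
    case _PC_24: return 24;
    case _PC_53: return 53;
    case _PC_64: return 64;
    }
#elif X87_PC_API == 2
    fpu_control_t cw;
    _FPU_GETCW(cw);
    switch (cw & _FPU_EXTENDED) {
    case _FPU_SINGLE:   return 24;
    case _FPU_DOUBLE:   return 53;
    case _FPU_EXTENDED: return 64;
    }
#endif
    return -1;
}
```

Calling set_x87_precision_24() once per thread before the host reference code runs is the intended use. On a 64-bit MSVC build it does nothing and returns 0.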

Umm, I don’t think it does much. The emulator differs from the GPU in many important respects, and I doubt it bothers with minor ones like rounding.

In any case, just start debugging in Visual Studio, set a breakpoint inside the kernel, go to Debug>Windows>Registers, right-click inside the new window, and check Floating Point. That’ll show you what the state is.
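To read the same state from code instead of the Registers window (for example in a build you cannot run under the debugger), here is a minimal sketch. It decodes the SSE control/status register, MXCSR, via _mm_getcsr() from <xmmintrin.h>. The bit positions (DAZ = bit 6, rounding control = bits 13-14, FTZ = bit 15) are from the Intel architecture manuals. decode_mxcsr and print_mxcsr are hypothetical helpers. MXCSR only governs SSE arithmetic, so a 32-bit build that still emits x87 code would also need the x87 control word (on MSVC, _controlfp(0, 0) returns it without changing anything).

```c
#include <assert.h>
#include <stdio.h>
#include <xmmintrin.h>   /* _mm_getcsr */

/* Decoded MXCSR control fields (bit positions per the Intel SDM). */
typedef struct {
    int daz;        /* bit 6: denormal inputs treated as zero */
    int rounding;   /* bits 13-14: 0 nearest, 1 down, 2 up, 3 toward zero */
    int ftz;        /* bit 15: denormal results flushed to zero */
} mxcsr_mode;

static mxcsr_mode decode_mxcsr(unsigned int csr)
{
    mxcsr_mode m;
    m.daz      = (int)((csr >> 6) & 1u);
    m.rounding = (int)((csr >> 13) & 3u);
    m.ftz      = (int)((csr >> 15) & 1u);
    return m;
}

/* Print the calling thread's SSE floating-point mode. */
static void print_mxcsr(void)
{
    static const char *const rc_names[] = { "nearest", "down", "up", "toward zero" };
    unsigned int csr = _mm_getcsr();
    mxcsr_mode m = decode_mxcsr(csr);
    printf("MXCSR=0x%04X  FTZ=%d  DAZ=%d  rounding=%s\n",
           csr, m.ftz, m.daz, rc_names[m.rounding]);
}
```

Calling print_mxcsr() from host code in an emulation build and in a plain host build, then comparing the two lines, would answer the question. For reference, the architectural default 0x1F80 decodes to FTZ=0, DAZ=0, rounding=nearest.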

Let us know what you find.

To get the arithmetic behavior closest to the GPU (assuming you’re only using single precision), try compiling your code with SSE and running it with the FTZ (flush-to-zero) and DAZ (denormals-are-zero) modes enabled.
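Here is a minimal sketch of that setup for the host reference code. It assumes an x86 target where float arithmetic is compiled to SSE, which is the default on x86-64; 32-bit Visual C++ needs /arch:SSE or /arch:SSE2. The _MM_SET_* macros come from <xmmintrin.h> and <pmmintrin.h> and work with Visual C++, GCC and Clang. The function names are hypothetical. MXCSR is per thread, so the call has to happen in every thread that computes reference results. With the Microsoft CRT, _controlfp(_DN_FLUSH, _MCW_DN) should be an equivalent way to request flushing, though the sketch does not rely on it.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE (SSE) */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE (SSE3 header) */

/* Enable FTZ and DAZ in MXCSR for the calling thread. Affects only
   arithmetic the compiler emits as SSE instructions, not x87 code. */
static void enable_ftz_daz(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    /* Sanity check that both MXCSR bits are now set. */
    assert(_MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON);
    assert(_MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON);
}

/* FLT_MIN * 0.5f has a denormal result: with FTZ it becomes 0.
   volatile stops the compiler from folding the product at compile time. */
static float denormal_result(void)
{
    volatile float x = FLT_MIN, half = 0.5f;
    return x * half;
}

/* 2^-127 (bit pattern 0x00400000) is a denormal input: with DAZ it is
   read as 0, so the product is 0 instead of FLT_MIN. */
static float denormal_input(void)
{
    uint32_t bits = 0x00400000u;
    float d;
    memcpy(&d, &bits, sizeof d);
    volatile float vd = d, two = 2.0f;
    return vd * two;
}
```

Before enable_ftz_daz(), the two helpers return FLT_MIN / 2 and FLT_MIN. Afterwards both return 0.0f, which is what the GPU produces for these operations.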

This setup will reproduce single-precision arithmetic and the flushing of denormals to zero, but not the truncation of the intermediate product inside the MAD (multiply-add).

See, for example, the /arch:SSE option in Visual C++ (not tested).

Thank you, I am going to give that a try.

@Alex: Sadly, running everything inside VC++ is not possible in my case.

You can use any one of the SDK samples. Your question was “what does the emulator do to the FP state registers?”, and you don’t need to run your full-fledged app to answer that. If you find out the answer, btw, please share it.