I would like to test my CUDA code and especially compare it to a host equivalent in single precision.
I would like to get as close as possible to what the emulator is doing in terms of “acting like the GPU”.
I have read about setting the precision-control field of the floating-point control word to “24-bit computation”.
But I know there are other places where the GPU diverges from the IEEE 754 standard, notably in the handling of denormals.
Would anyone have a list of the CPU settings I would need to change to get behavior as close as possible to the GPU?
P.S. I am on Windows, hence using the _controlfp(…, …) function.
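
For reference, here is a minimal sketch of what I currently have in mind on the host side. The names configure_host_fp and is_denormal are my own placeholders, and I am only assuming that 24-bit precision, round-to-nearest and flushing denormals to zero are the relevant settings; I do not know whether this list is complete or even right for the GPU.

```c
#include <float.h>

/* Hypothetical helper (the name is mine, not from any library).
 * Tries to make host single-precision math behave more like the GPU.
 * Returns 1 if the settings were applied, 0 if this compiler is not MSVC. */
static int configure_host_fp(void)
{
#if defined(_MSC_VER)
#if defined(_M_IX86)
    /* x87 precision control: 24-bit significand.
     * Only meaningful on 32-bit x86; not supported on x64. */
    _controlfp(_PC_24, _MCW_PC);
#endif
    /* Round to nearest (assumed to match the GPU default). */
    _controlfp(_RC_NEAR, _MCW_RC);
    /* Flush denormals to zero instead of keeping them. */
    _controlfp(_DN_FLUSH, _MCW_DN);
    return 1;
#else
    /* _controlfp is MSVC-specific; nothing is changed elsewhere. */
    return 0;
#endif
}

/* Small check used when comparing host and GPU results:
 * true if x is a nonzero value below the smallest normal float. */
static int is_denormal(float x)
{
    float a = (x < 0.0f) ? -x : x;
    return a != 0.0f && a < FLT_MIN;
}
```

The idea would be to call configure_host_fp() once at startup, before running the host reference, and to use is_denormal() on the host results to spot values that the GPU would presumably have flushed to zero.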