I’m seeing some minor differences in single precision floating-point addition when using sm_13 or sm_20. I’m using nvcc 3.0 on Linux and the resulting ptx code is identical. Here’s a simple test app:
When I run this using sm_20 (as above), I get this:
result = 3506176
However, when running using sm_13, I get this:
result = 0
I’m aware there are minor differences in single precision floating point calculations for division, square root, etc, but I expected these simple additions would adhere to the 754 spec in the same way.
Can anyone either explain why this is the case or point me to a relevant article or document explaining this?
Please note that type-punning via union invokes undefined behavior according to the C standard. I would recommend to reinterpret floats as ints, and ints as floats, via CUDA’s __int_as_float() and __float_as_int() functions. The resulting code in this case would look like:
int a = -2020276922;
int b = 127206940;
result[0] = __float_as_int(__int_as_float(a) + __int_as_float(b));
Alternatively, the reinterpret_cast known from C++ can be used.
One significant difference between sm_1x GPUs and sm_2x GPUs is that the latter have support for single-precision denormals. On an sm_1x GPU, any single-precision result of magnitude less than 2^-126 (1.175494351e-38) is converted to a zero with like sign. This is called the “flush to zero” response or FTZ for short. An sm_2x GPU gives the user a choice between flush to zero behavior or denormal support for single-precision arithmetic. The compiler enables denormal support by default when targetting sm_2x GPUs. The user can override this default choice with the --ftz flag.
Here, a reinterpreted as a float has the value -2.242212771e-34, and b reinterpreted as a float has the value 2.242261903e-34. The sum of these two floats is 4.913199044e-39, corresponding to 3506176 when reinterpreted as an int. The result is a denormal. Therefore, we would expect the following result:
for sm_1x: 0 // due to flush-to-zero
for sm_2x with --ftz=true: 0 // due to flush-to-zero
for sm_2x with --ftz=false: 3506176 // denormal support enabled (compiler default)
Thanks for the comprehensive answer, that was exactly the information I was after. I wasn’t aware that it was only the recent Fermi generation GPUs that support single precision denorms.