Differences in single precision floating-point addition between sm_13 and sm_20?

I’m seeing some minor differences in single-precision floating-point addition between sm_13 and sm_20. I’m using nvcc 3.0 on Linux, and the resulting PTX code is identical for both targets. Here’s a simple test app:

// kernel.cu:

#include <cstdio>

union u32
{
	float f;
	int u;
};

__global__ void singlePrecisionTest(int* result)
{
	u32 a;
	a.u = -2020276922;

	u32 b;
	b.u = 127206940;

	u32 c;
	c.f = a.f + b.f;

	result[0] = c.u;
}


extern "C"
void singlePrecisionTest()
{
	int h_result;
	int* d_result;

	cudaMalloc((void**) &d_result, sizeof(int));

	singlePrecisionTest<<<1, 1>>>(d_result);

	cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);

	printf("result = %d\n", h_result);

	cudaFree(d_result);
}



// main.cpp:

extern "C"
void singlePrecisionTest();

int main(int argc, char** argv)
{
	singlePrecisionTest();
	return 0;
}

Compiled like this:

nvcc -arch sm_20 -c -o kernel.cu.o kernel.cu

g++433 -c -o main.cpp.o main.cpp

g++433 -o app -Lcuda-3.0/lib64 -lcudart main.cpp.o kernel.cu.o

When I run this using sm_20 (as above), I get this:

result = 3506176

However, when running using sm_13, I get this:

result = 0

I’m aware there are minor differences in single-precision floating-point calculations for division, square root, etc., but I expected these simple additions to adhere to the IEEE 754 spec in the same way on both architectures.

Can anyone either explain why this is the case or point me to a relevant article or document explaining this?



Please note that type-punning via union invokes undefined behavior in C++. I would recommend reinterpreting floats as ints, and ints as floats, via CUDA’s __int_as_float() and __float_as_int() functions. The resulting code in this case would look like:

int a = -2020276922;                

int b = 127206940;                

result[0] = __float_as_int(__int_as_float(a) + __int_as_float(b));

Alternatively, the reinterpret_cast known from C++ can be used.

One significant difference between sm_1x GPUs and sm_2x GPUs is that the latter have support for single-precision denormals. On an sm_1x GPU, any single-precision result with magnitude less than 2^-126 (1.175494351e-38) is converted to a zero of the same sign. This is called the “flush to zero” response, or FTZ for short. An sm_2x GPU gives the user a choice between flush-to-zero behavior and denormal support for single-precision arithmetic. The compiler enables denormal support by default when targeting sm_2x GPUs; the user can override this default with the --ftz flag.

Here, a reinterpreted as a float has the value -2.242212771e-34, and b reinterpreted as a float has the value 2.242261903e-34. The sum of these two floats is 4.913199044e-39, corresponding to 3506176 when reinterpreted as an int. The result is a denormal. Therefore, we would expect the following result:

for sm_1x: 0 // due to flush-to-zero

for sm_2x with --ftz=true: 0 // due to flush-to-zero

for sm_2x with --ftz=false: 3506176 // denormal support enabled (compiler default)

Thanks for the comprehensive answer; that was exactly the information I was after. I wasn’t aware that only the recent Fermi-generation GPUs support single-precision denormals.