atomicAdd(float) does not add very small values

When I use float atomicAdd(float *address, float val) to add a float value smaller than approx. 1e-39 to 0, the addition does not work, and the value at address remains 0.

Here is the simplest code:

__device__ float test[6] = {0};

__global__ void testKernel() {
    float addit = sinf(1e-20);   // normal value, well above the smallest normal float
    atomicAdd(&test[0], addit);
    test[1] += addit;

    addit = sinf(1e-37);         // still a normal value, close to the lower limit
    atomicAdd(&test[2], addit);
    test[3] += addit;

    addit = sinf(1e-40);         // subnormal (denormalized) value
    atomicAdd(&test[4], addit);
    test[5] += addit;
}

When I run the code above as testKernel<<<1, 1>>>(); and break in the debugger, I see:

test    0x42697800
    [0] 9.9999997e-21
    [1] 9.9999997e-21
    [2] 9.9999999e-38
    [3] 9.9999999e-38
    [4] 0            
    [5] 9.9999461e-41

Notice the difference between test[4] and test[5]: both additions used the same value, yet the plain addition worked and the atomic one did nothing at all. What am I missing here?

P.S. I assign addit via sinf(value) to prevent the compiler from optimizing these small constants away.

System info:
Intel i7-3820
32 GB of RAM
NVIDIA Titan

Windows 7 x64
CUDA 5.5.20
Driver 331.82
Nsight 3.2.1.13309

Compiled for CC 3.5

It looks like you are finding that the atomic float operations do not support denormalized floating-point numbers. The smallest value representable by a “normal” single-precision float is ~1e-38, but denormalized (or “subnormal”) floats can go smaller. Devices of compute capability 2.0 and later handle denormalized numbers in ordinary arithmetic, but it appears that the atomic operations (which are performed outside the CUDA cores) still do not.
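For reference, a small host-side check (my own illustration, not from the original post) shows where that normal/subnormal boundary lies:

#include <stdio.h>
#include <float.h>

int main(void) {
    // FLT_MIN is the smallest normal single-precision value (~1.175e-38);
    // anything smaller, e.g. 1e-40f, is stored as a subnormal.
    printf("smallest normal float: %g\n", FLT_MIN);
    printf("a subnormal float    : %g\n", 1e-40f);
    return 0;
}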

You are operating on the ragged edge of what single precision numbers can handle. I would consider switching to double precision if possible. :)
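Note that on a CC 3.5 device like the Titan there is no hardware atomicAdd for double, so you would need something along the lines of the atomicCAS-based implementation shown in the CUDA C Programming Guide. A sketch (the name atomicAddDouble is mine, to avoid clashing with a built-in overload):

__device__ double atomicAddDouble(double *address, double val) {
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // The addition itself is an ordinary double-precision add; atomicCAS
        // retries it until no other thread has modified *address in between.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}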

Yes, it seems this is the expected behavior, according to the PTX manual:
“atom.add.f32 rounds to nearest even and flushes subnormal inputs and results to sign-preserving zero.”

I have not seen it mentioned in any other document, though. I think it should be mentioned at least in the “Floating-point standard” appendix of the Programming Guide. You may want to file a request for enhancement to update the documentation.
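As a possible workaround in the meantime, one could emulate the float atomic with atomicCAS so that the addition itself runs in the regular FP32 pipeline, which does handle subnormals (as long as the file is not compiled with -ftz=true). This is just a sketch I have not benchmarked; it will be slower than the native float atomic under contention:

__device__ float atomicAddFloatCAS(float *address, float val) {
    int *address_as_int = (int *)address;
    int old = *address_as_int, assumed;
    do {
        assumed = old;
        // The add happens in the normal FP32 units, so subnormal inputs and
        // results are preserved instead of being flushed to zero.
        old = atomicCAS(address_as_int, assumed,
                        __float_as_int(val + __int_as_float(assumed)));
    } while (assumed != old);
    return __int_as_float(old);
}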

Ah, good catch! I scanned through the CUDA C Programming Guide and found no explanation, but didn’t think to check the PTX manual. This would be a good thing to explain in the Programming Guide.

Would the same restriction apply to shared memory atomics?

Thanks for alerting us to this issue. I am following up with relevant teams to get the documentation (and in particular the CUDA C Programming Guide) clarified with regard to atomic floating-point adds.

Thank you very much!

I have filed a couple of bugs to get the documentation clarified. As determined by experiment, the current behavior of atomic single-precision floating-point adds is as follows:

Regardless of the setting of the compiler flag -ftz:

[1] Atomic single-precision floating-point adds on global memory always operate in flush-to-zero mode.
[2] Atomic single-precision floating-point adds on shared memory always operate with denormal support.
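For completeness, here is a sketch of the kind of experiment that demonstrates this (my own minimal reproduction, assuming a single-thread launch like testKernel<<<1, 1>>>() in the original post):

__device__ float results[2] = {0};   // [0] = global-memory atomic, [1] = shared-memory atomic

__global__ void denormalAtomicTest() {
    __shared__ float sharedSum;
    sharedSum = 0.0f;
    __syncthreads();

    const float denormal = 1e-40f;     // subnormal single-precision value

    atomicAdd(&results[0], denormal);  // global atomic: expected to be flushed to zero
    atomicAdd(&sharedSum, denormal);   // shared atomic: expected to keep the denormal

    __syncthreads();
    results[1] = sharedSum;            // copy the shared result out for inspection
}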