Atomic Float/Double operations on Fermi?

Are atomic float/double operations supported on Fermi? AFAICT, only signed and unsigned integers are supported. Is this correct?



I don’t think anyone outside of NVIDIA knows at this stage. I would be incredibly surprised if it were supported. That would probably involve fabbing a complete double-precision-capable FPU in the memory controller, which seems an unlikely proposition at best…

I don’t know of any CPU that has floating-point atomic adds, much less other atomic operations like multiplies or transcendentals.

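There’s also the problem of associativity. Floating-point addition is commutative, but (a+b)+c is not necessarily the same as a+(b+c), because every intermediate sum is rounded. So even if floating-point atomics existed, their results would depend on the order in which threads happen to arrive, which ruins one of the abstractions of atomic operations: that the final result does not depend on that order.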
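A minimal host-side C++ sketch of that order dependence (plain C++ rather than a CUDA kernel so it runs anywhere; the input values are arbitrary illustrations): the same three doubles, applied left to right in two different orders, give two different totals, which is what would happen if atomic adds arrived in a different order.

```cpp
#include <cassert>
#include <vector>

// Sum values strictly left to right, the way a sequence of atomic adds
// would apply them if the threads happened to arrive in this order.
double sum_in_order(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v;  // each step rounds to the nearest double
    }
    return total;
}
```

With {1.0, 1e20, -1e20} the 1.0 is absorbed when it meets 1e20 (it is far below one ULP of 1e20), so the total is 0.0; with {1e20, -1e20, 1.0} the large terms cancel first and the total is 1.0.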

In practice, fixed-point arithmetic is an easy workaround for most float-atomic needs.
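A minimal sketch of the fixed-point workaround, in host C++ with std::atomic standing in for a GPU integer atomic add. The 32.32 format (scale 2^32) and the names to_fixed, from_fixed and fixed_atomic_add are illustrative assumptions; pick the scale from the range and precision your data actually needs. Each value is rounded once on conversion, and integer addition is exact and associative, so the total comes out the same whatever order the adds arrive in.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical 32.32 fixed-point format: 2^32 steps per unit.
constexpr double kScale = 4294967296.0;  // 2^32

// Round a double to the nearest fixed-point step (once, before any adds).
std::int64_t to_fixed(double x) {
    assert(std::fabs(x) < 2147483648.0 && "outside the 32.32 range");
    return static_cast<std::int64_t>(std::llround(x * kScale));
}

double from_fixed(std::int64_t f) {
    return static_cast<double>(f) / kScale;
}

// Integer fetch_add is exact, so the final total does not depend on the
// order in which concurrent callers arrive (barring int64 overflow).
void fixed_atomic_add(std::atomic<std::int64_t>& acc, double x) {
    acc.fetch_add(to_fixed(x), std::memory_order_relaxed);
}
```

On the GPU the same idea would map onto CUDA's 64-bit integer atomicAdd (the unsigned long long overload), casting the signed fixed-point value to unsigned, since two's-complement addition produces the same bits either way.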

CUDA does support floating-point atomic exchanges now (atomicExch accepts a float), of course, but that’s likely not what you’re asking about.
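Beyond plain exchange, compare-and-swap is enough to build a floating-point add in software. A minimal sketch of the usual retry loop, in host C++ with std::atomic's compare_exchange_weak standing in for CUDA's atomicCAS on the float's 32-bit pattern (the CUDA Programming Guide builds a double-precision atomicAdd the same way). The function names are hypothetical, and the result is still order dependent for the reason given above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

std::uint32_t float_to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Atomically add x to the float whose bit pattern lives in `bits`.
// Returns the previous value, like CUDA's atomicAdd.
float cas_float_add(std::atomic<std::uint32_t>& bits, float x) {
    std::uint32_t expected = bits.load(std::memory_order_relaxed);
    for (;;) {
        float old_val = bits_to_float(expected);
        std::uint32_t desired = float_to_bits(old_val + x);
        // Succeeds only if nobody changed the value since we read it;
        // on failure `expected` is refreshed and we retry with the new value.
        if (bits.compare_exchange_weak(expected, desired,
                                       std::memory_order_relaxed)) {
            return old_val;
        }
    }
}
```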

I believe you can do floating-point addition atomically on a GPU; you just have to use OpenGL and framebuffer blending. I found this out while accelerating some C code, where portions of an earlier version had been run on a GPU by someone who knew OpenGL. In CUDA, I had to change the algorithm dramatically to avoid the need for atomic adds.

Digging through the header files in the CUDA 3.0 beta toolkit revealed that Fermi does in fact support atomic float adds, at least after a fashion. The implementation in the header is, if anything, even more interesting: it indicates a software-based atomic built on top of a hardware mutex.
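A minimal sketch of the general idea of a float add guarded by a mutex, in host C++ with a spinlock acquired through an atomic exchange. LockedFloat and locked_float_add are hypothetical names, and this is not the code from the CUDA header. Carried naively into a kernel, per-thread spinlocks like this can deadlock when threads of the same warp contend for the lock on GPUs of this era, so treat it as an illustration of the concept only.

```cpp
#include <atomic>
#include <cassert>

struct LockedFloat {
    std::atomic<int> lock{0};  // 0 = free, 1 = held
    float value = 0.0f;        // only touched while the lock is held
};

// Returns the previous value, like an atomic add would.
float locked_float_add(LockedFloat& f, float x) {
    // Acquire: exchange writes 1 and returns the old state; spin while it was held.
    while (f.lock.exchange(1, std::memory_order_acquire) != 0) {
        // busy-wait
    }
    float old_val = f.value;
    f.value = old_val + x;  // ordinary read-modify-write inside the critical section
    f.lock.store(0, std::memory_order_release);  // release the lock
    return old_val;
}
```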

As for the lack of associativity in floating-point addition, this really shouldn’t be a problem for most sane cases. After all, you’re talking about an error on the order of log(numAdditions) ULP. Moreover, chaos theory suggests that error in a dynamical system generally tends either to grow or to shrink at an exponential rate. And for well-behaved (hyperbolic) systems, the shadowing lemma says that a computed trajectory with small per-step errors stays within any chosen radius delta of some exact trajectory starting within a radius epsilon of the “true” starting point, for an arbitrary number of iterations. This means that even when error grows with each iteration, your erroneous solution will still be very close to a perfect solution from slightly different starting conditions.