I have filed a couple of bugs to get the documentation clarified. Based on my experiments, the current behavior of atomic single-precision floating-point adds is as follows.
Regardless of the setting of the compiler flag -ftz:
[1] Atomic single-precision floating-point adds on global memory always operate in flush-to-zero mode, so denormal operands and results are treated as zero.
[2] Atomic single-precision floating-point adds on shared memory always operate with denormal support, so denormal operands and results are preserved.
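
For anyone who wants to repeat the experiment, here is a minimal sketch of the kind of test that can show the difference. It is not the exact code I ran; the kernel names `addGlobal` and `addShared` and the helper `isSubnormal` are made up for illustration. It assumes a device that supports `atomicAdd` on `float` in both global and shared memory (compute capability 2.0 or later). Each kernel adds the denormal value 1e-40f to an accumulator that starts at zero, and the host then checks whether the denormal survived.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Host helper (illustrative name): true if x is a nonzero denormal float.
static bool isSubnormal(float x) { return std::fpclassify(x) == FP_SUBNORMAL; }

// Atomic add directly on a global-memory accumulator.
__global__ void addGlobal(float *acc, float inc) {
    atomicAdd(acc, inc);
}

// Atomic add on a shared-memory accumulator; the result is then copied
// to global memory with a plain store so the host can read it back.
__global__ void addShared(float *out, float inc) {
    __shared__ float acc;
    acc = 0.0f;
    __syncthreads();
    atomicAdd(&acc, inc);
    __syncthreads();
    *out = acc;
}

int main() {
    const float tiny = 1e-40f;  // below FLT_MIN (about 1.18e-38), so denormal
    float *d = nullptr;
    if (cudaMalloc(&d, 2 * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, 2 * sizeof(float));  // all-zero bits == 0.0f

    addGlobal<<<1, 1>>>(d, tiny);      // result lands in d[0]
    addShared<<<1, 1>>>(d + 1, tiny);  // result lands in d[1]

    float h[2] = {0.0f, 0.0f};
    // cudaMemcpy waits for the preceding kernels to finish.
    if (cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d);

    printf("global: %g (denormal preserved: %s)\n", h[0], isSubnormal(h[0]) ? "yes" : "no");
    printf("shared: %g (denormal preserved: %s)\n", h[1], isSubnormal(h[1]) ? "yes" : "no");
    return 0;
}
```

Building this once with `nvcc -ftz=true` and once with `nvcc -ftz=false` and comparing the two runs is one way to check whether the flag has any effect on either case. Results may differ on other architectures or toolkit versions, so treat the output as an observation about your particular setup rather than a guarantee.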