Do Pascal GPUs support FP16 (half-precision) atomic operations, such as atomicAdd, in CUDA?
AFAIK the capability is not exposed in CUDA 8.
However, for two FP16 values packed into a 32-bit quantity (i.e. half2, presumably the most performant arrangement in some cases), it might be possible to use the custom atomic approach described in the programming guide, which builds the operation from an atomicCAS loop. I haven't thought through the ramifications carefully.
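For what it's worth, here is a minimal sketch of that custom atomic approach applied to a packed half2, assuming the usual atomicCAS retry loop on the underlying 32-bit word. The names atomic_add_half2, accumulate, to_float2 and run_demo are made up for illustration and are not part of CUDA. The add goes through float and is rounded back to FP16, so it may not match a native half2 add bit for bit, and I haven't benchmarked it.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): atomically adds `val` to the packed
// half2 at `address` and returns the previous contents.
__device__ __half2 atomic_add_half2(__half2 *address, __half2 val)
{
    // Treat the two packed halves as one 32-bit word so atomicCAS can be used.
    unsigned int *word = reinterpret_cast<unsigned int *>(address);
    unsigned int old = *word;
    unsigned int assumed;
    do {
        assumed = old;
        __half2 cur = *reinterpret_cast<__half2 *>(&assumed);
        // Assumption: do the arithmetic in float and round back to FP16,
        // which avoids needing native half arithmetic on the device.
        float2 a = __half22float2(cur);
        float2 b = __half22float2(val);
        __half2 sum = __floats2half2_rn(a.x + b.x, a.y + b.y);
        // Store only if nobody changed the word in the meantime. The comparison
        // is on integer bits, so a NaN payload cannot make the loop spin forever.
        old = atomicCAS(word, assumed, *reinterpret_cast<unsigned int *>(&sum));
    } while (assumed != old);
    return *reinterpret_cast<__half2 *>(&old);
}

// Every thread adds (1, 2) to the same packed accumulator.
__global__ void accumulate(__half2 *acc)
{
    atomic_add_half2(acc, __floats2half2_rn(1.0f, 2.0f));
}

// Convert the result on the device so the host never touches __half2 directly.
__global__ void to_float2(const __half2 *acc, float2 *out)
{
    *out = __half22float2(*acc);
}

// Host-side driver: 4 blocks x 64 threads = 256 atomic adds on one half2.
// Error checking is omitted to keep the sketch short.
float2 run_demo()
{
    __half2 *acc = nullptr;
    float2 *out = nullptr;
    float2 result = make_float2(0.0f, 0.0f);
    cudaMalloc(&acc, sizeof(__half2));
    cudaMalloc(&out, sizeof(float2));
    cudaMemset(acc, 0, sizeof(__half2));   // all-zero bits are (0.0, 0.0) in FP16
    accumulate<<<4, 64>>>(acc);
    to_float2<<<1, 1>>>(acc, out);
    cudaMemcpy(&result, out, sizeof(float2), cudaMemcpyDeviceToHost);
    cudaFree(acc);
    cudaFree(out);
    return result;
}
```

Calling run_demo() from main should give (256, 512), since every partial sum is a small integer that FP16 represents exactly. With all threads hitting one address the loop retries a lot, so in practice you would want to spread the accumulators out or reduce within a warp or block first.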
If you want it badly enough, you can pack two halves into a float value, use a normal float atomic add, and then binary-edit the cubin to change the instruction modifier from .F32.FTZ.RN to .F16x2.FTZ.RN.
It would be nice if NVIDIA just exposed this mode in PTX now that the half type is fully supported.