Is there any indication of whether there will be 16-bit atomic operations (preferably an atomicAdd()), either on the ‘half’ type as a floating-point value or on a 16-bit integer? This would be for shared or global memory (hopefully both, but I would be happy with either).
I made my own 16-bit unsigned int atomicAdd() hack for shared memory, which I am currently using for real-time image reconstruction, but it is not as efficient as a 32-bit atomic operation.
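For context, here is a minimal sketch of one common way to emulate such an operation: a compare-and-swap loop on the aligned 32-bit word that contains the 16-bit value. It is not necessarily the same as the hack I am using. The names atomicAddU16 and countKernel are hypothetical and not part of any NVIDIA API, and the sketch assumes a little-endian layout and a 2-byte-aligned address.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not an NVIDIA API): emulates a 16-bit unsigned
// atomicAdd with a 32-bit atomicCAS on the enclosing aligned word.
// Assumes 'address' is 2-byte aligned; returns the previous 16-bit value.
__device__ unsigned short atomicAddU16(unsigned short *address,
                                       unsigned short val)
{
    // Round the address down to the enclosing 4-byte-aligned word.
    unsigned int *base = (unsigned int *)((size_t)address & ~(size_t)3);
    // Little-endian assumption: the half at byte offset 2 is the upper 16 bits.
    unsigned int shift = ((size_t)address & 2) ? 16u : 0u;

    unsigned int old = *base;
    unsigned int assumed;
    do {
        assumed = old;
        // Add within the 16-bit lane only; the sum wraps modulo 2^16.
        unsigned int sum = (((assumed >> shift) & 0xFFFFu) + val) & 0xFFFFu;
        // Splice the new lane back in, leaving the neighbouring half untouched.
        unsigned int updated = (assumed & ~(0xFFFFu << shift)) | (sum << shift);
        old = atomicCAS(base, assumed, updated);
    } while (old != assumed);  // Retry if another thread changed the word.

    return (unsigned short)((old >> shift) & 0xFFFFu);
}

// Hypothetical demo kernel: every thread increments one of nBins counters.
__global__ void countKernel(unsigned short *bins, int nBins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    atomicAddU16(&bins[i % nBins], 1);
}
```

The same loop should work on a shared-memory pointer as well, since atomicCAS accepts both. The inefficiency is easy to see here: two neighbouring 16-bit counters share one 32-bit word, so updates to either of them contend with each other and force retries.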
Even if it is not hardware-supported, I would guess that an NVIDIA version of a 16-bit atomicAdd() would be better than my ‘roll-my-own’ version.
I have heard rumors that this may be available in some inefficient form for the GTX line, but I cannot find any documentation.
Maybe in the final release of CUDA 8?