I am working on a program which needs to perform an atomic floating-point add from each thread. I noticed there is no atomicAdd for floats, so I used the following:
atomicExch((int *)&globaldata[ishift], __float_as_int(globaldata[ishift]+newval));
where globaldata is a float array in global memory and newval is the float value to be added.
I ran this and it seems to work, but it is about 35% slower than the non-atomic update (i.e. globaldata[ishift]+=newval).
It looks to me like the code above reads global memory twice: once inside atomicExch, and once when evaluating globaldata[ishift] for the addition. I don't know whether this is responsible for the slowdown (I know atomic calls are also expensive).
I have two questions:
Does the above statement for an atomic float add make sense to you?
Is there an atomic function that performs only a single global-memory write? If there is, I could perhaps save the memory read associated with atomicExch.
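For comparison, here is a minimal sketch of the compare-and-swap loop that is commonly used to build a float add out of integer atomics. In the atomicExch line above, the read of globaldata[ishift] and the addition both happen before the exchange, so two threads can read the same old value and one update can be lost; the loop below retries whenever another thread changed the value in between. The names atomicAddFloatCAS, accumulateKernel and runDemo are made up for illustration, and the sketch assumes a device that supports 32-bit atomicCAS on global memory. On devices of compute capability 2.0 and later, atomicAdd also has a float overload that can be used directly.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): atomic float add
// implemented with a compare-and-swap loop on the value's bit pattern.
__device__ float atomicAddFloatCAS(float *address, float val)
{
    int *address_as_int = (int *)address;
    int old = *address_as_int;   // initial guess at the current contents
    int assumed;
    do {
        assumed = old;
        // Store assumed + val only if the location still holds `assumed`.
        // atomicCAS returns the value it actually found there.
        old = atomicCAS(address_as_int, assumed,
                        __float_as_int(__int_as_float(assumed) + val));
        // Compare as integers so a NaN bit pattern cannot loop forever.
    } while (assumed != old);
    return __int_as_float(old);  // value before this thread's add
}

// Every thread adds newval to the same element, the worst case for contention.
__global__ void accumulateKernel(float *data, float newval)
{
    atomicAddFloatCAS(&data[0], newval);
}

// Host-side driver (hypothetical): launches the kernel and returns the sum.
// Error checking is omitted to keep the sketch short.
float runDemo(int blocks, int threadsPerBlock)
{
    float *d = nullptr;
    float h = 0.0f;
    cudaMalloc(&d, sizeof(float));
    cudaMemcpy(d, &h, sizeof(float), cudaMemcpyHostToDevice);
    accumulateKernel<<<blocks, threadsPerBlock>>>(d, 1.0f);
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(&h, d, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

In a kernel like the one in question, the call would look like atomicAddFloatCAS(&globaldata[ishift], newval). The loop still needs one read plus at least one atomic per update, so it is not expected to be faster than the atomicExch version; the point is that no update is lost.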