atomicAdd() with a zero value

I noticed that if a variable (a float in my case) is exactly zero and you call atomicAdd(&addr[index], var), the memory location is still accessed, even though the update leaves the stored value unchanged.

Does this sound right?

I suppose this is useful if you want to read the current value (atomic operations return the old value), but when you only want to update and never use the return value, it is wasted work.

When I checked the value for non-zero before calling atomicAdd() and skipped the memory access for zeros, I got a modest performance increase. That is probably because about 30% of my input values are zero, so the check only pays off for inputs of that kind.
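For reference, here is a minimal sketch of the check described above. The kernel name accumulateSkipZero, the host helper sumSkipZero, and the single-bin setup are made up for illustration and are not the code from my application.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: adds vals[i] into addr[index[i]], skipping zero addends.
__global__ void accumulateSkipZero(float *addr, const int *index,
                                   const float *vals, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = vals[i];   // thread-local addend, not the value stored in memory
    // The comparison is false for both +0.0f and -0.0f, so both are skipped.
    if (v != 0.0f) {
        atomicAdd(&addr[index[i]], v);
    }
}

// Hypothetical host helper: accumulates every value into one bin and
// returns the final content of that bin. Error checking is omitted.
float sumSkipZero(const std::vector<float> &vals)
{
    int n = static_cast<int>(vals.size());
    if (n == 0) return 0.0f;

    std::vector<int> index(n, 0);   // every element targets bin 0
    float *dAddr = nullptr;
    int *dIndex = nullptr;
    float *dVals = nullptr;
    cudaMalloc(&dAddr, sizeof(float));
    cudaMalloc(&dIndex, n * sizeof(int));
    cudaMalloc(&dVals, n * sizeof(float));
    cudaMemset(dAddr, 0, sizeof(float));   // all-zero bits is +0.0f
    cudaMemcpy(dIndex, index.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals, vals.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    accumulateSkipZero<<<blocks, threads>>>(dAddr, dIndex, dVals, n);

    float result = 0.0f;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, dAddr, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dAddr);
    cudaFree(dIndex);
    cudaFree(dVals);
    return result;
}
```

Whether the branch pays for itself will depend on how many zeros the input contains, so it is worth timing both variants on real data.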

I wonder whether it would make sense to write a modified atomicAdd() that does not access memory when the input value is zero or negative. Has anybody tried such a custom implementation?

Are there any downsides to that approach compared with checking for non-zero before the atomicAdd() call?

The check for zero is not atomic unless you add some extra locking mechanism around it.

There is a race condition lurking here: the value could become non-zero between the if check and the atomicAdd(), yet you would already be on the code path that assumes it is zero.

With floating-point, there are a few corner cases in which you can actually change a value by adding zero to it. For instance, a negative zero can be turned into a positive zero. Also, floating-point atomic operations do not support subnormals, so an existing subnormal number will be flushed to zero by the operation.
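The signed-zero case is easy to verify. The sketch below only shows the IEEE 754 addition rule in host code (round-to-nearest, the default mode); it does not exercise the device atomic itself, which is assumed to follow the same rule. The function name addingZeroFlipsSign is made up for illustration.

```cuda
#include <cmath>

// Hypothetical host-side check: returns true if adding +0.0f to a stored
// -0.0f produces a value whose sign bit is clear, i.e. a positive zero.
bool addingZeroFlipsSign()
{
    // volatile keeps the compiler from folding the addition away.
    volatile float stored = -0.0f;   // value already in memory
    volatile float addend = 0.0f;    // the "do nothing" addend

    float result = stored + addend;

    // The sign bit is set before the add and must be clear afterwards.
    return std::signbit(stored) && !std::signbit(result);
}
```

So a stored negative zero does change its bit pattern when zero is added, which is why skipping the operation is not strictly equivalent to performing it.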

If this behavior does not matter for your application and you have many zero-adding atomicAdds, you can indeed write a version of atomicAdd incorporating a check. However, this is not something the compiler or hardware could do automatically because of floating-point standard compliance.

I was talking about skipping the atomicAdd() when the thread-local value to be added turns out to be zero.

I wanted to skip the global operation when the scalar value to be added to the current global sum is either zero or falls within some range where the update would make no difference and is not worth the time.
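Something along these lines is what I have in mind. This is only a sketch: atomicAddIfSignificant, sumSignificant, sumAboveThreshold, and the eps parameter are hypothetical names, and the cutoff rule (magnitude greater than eps) is an assumption that each application would have to choose for itself.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <vector>

// Hypothetical wrapper: performs the atomic add only if |v| exceeds eps.
// It returns nothing, because no old value is available when the add is skipped.
__device__ void atomicAddIfSignificant(float *addr, float v, float eps)
{
    // A NaN addend fails this comparison and is therefore skipped as well.
    if (fabsf(v) > eps) {
        atomicAdd(addr, v);
    }
}

// Hypothetical kernel: every thread contributes one value to a global sum.
__global__ void sumSignificant(float *sum, const float *vals, int n, float eps)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAddIfSignificant(sum, vals[i], eps);
    }
}

// Hypothetical host helper that runs the kernel and returns the sum.
// Error checking is omitted.
float sumAboveThreshold(const std::vector<float> &vals, float eps)
{
    int n = static_cast<int>(vals.size());
    if (n == 0) return 0.0f;

    float *dSum = nullptr;
    float *dVals = nullptr;
    cudaMalloc(&dSum, sizeof(float));
    cudaMalloc(&dVals, n * sizeof(float));
    cudaMemset(dSum, 0, sizeof(float));   // all-zero bits is +0.0f
    cudaMemcpy(dVals, vals.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    sumSignificant<<<blocks, threads>>>(dSum, dVals, n, eps);

    float result = 0.0f;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, dSum, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dSum);
    cudaFree(dVals);
    return result;
}
```

With eps set to 0.0f this reduces to the plain zero check. Note that many skipped small values can add up, so a non-zero eps changes the result rather than just the speed.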