Race conditions

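It is usually better to give the input and the output separate memory. When each thread writes only to its own output location, no atomic operations are needed, and atomic operations are expensive. Floating-point data adds a further nuisance. Floating-point addition is not associative, so with atomic updates the rounding errors, and therefore the final result, depend on the order in which the threads happen to perform their updates. That order can change from one execution to the next.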
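The rounding effect can be reproduced without a GPU. The Python sketch below illustrates it under stated assumptions and is not a GPU implementation. Python threads stand in for GPU threads, and a `threading.Lock` stands in for a hardware atomic add. The values in `PARTIALS` and all function names are hypothetical. The values were chosen so that `1e16 + 1.0` rounds back to `1e16` in double precision. The sketch shows only the ordering effect and says nothing about the cost of atomics.

```python
import threading

# Hypothetical per-thread partial results, chosen so rounding is visible:
# in IEEE 754 double precision, 1e16 + 1.0 rounds back to 1e16.
PARTIALS = [1e16, 1.0, -1e16, 1.0]


def sum_with_shared_accumulator(values):
    """Every thread adds into ONE shared cell (a lock stands in for an atomic add).

    The additions happen in whatever order the threads acquire the lock,
    so the rounded result is not guaranteed to be the same on every run.
    """
    total = [0.0]
    lock = threading.Lock()

    def worker(v):
        with lock:  # emulates an atomic read-modify-write
            total[0] += v

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def sum_with_separate_outputs(values):
    """Every thread writes to its OWN output slot, so no synchronization is needed.

    A final pass combines the slots in a fixed index order, so the result
    is identical on every run.
    """
    out = [0.0] * len(values)  # separate output memory, one slot per thread

    def worker(i):
        out[i] = values[i]  # stands in for each thread's real computation

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = 0.0
    for x in out:  # deterministic combination order
        total += x
    return total


def sum_in_order(values, order):
    """Add the values in an explicit order, i.e. replay one possible schedule."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


print(sum_in_order(PARTIALS, [0, 1, 2, 3]))  # 1.0
print(sum_in_order(PARTIALS, [0, 2, 1, 3]))  # 2.0
print(sum_with_separate_outputs(PARTIALS))   # 1.0 on every run
print(sum_with_shared_accumulator(PARTIALS)) # depends on thread timing
```

Replaying two different schedules with `sum_in_order` yields 1.0 and 2.0 from the same four numbers. Depending on timing, the shared accumulator can therefore return either value, or even 0.0. The separate-output version always combines the slots in index order and always returns the same result.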