I have 2 kernels that each compute something and add their results into the same array in global memory. Both use atomicAdd to do so. If those kernels are executed concurrently using 2 streams, the result is quite different (the difference is on the order of 10^-2 or 10^-3, which is large compared to float precision).
The programming guide clearly states: “The operation is atomic in the sense that it is guaranteed to be performed without interference from other threads.”
Is it guaranteed to work between kernels too? I would think so, but if not, that could explain the different results I get.
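To make the setup concrete, here is a minimal sketch of what I mean. It is not my actual code: the kernel names (kernelA, kernelB), the array size, and the values being added are all made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread adds a made-up value into a shared array.
__global__ void kernelA(float *acc, int n, int bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&acc[i % bins], 0.1f * (i % 7));
}

// Second hypothetical kernel, accumulating into the SAME array.
__global__ void kernelB(float *acc, int n, int bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&acc[i % bins], 1000.0f / (1 + i % 13));
}

int main()
{
    const int n = 1 << 20;   // illustrative problem size
    const int bins = 16;     // illustrative accumulator size

    float *acc = nullptr;
    cudaMalloc(&acc, bins * sizeof(float));
    cudaMemset(acc, 0, bins * sizeof(float));
    cudaDeviceSynchronize(); // make sure the array is zeroed before launching

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Launch each kernel in its own stream so they may overlap.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    kernelA<<<blocks, threads, 0, s1>>>(acc, n, bins);
    kernelB<<<blocks, threads, 0, s2>>>(acc, n, bins);
    cudaDeviceSynchronize(); // wait for both streams to finish

    float host[bins];
    cudaMemcpy(host, acc, sizeof(host), cudaMemcpyDeviceToHost);
    for (int b = 0; b < bins; ++b)
        printf("acc[%d] = %f\n", b, host[b]);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(acc);
    return 0;
}
```

Launching both kernels in the same stream instead (so that they run one after the other) is the case I am comparing against.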
Thanks for your help.