Regarding the following program: can each d_sigma[i] reliably read the old value of d_A[i] before the atomic operations update it, or will the results be random and unpredictable?
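int i = blockIdx.x * blockDim.x + threadIdx.x;
d_sigma[i] = d_A[i];            // plain (non-atomic) read of d_A[i]
atomicAdd(&d_A[i], 10.0f);      // update own element
atomicAdd(&d_A[i - 1], 10.0f);  // update left neighbour (out of bounds when i == 0)
atomicAdd(&d_A[i + 1], 10.0f);  // update right neighbour (out of bounds for the last thread)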
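CUDA provides no guarantees about the order in which threads execute, except for ordering you explicitly enforce in code (for example, with synchronization such as __syncthreads() or with separate kernel launches). You haven’t included any here.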
Each thread therefore writes to three separate locations, and two of those locations are also read and written by neighboring threads (d_A[i] is updated by threads i - 1, i and i + 1), so there is no reason to assume any particular ordering. You cannot say whether a given location d_A[i] will be read before any of the atomic updates to it, or, if it is read later, which of those updates will already have been applied.
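If the intent is for every d_sigma[i] to hold the value d_A[i] had before any of the updates, one way to express that ordering explicitly is to put the reads and the atomic updates in two separate kernel launches on the same stream. The following is a minimal sketch, assuming d_A and d_sigma are device arrays of length n; the kernel names, the CUDA_CHECK macro and the host helper snapshotThenUpdate are hypothetical, and the bounds checks on i - 1 and i + 1 are an addition the original snippet lacks.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Phase 1: every thread copies the current (old) value of its element.
__global__ void snapshotKernel(const float* d_A, float* d_sigma, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_sigma[i] = d_A[i];
}

// Phase 2: atomic updates of the element and its neighbours, bounds-checked.
__global__ void updateKernel(float* d_A, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    atomicAdd(&d_A[i], 10.0f);
    if (i > 0)     atomicAdd(&d_A[i - 1], 10.0f);
    if (i < n - 1) atomicAdd(&d_A[i + 1], 10.0f);
}

// Hypothetical host helper: copies h_A to the device, runs both phases,
// then copies the snapshot and the updated array back to the host.
void snapshotThenUpdate(float* h_A, float* h_sigma, int n) {
    float *d_A = nullptr, *d_sigma = nullptr;
    size_t bytes = n * sizeof(float);
    CUDA_CHECK(cudaMalloc(&d_A, bytes));
    CUDA_CHECK(cudaMalloc(&d_sigma, bytes));
    CUDA_CHECK(cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    // Both launches go to the same (default) stream, so updateKernel does not
    // start until snapshotKernel has finished: every read sees the old value.
    snapshotKernel<<<blocks, threads>>>(d_A, d_sigma, n);
    CUDA_CHECK(cudaGetLastError());
    updateKernel<<<blocks, threads>>>(d_A, n);
    CUDA_CHECK(cudaGetLastError());

    // cudaMemcpy waits for the preceding work on the default stream.
    CUDA_CHECK(cudaMemcpy(h_sigma, d_sigma, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaMemcpy(h_A, d_A, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_sigma));
}
```

With this split, d_sigma is deterministic, and the final d_A is too, because each element just accumulates a fixed number of +10.0f updates. A __syncthreads() between the read and the updates inside a single kernel would only order threads within the same block, so neighbours on either side of a block boundary could still race; the kernel boundary avoids that.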