Source-target interaction using atomicAdd

In many algorithms the system is set up by computing the interactions of source-target pairs, e.g., the n-body problem in general. The nature of the problem may be summarized as follows: there is a space of (discrete) node points. Each node can affect any other node in the space through some contribution(source, target). The resulting field at each node is the sum of all contributions to that node. I understand there are smart algorithms that use a divide-and-conquer approach to reduce the complexity. If we want to do brute force on the GPU, atomicAdd seems to be the way to go to avoid race conditions. Even for brute force, is there a general trick to speed up atomicAdd in such n-body problems, e.g., a smart use of shared memory so that the atomic's locking (the repeated query) runs on faster memory?

Are you saying that each node influences each other one?
Or that there is a (directed?) graph of node influences?
Or that we have a 3D(?) space with nodes moving in it, where only nearby nodes influence each other? (In that case a node would not be a star or an atomic nucleus; its influence on the gravitational or electrical field potential would instead propagate to many nodes only through their neighbours.)

As a general alternative to atomicAdd you can use a reduction algorithm, which combines partial sums pairwise in a tree-like fashion, so the number of values halves at every step.
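A minimal sketch of that idea, under stated assumptions: one block per target node, the block's threads split the sources among themselves, and the partial sums are combined with a shared-memory tree reduction, so no atomicAdd is needed. The names gatherField and computeField, the block size, and the 1/(1+r^2) pair term are placeholders of mine, not taken from any sample code; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // power of two, required by the tree reduction below

// Placeholder pair term (assumption): smooth 1/(1+r^2) falloff in 2D.
__device__ float contribution(float2 s, float2 t) {
    float dx = t.x - s.x, dy = t.y - s.y;
    return 1.0f / (1.0f + dx * dx + dy * dy);
}

// One block per target: each thread sums a strided subset of the sources,
// then the block reduces its partial sums in shared memory.
__global__ void gatherField(const float2* pos, float* field, int n) {
    __shared__ float partial[kThreads];
    int target = blockIdx.x;
    float sum = 0.0f;
    for (int s = threadIdx.x; s < n; s += blockDim.x)
        if (s != target) sum += contribution(pos[s], pos[target]);
    partial[threadIdx.x] = sum;
    __syncthreads();
    // Tree reduction: the number of active threads halves at every step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) field[target] = partial[0];  // single writer, no atomics
}

// Host helper (hypothetical): copies positions to the device and returns the field.
std::vector<float> computeField(const std::vector<float2>& pos) {
    int n = (int)pos.size();
    std::vector<float> field(n);
    if (n == 0) return field;
    float2* dPos = nullptr;
    float* dField = nullptr;
    cudaMalloc(&dPos, n * sizeof(float2));
    cudaMalloc(&dField, n * sizeof(float));
    cudaMemcpy(dPos, pos.data(), n * sizeof(float2), cudaMemcpyHostToDevice);
    gatherField<<<n, kThreads>>>(dPos, dField, n);
    cudaMemcpy(field.data(), dField, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dPos);
    cudaFree(dField);
    return field;
}
```

The point of the design is that each output element has exactly one writer. For small n, one thread per target with a plain loop over the sources would likely be simpler and just as free of races.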

With or without atomicAdd, you would probably not update the values in place. Instead you would advance in time steps and store the values of each time step in separate memory.

Then you do not have to synchronize reading.
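The separate-memory idea could look roughly like the following sketch: two device buffers, where each step reads only from the current one and writes only to the next one, and the host swaps the pointers between steps. The kernel name step, the helper runSteps, and the coupling term (every node relaxes toward the others) are assumptions for illustration only; error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// Reads only from `cur`, writes only to `next`, so reads need no synchronization.
__global__ void step(const float* cur, float* next, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int j = 0; j < n; ++j) acc += cur[j] - cur[i];  // placeholder coupling (assumption)
    next[i] = cur[i] + dt * acc;
}

// Host helper (hypothetical): advances `values` by `steps` time steps.
std::vector<float> runSteps(std::vector<float> values, int steps, float dt) {
    int n = (int)values.size();
    if (n == 0) return values;
    float* dCur = nullptr;
    float* dNext = nullptr;
    cudaMalloc(&dCur, n * sizeof(float));
    cudaMalloc(&dNext, n * sizeof(float));
    cudaMemcpy(dCur, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    int threads = 128, blocks = (n + threads - 1) / threads;
    for (int s = 0; s < steps; ++s) {
        step<<<blocks, threads>>>(dCur, dNext, n, dt);
        std::swap(dCur, dNext);  // the buffer just written becomes the input of the next step
    }
    cudaMemcpy(values.data(), dCur, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dCur);
    cudaFree(dNext);
    return values;
}
```

Kernels launched on the same stream run in order, so the swap on the host is enough to separate one step from the next.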

There is an n-body sample code. As mentioned there, there is a corresponding GPU Gems article. The particles sample code may also be of interest.

I read Chapter 31, "Fast N-Body Simulation with CUDA" (NVIDIA Developer). It is indeed a good implementation for n-body. However, our n-body problem is a little special. The interaction kernel has a finite range, so a node only affects nodes within a certain distance of it, and the problem is 2D. So not all interaction pairs need to be counted. I am not sure whether the general n-body implementation is still a good way to go.
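For a finite-range kernel in 2D, one common option is a uniform grid (cell list): bin the nodes into cells at least as wide as the cutoff, then let each target look only at its own and the eight adjacent cells. The following is a rough sketch under my own assumptions, not the method of any particular sample: positions lie in [0, boxSize) in both coordinates, the binning is done on the host with a counting sort, and cutoffField, cutoffFieldHost and the 1/(1+r^2) pair term are hypothetical. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// One thread per (cell-sorted) target; only the 3x3 neighbouring cells are scanned.
__global__ void cutoffField(const float2* pos, const int* cell, const int* cellStart,
                            const int* origIdx, float* field, int n, int g, float r) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float2 p = pos[i];
    int cx = cell[i] % g, cy = cell[i] / g;
    float sum = 0.0f;
    for (int y = max(cy - 1, 0); y <= min(cy + 1, g - 1); ++y)
        for (int x = max(cx - 1, 0); x <= min(cx + 1, g - 1); ++x)
            for (int j = cellStart[y * g + x]; j < cellStart[y * g + x + 1]; ++j) {
                float dx = p.x - pos[j].x, dy = p.y - pos[j].y;
                float d2 = dx * dx + dy * dy;
                if (j != i && d2 < r * r) sum += 1.0f / (1.0f + d2);  // placeholder pair term
            }
    field[origIdx[i]] = sum;  // one writer per node, written back in the original order
}

template <typename T>
T* upload(const std::vector<T>& v) {  // small helper: allocate and copy to the device
    T* d = nullptr;
    cudaMalloc(&d, v.size() * sizeof(T));
    cudaMemcpy(d, v.data(), v.size() * sizeof(T), cudaMemcpyHostToDevice);
    return d;
}

std::vector<float> cutoffFieldHost(const std::vector<float2>& pos, float boxSize, float r) {
    int n = (int)pos.size();
    std::vector<float> field(n);
    if (n == 0) return field;
    int g = std::max(1, (int)(boxSize / r));  // g cells per side, each at least r wide
    float cellW = boxSize / g;
    auto cellOf = [&](float2 p) {
        int cx = std::min((int)(p.x / cellW), g - 1);
        int cy = std::min((int)(p.y / cellW), g - 1);
        return cy * g + cx;
    };
    // Counting sort of the nodes by cell index.
    std::vector<int> cellStart(g * g + 1, 0);
    for (const float2& p : pos) ++cellStart[cellOf(p) + 1];
    for (int c = 0; c < g * g; ++c) cellStart[c + 1] += cellStart[c];
    std::vector<int> cursor(cellStart.begin(), cellStart.end() - 1), origIdx(n), cell(n);
    std::vector<float2> sorted(n);
    for (int i = 0; i < n; ++i) {
        int c = cellOf(pos[i]);
        int slot = cursor[c]++;
        sorted[slot] = pos[i];
        cell[slot] = c;
        origIdx[slot] = i;
    }
    float2* dPos = upload(sorted);
    int* dCell = upload(cell);
    int* dStart = upload(cellStart);
    int* dOrig = upload(origIdx);
    float* dField = nullptr;
    cudaMalloc(&dField, n * sizeof(float));
    int threads = 128, blocks = (n + threads - 1) / threads;
    cutoffField<<<blocks, threads>>>(dPos, dCell, dStart, dOrig, dField, n, g, r);
    cudaMemcpy(field.data(), dField, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dPos); cudaFree(dCell); cudaFree(dStart); cudaFree(dOrig); cudaFree(dField);
    return field;
}
```

With a short cutoff the work per node scales with the local density rather than with the total node count, and because each thread gathers into its own output there is again no need for atomicAdd. Whether this beats the tiled all-pairs kernel depends on how small the cutoff is relative to the domain.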