Hello,

I’ve been working on a CUDA implementation of a particle-in-cell (PIC) algorithm, and there is one part that takes a long time but that I haven’t been able to move onto the GPU yet. Here is the problem:

There is a system of N particles on a grid G with grid spacing h. The particles interact, and I want to calculate their trajectories. The brute-force way of handling this is shown in the Nbody SDK example, and it works nicely for small N, but for very large N (>100k) the N^2 interactions make it exceedingly slow. The PIC approach avoids this: instead of computing N^2 pair interactions, you weight each particle’s charge onto the nearby grid points, solve the Poisson equation on the grid for the potential, and then interpolate the grid potential back to the particles to get the forces. For large N this is indeed much faster overall, but a lot of time is still spent in the one step I haven’t parallelized: weighting the particle charges onto the grid points.

The problem with parallelizing it is that if each particle is a thread, multiple threads may read and write the same grid locations at the same time. If each grid point is a thread instead, it seems you still have to loop over every particle to find which ones are nearby, which saves no time.
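Here is a sketch of the particle-per-thread version (again 1D, with made-up names) showing exactly where the conflict happens:

```cuda
// Sketch only: pos, q, grid, h are placeholder names.
__global__ void deposit_naive(const float *pos, const float *q,
                              float *grid, float h, int n_particles) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_particles) return;
    float x = pos[i] / h;          // position in grid units
    int j = (int)floorf(x);        // left-hand grid point
    float f = x - (float)j;        // fractional offset
    // RACE: any two particles in the same cell update the same
    // two grid entries, so these read-modify-writes can collide.
    grid[j]     += q[i] * (1.0f - f);
    grid[j + 1] += q[i] * f;
}
```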

I thought I might do this with atomicAdd(&grid[index], weight), but the documentation says atomicAdd is only supported for integers, and the charge weighting needs to be a float. The atomicExch function might work instead, but you still need to read the grid location’s current charge to know how to update it, and I thought that would be a problem.
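In case it helps the discussion, this is the kind of compare-and-swap loop I was wondering about, built from atomicCAS by reinterpreting the float bits as an int. I haven’t measured whether the retry loop is fast enough under heavy contention, and the function name is mine:

```cuda
// Emulate a float atomicAdd with atomicCAS (sketch, untested for
// performance). Reinterprets the float's bits as int so atomicCAS
// can compare them; retries until no other thread intervened.
__device__ float atomicAddFloat(float *addr, float val) {
    int *iaddr = (int *)addr;
    int old = *iaddr;
    int assumed;
    do {
        assumed = old;
        float updated = __int_as_float(assumed) + val;
        old = atomicCAS(iaddr, assumed, __float_as_int(updated));
    } while (assumed != old);   // another thread won the race; retry
    return __int_as_float(old); // previous value, like atomicAdd
}
```

The worry is that with many particles per cell, every thread hitting the same grid point serializes on this loop, so it may not be much better than the serial version for dense regions.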

Has anyone seen a solution to this problem, or have an idea of how to approach it?

Thanks for your help