I have a problem that can be solved easily on a CPU:
A (large) number of particles with different coordinates in two dimensions (x_i, y_i) are weighted onto a spatial grid in the x-y plane. Every particle is therefore surrounded by four grid points. Depending on where exactly the particle lies inside this square of four grid points, each of those grid points (a float) gets a share of the particle's value added to it (linear weighting). There can be hundreds of particles in one cell, so after the whole calculation every grid point's value is the sum of the contributions from all particles in the cells around it.
On the CPU this is just a "for" loop from i=0 to i<No_particles that handles one particle after another.
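For reference, here is a minimal sketch of that serial loop in C++. The `Grid` struct, the name `deposit_serial`, the row-major layout, a grid spacing of 1 and a weight of 1 per particle are all assumptions made for illustration; my real code differs in the details.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical grid: nx * ny points, row-major storage, grid spacing assumed to be 1.
struct Grid {
    int nx, ny;
    std::vector<float> v;
    Grid(int nx_, int ny_)
        : nx(nx_), ny(ny_), v(static_cast<std::size_t>(nx_) * ny_, 0.0f) {}
    float& at(int ix, int iy) { return v[static_cast<std::size_t>(iy) * nx + ix]; }
};

// Serial deposition: each particle carries a weight of 1 (an assumption) that is
// split bilinearly over the four grid points surrounding it.
void deposit_serial(Grid& g, const std::vector<float>& x, const std::vector<float>& y) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        int ix = static_cast<int>(std::floor(x[i]));  // lower-left grid point of the cell
        int iy = static_cast<int>(std::floor(y[i]));
        assert(ix >= 0 && ix + 1 < g.nx && iy >= 0 && iy + 1 < g.ny);

        float fx = x[i] - ix;  // fractional position inside the cell, in [0, 1)
        float fy = y[i] - iy;

        g.at(ix,     iy)     += (1.0f - fx) * (1.0f - fy);
        g.at(ix + 1, iy)     += fx * (1.0f - fy);
        g.at(ix,     iy + 1) += (1.0f - fx) * fy;
        g.at(ix + 1, iy + 1) += fx * fy;
    }
}
```

The four weights of a particle always add up to 1, so the sum over the whole grid should equal the number of particles, which makes a handy sanity check.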
My problem is: when I try to parallelize this, for example by giving every particle its own thread, it can happen that several threads try to write to the same grid point at the same time, and I get garbage results. An atomic add for floats could help there, I think. A reduction needs a lot of memory, because I would need a copy of the whole grid for every thread, and the grid is too large for that (for example 50*300 points, i.e. 15,000 floats per copy)…
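To show what I mean by an atomic add for floats, here is a rough sketch of the idea as a compare-and-swap loop, written as host-side C++ with `std::atomic<float>` only because that is easy to try out. The names `atomic_add_float` and `deposit_range_atomic`, the row-major grid, a grid spacing of 1 and a weight of 1 per particle are assumptions for illustration. Whether the same pattern is available and fast enough on the GPU is exactly what I am unsure about.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Atomic float add built from a compare-and-swap loop: retry until no other
// thread has changed the value between our read and our write.
inline void atomic_add_float(std::atomic<float>& target, float value) {
    float old = target.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak stores the current value into `old`,
    // so `old + value` is recomputed from fresh data on the next attempt.
    while (!target.compare_exchange_weak(old, old + value, std::memory_order_relaxed)) {
    }
}

// Deposits particles [begin, end) onto one shared grid (nx * ny points, row-major,
// spacing assumed to be 1). Each worker thread would call this on its own range.
void deposit_range_atomic(std::vector<std::atomic<float>>& grid, int nx, int ny,
                          const std::vector<float>& x, const std::vector<float>& y,
                          std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        int ix = static_cast<int>(std::floor(x[i]));
        int iy = static_cast<int>(std::floor(y[i]));
        assert(ix >= 0 && ix + 1 < nx && iy >= 0 && iy + 1 < ny);

        float fx = x[i] - ix;  // fractional position inside the cell
        float fy = y[i] - iy;
        std::size_t base = static_cast<std::size_t>(iy) * nx + ix;

        atomic_add_float(grid[base],          (1.0f - fx) * (1.0f - fy));
        atomic_add_float(grid[base + 1],      fx * (1.0f - fy));
        atomic_add_float(grid[base + nx],     (1.0f - fx) * fy);
        atomic_add_float(grid[base + nx + 1], fx * fy);
    }
}
```

With this only one grid is needed no matter how many threads run, but threads that hit the same grid point have to retry, so cells with hundreds of particles could become a bottleneck. Also, the order of the additions is no longer fixed, so the float results may differ slightly from the serial loop in the last bits.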
Should I leave this task to the CPU?
Does anyone have any ideas?
Thanks for any help!