Reduction Problem Need help with reduction type problem...


I am trying to find a solution to a reduction type problem. Unfortunately, it is different from the sample scenario, so I am at a loss for how to approach it. The problem can be summed up with this code:

for (unsigned int i = 0; i < INTERACTIONS; i++) {

    sumforces[id[i].x] += force[i];

    sumforces[id[i].y] -= force[i];


The code sums up accumulated forces for pairs of interacting particles (its for Smoothed Particle Hydrodynamics). There is one sumforces entry for each particle, and one force entry for each pair interaction, and id is the indicies of the two interacting particles.

Some specifics can be stated about the scenario:

  • The force is calculated on the GPU

  • There are approx 100K particles

  • There are approx 2-3 million interactions

  • force/sumforces is float4’s

  • id.x is contiguous (000112333)

  • id.y is all over the place

Is there some way to do this step on the GPU?

Thank You,


My solution (for molecular dynamics, not SPH) is to forget about trying to calculate each force only once. I assign one thread per particle i and sum all force pairs with particle i in the pair in that thread and write the summed force at the end.

This means that each pair force is calculated twice, but the device has GFLops to spare, and the calculation is memory access bound any ways. If you do the math, you find that the +=, -= type reduction approach involves more memory accesses.

But there are still many technical barriers to using the += and -= setup that prevent its use even if it were desireable.

  1. how do you coalesce writes to id.y?
  2. the device is incapable of atomic read-modify-writes on floats so race conditions will make your results incorrect.