Fastest way to iterate over a 3-D grid for heat transfer

Hi,
I have a 3-D rectangular grid and I want to iterate over it, working out the heat flows between all of the elements. Currently I launch this as a grid and use a dim3 to index each element, calculating the heat flow between the central point and each of its six nearest neighbours. The downside is that I calculate every heat flow twice (once from each side of the pair). It would also be great if the threads of a single block handled grid positions that are close to one another, to limit the amount of data each block has to fetch. There are also a lot of if statements to handle the edge conditions.

I’ve tried to work on a new system where each thread iterates along one direction in the block, calculating the heat flows in that direction, with the heat moving into grid location n+1 being equal to the heat leaving grid location n. I then repeat this for each of the three directions.
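A rough CPU-side sketch of the per-direction sweep described above, written in plain C++ so it can be checked without a GPU. The names (`idx`, `flux_sweep_step`), the lumped coefficient `k`, the x-fastest memory layout and the insulated (zero-flux) boundaries are all assumptions for illustration, not taken from the post. Each face flux is computed once and applied with opposite signs to the two cells it connects, and the loop bound `n + 1 < dims[d]` takes the place of the edge-condition if statements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout: x is the fastest-varying index.
inline std::size_t idx(std::size_t x, std::size_t y, std::size_t z,
                       std::size_t nx, std::size_t ny) {
    return (z * ny + y) * nx + x;
}

// One explicit timestep. k is an assumed lumped coefficient
// (conductance * dt / heat capacity); boundaries are assumed insulated.
void flux_sweep_step(const std::vector<double>& t_old, std::vector<double>& t_new,
                     std::size_t nx, std::size_t ny, std::size_t nz, double k) {
    assert(t_old.size() == nx * ny * nz);
    t_new = t_old;  // start from the old field, then apply every face flux once
    const std::size_t dims[3] = {nx, ny, nz};
    const std::size_t stride[3] = {1, nx, nx * ny};
    for (int d = 0; d < 3; ++d) {              // one pass per direction
        const int a = (d + 1) % 3;
        const int b = (d + 2) % 3;
        for (std::size_t i = 0; i < dims[a]; ++i) {
            for (std::size_t j = 0; j < dims[b]; ++j) {
                // Start of one line along direction d; on a GPU one thread
                // could own this whole line.
                const std::size_t base = i * stride[a] + j * stride[b];
                for (std::size_t n = 0; n + 1 < dims[d]; ++n) {
                    const std::size_t lo = base + n * stride[d];
                    const std::size_t hi = lo + stride[d];
                    // Heat leaving cell n equals heat entering cell n+1.
                    const double q = k * (t_old[lo] - t_old[hi]);
                    t_new[lo] -= q;
                    t_new[hi] += q;
                }
            }
        }
    }
}
```

Within one direction pass the lines touch disjoint cells, so they are independent. The three passes accumulate into the same output array, so on a GPU they would presumably need to be separate launches (or otherwise ordered) to avoid races.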

In both cases I double-buffer the values: I write the new values to a second array and then swap the pointers at the end of the timestep.
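For reference, the double-buffering scheme could look roughly like the following C++ sketch. The update formula (centre plus `k` times the sum of neighbour differences), the clamped neighbours that give zero flux at the edges, and the names `jacobi_step` and `run_timesteps` are assumptions for illustration; the post does not give the actual formula.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed 7-point update. An out-of-range neighbour is replaced by the
// centre value, which gives zero flux through that face.
void jacobi_step(const std::vector<double>& src, std::vector<double>& dst,
                 std::size_t nx, std::size_t ny, std::size_t nz, double k) {
    assert(src.size() == nx * ny * nz && dst.size() == src.size());
    const std::size_t plane = nx * ny;
    for (std::size_t z = 0; z < nz; ++z)
        for (std::size_t y = 0; y < ny; ++y)
            for (std::size_t x = 0; x < nx; ++x) {
                const std::size_t i = (z * ny + y) * nx + x;
                const double c = src[i];
                const double xm = x > 0 ? src[i - 1] : c;
                const double xp = x + 1 < nx ? src[i + 1] : c;
                const double ym = y > 0 ? src[i - nx] : c;
                const double yp = y + 1 < ny ? src[i + nx] : c;
                const double zm = z > 0 ? src[i - plane] : c;
                const double zp = z + 1 < nz ? src[i + plane] : c;
                dst[i] = c + k * (xm + xp + ym + yp + zm + zp - 6.0 * c);
            }
}

// Reads from cur, writes to next, then swaps the two buffers.
// After the loop the latest field is in cur.
void run_timesteps(std::vector<double>& cur, std::vector<double>& next,
                   std::size_t nx, std::size_t ny, std::size_t nz,
                   double k, int steps) {
    for (int s = 0; s < steps; ++s) {
        jacobi_step(cur, next, nx, ny, nz, k);
        std::swap(cur, next);  // O(1): exchanges the buffers, no copy
    }
}
```

With raw device pointers the same idea applies: swap the two pointers on the host between kernel launches rather than copying data back.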

I think this has got to be a fairly well-studied problem on a GPU, so I wanted to know if anyone knows of the canonical way to solve it so that it runs as fast as possible?

Thanks,

The described algorithm should be similar to a 3D Jacobi iteration. Maybe you can find some optimized GPU implementations.

Is the problem purely memory bandwidth bound or also compute bound?

What is the formula to combine the center element and the 6 nearest neighbours?

As the maximum distance is so small (direct neighbours only), you could calculate e.g. 6*6*3 or even slightly larger blocks within a single thread (thus keeping that much data loaded in registers!), shifting along one dimension with 3 elements loaded at a time.
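A much simplified C++ sketch of the "shift along one dimension with 3 loaded elements" idea: it uses a single 1x1 column per worker instead of a 6*6 tile, and the local variables `below`, `centre` and `above` stand in for registers. The update formula, the clamped (zero-flux) boundaries and the name `column_sweep_step` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each (x, y) column marches along z with a sliding window of three values,
// so every z-neighbour is loaded once per column instead of three times.
void column_sweep_step(const std::vector<double>& src, std::vector<double>& dst,
                       std::size_t nx, std::size_t ny, std::size_t nz, double k) {
    assert(src.size() == nx * ny * nz && dst.size() == src.size());
    if (nz == 0) return;
    const std::size_t plane = nx * ny;
    for (std::size_t y = 0; y < ny; ++y) {
        for (std::size_t x = 0; x < nx; ++x) {  // on a GPU: one thread per column
            std::size_t i = y * nx + x;
            double below = src[i];   // clamped: no cell under z = 0
            double centre = src[i];
            for (std::size_t z = 0; z < nz; ++z) {
                // Only the value above is newly loaded for the z direction.
                const double above = z + 1 < nz ? src[i + plane] : centre;
                const double xm = x > 0 ? src[i - 1] : centre;
                const double xp = x + 1 < nx ? src[i + 1] : centre;
                const double ym = y > 0 ? src[i - nx] : centre;
                const double yp = y + 1 < ny ? src[i + nx] : centre;
                dst[i] = centre + k * (xm + xp + ym + yp + below + above - 6.0 * centre);
                below = centre;   // slide the window up by one cell
                centre = above;
                i += plane;
            }
        }
    }
}
```

In this sketch the x and y neighbours are still read from memory. Widening the per-thread tile as suggested above would keep those in registers too.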

(You could probably even calculate the heat transfer with more neighbours and weighting without losing iteration speed, and then use the gained precision to reduce the spatial and/or time resolution.)