FDTD simulation on GPU with Alternating Direction Explicit method

Hello,

I am extremely interested in implementing a Finite-Difference Time-Domain (FDTD) simulation on a GPU using the Saul’yev Alternating-Direction Explicit (ADE) method. As discussed on pg. 119 of reference [1], this ADE method is an explicit time-domain calculation that starts at the grid boundary and then sweeps the grid from left to right (L-R scheme) or from right to left (R-L scheme). There are two unknowns at the n+1 timestep, but since one of them is either at the grid boundary or has already been calculated in the previous loop iteration, it is easy to solve for the other.

For example, the L-R scheme for the second spatial derivative d2u/dx2 is:

d2u/dx2 = ( u_i-1,j_n+1 - u_i,j_n+1 - u_i,j_n + u_i+1,j_n ) / (deltaX^2)

In the above, d2u/dx2 is the second-order partial derivative, deltaX is the grid step, and u_i,j_n is the i,j element of the grid at timestep n. The variable being solved for is u_i,j_n+1, the i,j element of the grid at the next timestep n+1. Suppose that d2u/dx2 is known.

However, u_i,j_n+1 and u_i-1,j_n+1 are both at timestep n+1. Since the scheme sweeps the grid from left to right (L-R scheme), the element u_i-1,j_n+1 has already been calculated in the previous loop iteration. So there is only one unknown at the next timestep, u_i,j_n+1.
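
For concreteness, suppose the governing equation were the simple diffusion equation du/dt = alpha * d2u/dx2 (this is only my illustrative assumption; my actual FDTD update equations differ). Using a forward difference in time together with the L-R formula above, and writing r = alpha * deltaT / deltaX^2, the update can be rearranged so that the single unknown is isolated:

u_i,j_n+1 = ( (1 - r) * u_i,j_n + r * ( u_i-1,j_n+1 + u_i+1,j_n ) ) / (1 + r)

Everything on the right-hand side except u_i-1,j_n+1 is at timestep n, and u_i-1,j_n+1 is exactly the value the L-R sweep has just produced.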

The grid sweeping is implemented by two nested for loops:

for (int i = 0; i < N; i++)
{
   for (int j = 0; j < N; j++)
   {
      // cell update for u_i,j_n+1 occurs here
   }
}

The boundary condition is assumed to be known at all timesteps, and the cell at position i-1,j, immediately to the left of the i,j cell, has been calculated in the previous loop iteration. So the i-1,j cell is always either known or already calculated before the i,j cell is updated.
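
A minimal serial sketch of one L-R sweep, again assuming the model diffusion update above (the names uOld, uNew, and r are mine, and the boundary columns of uNew are assumed to be filled in beforehand from the known boundary condition):

void sweepLR(const double *uOld, double *uNew, int N, double r)
{
   for (int j = 0; j < N; j++)           // each row j
   {
      for (int i = 1; i < N - 1; i++)    // left-to-right sweep in i
      {
         // uNew at i-1 was computed on the previous iteration, so the update is explicit
         uNew[j*N + i] = ( (1.0 - r) * uOld[j*N + i]
                         + r * ( uNew[j*N + i - 1] + uOld[j*N + i + 1] ) ) / (1.0 + r);
      }
   }
}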

The question is: can this type of algorithm be adapted for calculation on a GPU? How would I go about doing it, and is there a book/paper/reference that I could use?

REFERENCE:

[1] D. A. Anderson, J. C. Tannehill, and R. H. Pletcher, Computational Fluid Mechanics and Heat Transfer. Washington: Hemisphere Publishing Corp., 1984.

Would it be reasonable to solve for both u_i,j_n+1 and u_i-1,j_n+1 at timestep n+1 using a system of equations? For the two points u_i-1,j_n+1 and u_i,j_n+1 there would then be a system of two equations in two unknowns.

Does this sound reasonable? If so, would I then allocate the work into blocks?
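
Written out for the same model diffusion equation as above (again only my illustrative assumption), the two equations for the pair of cells i-1,j and i,j would be:

(1 + r) * u_i-1,j_n+1 - r * u_i-2,j_n+1 = (1 - r) * u_i-1,j_n + r * u_i,j_n

(1 + r) * u_i,j_n+1 - r * u_i-1,j_n+1 = (1 - r) * u_i,j_n + r * u_i+1,j_n

With u_i-2,j_n+1 known (a boundary value or a previously computed pair), this is a small 2x2 system in the two unknowns u_i-1,j_n+1 and u_i,j_n+1.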

It seems like this should be possible using a thread per row, and depending on the size of your grid you might be able to cache the row data in shared memory.
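
Something like the kernel sketch below is what I mean by one thread per row. It is completely untested and just reuses the diffusion-style update and the uOld/uNew storage from your earlier posts (row-major, N x N grid), so treat it as an assumption rather than a working implementation:

__global__ void sweepLR(const double *uOld, double *uNew, int N, double r)
{
   int j = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per row j
   if (j >= N) return;

   // left boundary value at timestep n+1 is assumed to be already stored in uNew
   for (int i = 1; i < N - 1; i++)                  // left-to-right sweep within the row
   {
      uNew[j*N + i] = ( (1.0 - r) * uOld[j*N + i]
                      + r * ( uNew[j*N + i - 1] + uOld[j*N + i + 1] ) ) / (1.0 + r);
   }
}

You would launch it with something like sweepLR<<<(N + 127)/128, 128>>>(d_uOld, d_uNew, N, r). The rows can run in parallel here because the formula you posted only couples cells along i, and a row that fits in shared memory could be staged there before the sweep.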

This presentation describes something similar, using the ADI method to solve the heat equation for a depth of field effect:
http://developer.download.nvidia.com/presentations/2010/gdc/metro.pdf

Thank you for your response, Simon! :D That is a very exciting presentation, and I think that using a thread per row is reasonable. Once again, thank you!
