Seeking advice on solving a PDE using CUDA

Hi All,

I am new to CUDA. I have looked through the basic tutorials and programming guide and have done some simple tests.

Now I want to use CUDA to accelerate part of my program.

What I want to achieve is to use CUDA to solve a simple PDE like

D(phi)/dt = S(phi)(grad(phi) -1)

I previously implemented it in C++, and it turned out to be the most time-consuming part of the program because of the large number of grid points involved. The equation has to be time-integrated for 20 steps, which means 20 loops over on the order of 1 million grid points.

I think CUDA will help since, within each time step, each grid point can be updated independently of the others.

Could anyone give me some suggestions on how to implement it? A brief workflow would be good.

The most important issue is how to map the data (a 3D array) to device memory. Should I use a 3D texture? How should I set the execution configuration?
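
One common starting point for the data-mapping question above, before trying textures, is to flatten the 3D array into plain linear device memory. The following is only a minimal sketch under some assumptions: the field is stored as `float`, x is the fastest-varying index, and the names `idx3` and `uploadField` are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Flatten (i, j, k) into a linear offset with x varying fastest, so that
// consecutive threads in x touch consecutive addresses in global memory.
__host__ __device__ inline size_t idx3(int i, int j, int k, int nx, int ny)
{
    return (static_cast<size_t>(k) * ny + j) * nx + i;
}

// Hypothetical helper: copy a host field of nx*ny*nz floats to the device.
// Returns nullptr if the allocation or the copy fails.
float* uploadField(const std::vector<float>& host)
{
    float* dev = nullptr;
    const size_t bytes = host.size() * sizeof(float);
    if (cudaMalloc(reinterpret_cast<void**>(&dev), bytes) != cudaSuccess) {
        return nullptr;
    }
    if (cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return nullptr;
    }
    return dev;
}
```

The host array would be filled using the same `idx3` layout, uploaded once before the time loop, and copied back with `cudaMemcpy(..., cudaMemcpyDeviceToHost)` only after the last step, so the field stays on the device between kernel launches.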



The optimal approach will depend a lot on which numerical methods you need to use to solve it.

I am using fifth-order HJ-WENO in space and a third-order RK method in time. That means running the same kernel 60 times (20 steps x 3 stages), and each grid point needs up to 3 neighboring grid points to be available.
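
To make the launch count concrete, here is a rough sketch of the host-side loop. It assumes the RK scheme is the Shu-Osher TVD-RK3 form, which the post does not actually state, and the spatial operator is left as a placeholder rather than a real HJ-WENO evaluation. The names `Stage`, `kStages`, `rkStage` and `integrate` are invented for this example.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Assumed Shu-Osher TVD-RK3 stage form:
//   out = a * phi_n + b * (phi_s + dt * L(phi_s))
struct Stage { float a, b; };
static const Stage kStages[3] = {
    {0.0f, 1.0f}, {0.75f, 0.25f}, {1.0f / 3.0f, 2.0f / 3.0f}
};

__global__ void rkStage(const float* phiN, const float* phiS, float* out,
                        float a, float b, float dt, size_t n)
{
    const size_t idx = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (idx >= n) return;
    // PLACEHOLDER: the HJ-WENO evaluation of the right-hand side would go
    // here and would read neighbors of idx from phiS.
    const float rhs = 0.0f;
    out[idx] = a * phiN[idx] + b * (phiS[idx] + dt * rhs);
}

// Runs `steps` time steps of 3 stages each on device buffers of n floats.
// Returns the number of kernel launches issued, or -1 on a CUDA error.
int integrate(float* phi, float* tmp1, float* tmp2, size_t n, float dt, int steps)
{
    const int threads = 256;
    const int blocks = static_cast<int>((n + threads - 1) / threads);
    int launches = 0;
    for (int s = 0; s < steps; ++s) {
        // Each stage reads the previous stage's buffer and writes a different
        // one, so a stencil never reads cells that are being overwritten.
        const float* stageIn[3] = {phi, tmp1, tmp2};
        float* stageOut[3] = {tmp1, tmp2, phi};
        for (int r = 0; r < 3; ++r) {
            rkStage<<<blocks, threads>>>(phi, stageIn[r], stageOut[r],
                                         kStages[r].a, kStages[r].b, dt, n);
            ++launches;
        }
    }
    return cudaDeviceSynchronize() == cudaSuccess ? launches : -1;
}
```

Launches on the same stream execute in order, so no explicit synchronization is needed between stages. With `steps = 20` the loop issues the 60 launches mentioned above.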

I am not clear on how to map each thread to a grid cell in 3D. In 2D, this is relatively straightforward.
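
For the 3D thread mapping, one possible sketch is below. It assumes a device that supports 3D grids (compute capability 2.0 or later; older devices only allow 2D grids, in which case a common workaround is to launch a 2D grid and loop over z inside the kernel). The kernel `copyCells` is a placeholder that just copies each cell, and `divUp` and `gridFor` are hypothetical helper names.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// One thread per grid cell. The body is a placeholder copy, not a WENO update.
__global__ void copyCells(const float* in, float* out, int nx, int ny, int nz)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int j = blockIdx.y * blockDim.y + threadIdx.y;
    const int k = blockIdx.z * blockDim.z + threadIdx.z;
    // The grid is rounded up, so threads past the domain edge must exit.
    if (i >= nx || j >= ny || k >= nz) return;
    // Linear offset with x varying fastest (assumed memory layout).
    const size_t idx = (static_cast<size_t>(k) * ny + j) * nx + i;
    out[idx] = in[idx];
}

// Number of blocks of size b needed to cover n cells.
inline unsigned int divUp(int n, int b) { return (n + b - 1) / b; }

// Execution configuration: enough blocks in each direction to cover the domain.
inline dim3 gridFor(int nx, int ny, int nz, dim3 block)
{
    return dim3(divUp(nx, block.x), divUp(ny, block.y), divUp(nz, block.z));
}
```

A launch would then look like `dim3 block(32, 4, 4); copyCells<<<gridFor(nx, ny, nz, block), block>>>(d_in, d_out, nx, ny, nz);`. The block shape here is only a guess to start from; making the block wide in x keeps neighboring threads on neighboring addresses, but the best shape needs to be measured.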