Heat Transfer simulation Computational Physics

This is for my thesis. I have a fair amount of experience in C programming, for instance cellular automata, spontaneous magnetism, the Sierpinski triangle, the Koch curve, PDEs, and numerical analysis.

Now I am trying to exploit the power of the GPU, which suits my problem better than MPI; in my case MPI would be like “riding an elephant to catch a grasshopper”.

I have read “CUDA by Example” and found chapter 7 very useful. Unfortunately, it does not cover periodic boundary conditions.

I do not know how to keep the simulation torus-shaped, that is, how to make the edges of the grid wrap around. The code uses gridDim and blockIdx, but the book does not state exactly what gridDim is.

I have two examples heat.cu and heat_2D.cu.

heat_2D.cu is quite a bit easier to understand, but I lack background in CS and EG.

Here is the code from NVIDIA.

__global__ void blend_kernel( float *dst,
                              bool dstOut ) {
    // map from threadIdx/BlockIdx to pixel position
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    // texIn, texOut and SPEED are defined elsewhere in the example;
    // dstOut selects which texture holds the current temperatures
    float   t, l, c, r, b;
    if (dstOut) {
        t = tex2D(texIn,x,y-1);
        l = tex2D(texIn,x-1,y);
        c = tex2D(texIn,x,y);
        r = tex2D(texIn,x+1,y);
        b = tex2D(texIn,x,y+1);
    } else {
        t = tex2D(texOut,x,y-1);
        l = tex2D(texOut,x-1,y);
        c = tex2D(texOut,x,y);
        r = tex2D(texOut,x+1,y);
        b = tex2D(texOut,x,y+1);
    }

    // explicit update: centre value plus SPEED times the discrete Laplacian
    dst[offset] = c + SPEED * (t + b + r + l - 4 * c);
}

Any help will be appreciated.

At a glance, the example shown here is a solver for a very basic 2D heat transfer equation.

CUDA benefits from having nicely gridded data that can be mapped to linear regions of memory (so problems that can be expressed in 2D Cartesian coordinates are the easiest to translate to the GPU).

If you need to simulate 2D heat transfer on a toroid, you need some way of mapping the equations from that curved (toroidal) geometry to something that is Cartesian. There are coordinate transformations that allow you to do this, but you will have to map all of your variables from the curved space to a 2D Cartesian space.
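
If, on the other hand, what is wanted is a flat lattice whose edges wrap around (periodic boundary conditions, which make the grid a torus only in the topological sense), no coordinate transformation should be needed; it is usually enough to wrap the neighbour indices. gridDim is the number of blocks in the launch along each axis, so blockDim.x * gridDim.x is the total number of threads in x, which the kernel above already uses as the row width when it computes offset. Below is a minimal CPU-side sketch in plain C so that it can be checked without a GPU. The names wrap and periodic_step are made up for illustration, and speed stands in for SPEED.

```c
#include <assert.h>

/* Wrap index i into [0, n). Adding n first keeps the left operand of %
   non-negative, because % in C can yield a negative result for a
   negative left operand. Valid for i >= -n, which covers i-1 and i+1. */
int wrap(int i, int n)
{
    assert(n > 0 && i >= -n);
    return (i + n) % n;
}

/* One explicit update step on a width x height lattice with periodic
   boundaries in both directions. src and dst are row-major arrays of
   width * height floats and must not overlap. */
void periodic_step(float *dst, const float *src,
                   int width, int height, float speed)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* neighbour coordinates, wrapped around the edges */
            int xl = wrap(x - 1, width);
            int xr = wrap(x + 1, width);
            int yt = wrap(y - 1, height);
            int yb = wrap(y + 1, height);

            float c = src[x  + y  * width];
            float t = src[x  + yt * width];
            float b = src[x  + yb * width];
            float l = src[xl + y  * width];
            float r = src[xr + y  * width];

            /* same update rule as the kernel */
            dst[x + y * width] = c + speed * (t + b + r + l - 4 * c);
        }
    }
}
```

Inside the kernel the two loops disappear, since each thread handles one (x, y). The same arithmetic would apply with width = blockDim.x * gridDim.x and height = blockDim.y * gridDim.y, assuming the launch covers the lattice exactly, and the wrapped coordinates would replace x-1, x+1, y-1 and y+1 in the tex2D calls.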

Thank you for your kind attention.
I have now finished the parallel CUDA version for the lattice.
However, my program still has a bottleneck when it writes the matrix out to PPM files for visualization. I usually use netlibppm to collect the *.ppm files and convert them into a single MPEG file; sadly, this is slow because my I/O device is so poor.

If anybody here can tell me how to display the matrix directly on the monitor using a graphics library, that would be great.
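
On the file output itself, one thing that may help, assuming the frames are currently written as ASCII text one value at a time, is to write binary PPM (the P6 variant) with a single fwrite per frame. The sketch below is only an illustration: write_ppm_binary and to_grey are hypothetical names, and the grey-scale mapping with a caller-supplied range lo..hi is an assumption about how the matrix is coloured.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Map v from the range [lo, hi] to a grey level 0..255, clamping
   values that fall outside the range. */
unsigned char to_grey(float v, float lo, float hi)
{
    if (hi <= lo) return 0;
    float s = (v - lo) / (hi - lo);
    if (s < 0.0f) s = 0.0f;
    if (s > 1.0f) s = 1.0f;
    return (unsigned char)(s * 255.0f + 0.5f);
}

/* Write a row-major width x height float matrix as a binary (P6) PPM.
   Returns 0 on success and -1 on failure. */
int write_ppm_binary(const char *path, const float *m,
                     int width, int height, float lo, float hi)
{
    assert(path != NULL && m != NULL && width > 0 && height > 0);

    size_t cells = (size_t)width * (size_t)height;
    size_t bytes = cells * 3;
    unsigned char *rgb = malloc(bytes);
    if (rgb == NULL) return -1;

    /* convert the whole frame in memory first */
    for (size_t i = 0; i < cells; i++) {
        unsigned char g = to_grey(m[i], lo, hi);
        rgb[3 * i] = rgb[3 * i + 1] = rgb[3 * i + 2] = g;
    }

    FILE *f = fopen(path, "wb");
    if (f == NULL) { free(rgb); return -1; }

    /* P6 header: magic number, width and height, maximum channel value */
    fprintf(f, "P6\n%d %d\n255\n", width, height);

    /* one bulk write instead of one formatted write per pixel */
    int ok = (fwrite(rgb, 1, bytes, f) == bytes);
    if (fclose(f) != 0) ok = 0;

    free(rgb);
    return ok ? 0 : -1;
}
```

A call such as write_ppm_binary("frame0001.ppm", host_matrix, DIM, DIM, 0.0f, 1.0f), where host_matrix and DIM are placeholders for the copied-back data and its side length, would be made after each copy from the device. Whether this removes the bottleneck depends on where the time actually goes, so it is worth timing before and after.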