This is for my thesis. I have a fair amount of experience in C programming, for instance cellular automata, spontaneous magnetism, the Sierpinski triangle, the Koch curve, PDEs, and numerical analysis.

Now I am trying to exploit the power of the GPU, which suits my problem better than MPI; in my case MPI would be like “riding an elephant to catch a grasshopper”.

I have read “CUDA by Example” and found Chapter 7 very useful. Unfortunately, its heat-transfer example does not use periodic boundary conditions.

I do not know how to keep the simulation torus-shaped, so that each edge wraps around to the opposite edge. The code uses gridDim and blockIdx, but the book does not state exactly what gridDim is.
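As a point of reference: in CUDA, gridDim is the built-in variable that holds the number of blocks in the launch grid along each axis (the first argument inside `<<<...>>>`), and blockDim holds the number of threads per block. So `blockDim.x * gridDim.x` is the total number of threads along x, which equals the domain width when the grid covers the domain exactly. The sketch below only illustrates that; `show_dims` and the launch sizes are made-up names and values, not taken from the book.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper kernel: one thread records the built-in dimensions.
__global__ void show_dims(int *out) {
    if (threadIdx.x == 0 && threadIdx.y == 0 &&
        blockIdx.x == 0 && blockIdx.y == 0) {
        out[0] = gridDim.x;               // blocks along x
        out[1] = gridDim.y;               // blocks along y
        out[2] = blockDim.x;              // threads per block along x
        out[3] = blockDim.y;              // threads per block along y
        out[4] = blockDim.x * gridDim.x;  // total threads along x
    }
}

int main() {
    dim3 blocks(4, 2);     // becomes gridDim  inside the kernel
    dim3 threads(16, 16);  // becomes blockDim inside the kernel

    int h_out[5] = {0};
    int *d_out = NULL;
    cudaMalloc((void **)&d_out, sizeof(h_out));

    show_dims<<<blocks, threads>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    printf("gridDim  = (%d, %d)\n", h_out[0], h_out[1]);
    printf("blockDim = (%d, %d)\n", h_out[2], h_out[3]);
    printf("threads along x = %d\n", h_out[4]);

    cudaFree(d_out);
    return 0;
}
```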

I have two examples, heat.cu and heat_2D.cu.

heat_2D.cu is quite a bit easier to understand, but I lack background in CS and EG.

Here is the kernel code from NVIDIA:

```
// texIn, texOut and SPEED are declared elsewhere in heat_2D.cu
__global__ void blend_kernel( float *dst,
                              bool dstOut ) {
    // map from threadIdx/BlockIdx to pixel position
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // blockDim.x * gridDim.x is the total number of threads along x
    int offset = x + y * blockDim.x * gridDim.x;

    // read the cell (c) and its top, left, right, bottom neighbors
    float t, l, c, r, b;
    if (dstOut) {
        t = tex2D(texIn,x,y-1);
        l = tex2D(texIn,x-1,y);
        c = tex2D(texIn,x,y);
        r = tex2D(texIn,x+1,y);
        b = tex2D(texIn,x,y+1);
    } else {
        t = tex2D(texOut,x,y-1);
        l = tex2D(texOut,x-1,y);
        c = tex2D(texOut,x,y);
        r = tex2D(texOut,x+1,y);
        b = tex2D(texOut,x,y+1);
    }
    // explicit update: move c toward the average of its neighbors
    dst[offset] = c + SPEED * (t + b + r + l - 4 * c);
}
```
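One possible way to make such a kernel periodic is to wrap the neighbor indices modulo the domain size, so that x-1 at the left edge maps to the rightmost column, and likewise for the other three edges. The following is only a minimal sketch under several assumptions, not the book's code: it reads from a plain global-memory array `src` instead of the `texIn`/`texOut` textures, it assumes the launch grid covers the domain exactly (so the width is `blockDim.x * gridDim.x`), and the names `wrap`, `periodic_blend_kernel`, `DIM` and the value of `SPEED` are chosen here for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define DIM   64      // assumed domain size (DIM x DIM cells)
#define SPEED 0.25f   // assumed update constant

// Wrap index i into [0, n): -1 -> n-1, n -> 0 (valid for i >= -n).
__host__ __device__ inline int wrap(int i, int n) {
    return (i % n + n) % n;
}

__global__ void periodic_blend_kernel(float *dst, const float *src) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int w = blockDim.x * gridDim.x;   // total threads along x = domain width
    int h = blockDim.y * gridDim.y;   // total threads along y = domain height

    // neighbor coordinates on the torus
    int xl = wrap(x - 1, w), xr = wrap(x + 1, w);
    int yt = wrap(y - 1, h), yb = wrap(y + 1, h);

    float t = src[x  + yt * w];
    float l = src[xl + y  * w];
    float c = src[x  + y  * w];
    float r = src[xr + y  * w];
    float b = src[x  + yb * w];

    dst[x + y * w] = c + SPEED * (t + b + r + l - 4.0f * c);
}

int main() {
    const size_t bytes = DIM * DIM * sizeof(float);
    float *host = (float *)calloc(DIM * DIM, sizeof(float));
    host[0] = 1.0f;                   // single hot cell in the corner (0, 0)

    float *d_src = NULL, *d_dst = NULL;
    cudaMalloc((void **)&d_src, bytes);
    cudaMalloc((void **)&d_dst, bytes);
    cudaMemcpy(d_src, host, bytes, cudaMemcpyHostToDevice);

    dim3 blocks(DIM / 16, DIM / 16);
    dim3 threads(16, 16);
    periodic_blend_kernel<<<blocks, threads>>>(d_dst, d_src);
    cudaMemcpy(host, d_dst, bytes, cudaMemcpyDeviceToHost);

    // Cells on the opposite edges receive heat only through the wrap.
    printf("cell (DIM-1, 0): %f\n", host[DIM - 1]);
    printf("cell (0, DIM-1): %f\n", host[(DIM - 1) * DIM]);

    cudaFree(d_src);
    cudaFree(d_dst);
    free(host);
    return 0;
}
```

After one step with these assumed values, the two printed cells should be non-zero (0.25), which shows that heat from the corner crossed the boundary. If the textures are kept instead, the same wrapped coordinates could be passed to tex2D in place of x-1, x+1, y-1 and y+1.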

Any help will be appreciated.