Hello everybody! I am fairly new to CUDA and I was writing a program to solve a heat equation using a simple Euler Method. I wrote the function in C++ and it seems to work fine but when I tried to implement the kernel the results are slightly different. e.g:
GPU CPU
1 302.0542908 302.0537720
2 -15.1334448 -15.1327314
3 -15.0020161 -15.0026360
…
The difference is usually greater when I increment the number of time steps for the Euler method. Here’s both the kernel and the function.
[codebox]void fwdEuler(float* u, const float dt, const float dx)
{
// Function for taking one Forward-Euler time-step. For all
// interior grid points, it replaces U[j] with U[j] at the next time
// step. Values at the boundaries j=0 and j=nj are unchanged. The
// array next_u is necessary for temporary storage.
const float nu = dt / (dx * dx);
float* next_u;
next_u = new float[N + 1];
for (int j = 1; j < N; j++)
next_u[j] = u[j] + nu * (u[j - 1] - 2. * u[j] + u[j + 1]);
for (int j = 1; j < N; j++)
u[j] = next_u[j];
}[/codebox]
When I call the function in the main:
[codebox]for (int time_step = 0; time_step < nsteps; time_step++)
fwdEuler(h_b, dt, dx);
}[/codebox]
Here’s the kernel:
[codebox]global void fwdEulerKernel(float u, float next_u, const float dx,
const float dt, int nsteps, int N)
{
int idx = blockDim.x * blockIdx.x + threadIdx.x;
float nu = dt / (dx * dx);
// Skip the first element in u
if(blockIdx.x < 1)
idx +=1;
for (int i = 0; i < nsteps; i++)
{
if(idx < N)
{
next_u[idx] = u[idx] + nu * (u[idx - 1] - 2. * u[idx] + u[idx + 1]);
u[idx] = next_u[idx];
}// end if
}// end for i
}[/codebox]
Any suggestions??? Thanks!