Stopping criteria for Jacobi Iteration

I have written a code for a finite difference scheme using Jacobi iteration. The code works, but I am not sure how to insert a stopping criterion that terminates the iteration when the error falls below a threshold value. Below are excerpts of my code:

Host Code:
for (i = 1; i <= 21777; i++)
{
    JacobiIteration<<<dimGrid, dimBlock>>>(du, nSize);
}

Kernel:
__global__ void JacobiIteration(float* u, int n)
{
    int x = threadIdx.x + blockIdx.x * BLOCK_SIZE;
    int y = threadIdx.y + blockIdx.y * BLOCK_SIZE;

    if (x > 0 && x < n - 1 && y > 0 && y < n - 1)
    {
        u[y + x * n] = 0.25 * (u[y + (x + 1) * n] + u[(x - 1) * n + y]
                             + u[(x * n) + y + 1] + u[(x * n) + y - 1]);
    }
}

Thanks in advance.

For an iterative Jacobi solver, you usually want to use something like the L-infinity norm of the Laplacian as a termination condition, so that when the norm is sufficiently near zero, you stop. The parallel reduction example in the SDK can be adapted to compute that sort of norm fairly simply. So what you would do is have the host-side loop periodically launch the norm kernel and check the result. If you want to be clever, you can store the results of previous norm calculations and track the convergence rate. That can be used to predict when the next norm launch should be performed and when the iteration should stop. It can also be used as a diagnostic to flag slow convergence and prevent endless or overlong calculations that would never converge to the specified tolerance.
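Just to sketch what that could look like (this is not your code: MaxResidual, d_blockMax, h_blockMax, TOL, CHECK_EVERY and MAX_ITER are all made-up names, and it assumes BLOCK_SIZE x BLOCK_SIZE thread blocks where BLOCK_SIZE*BLOCK_SIZE is a power of two, e.g. 16x16):

// Illustrative sketch only: per-block maximum of |Laplacian|, finished on the host.
__global__ void MaxResidual(const float* u, float* blockMax, int n)
{
    __shared__ float smax[BLOCK_SIZE * BLOCK_SIZE];

    int x   = threadIdx.x + blockIdx.x * BLOCK_SIZE;
    int y   = threadIdx.y + blockIdx.y * BLOCK_SIZE;
    int tid = threadIdx.y * BLOCK_SIZE + threadIdx.x;

    // |Laplacian| at (x, y); boundary and out-of-range threads contribute 0
    float r = 0.0f;
    if (x > 0 && x < n - 1 && y > 0 && y < n - 1)
    {
        r = fabsf(u[(x + 1) * n + y] + u[(x - 1) * n + y]
                + u[(x * n) + y + 1] + u[(x * n) + y - 1]
                - 4.0f * u[x * n + y]);
    }
    smax[tid] = r;
    __syncthreads();

    // shared-memory tree reduction to the block maximum
    for (int s = (BLOCK_SIZE * BLOCK_SIZE) / 2; s > 0; s >>= 1)
    {
        if (tid < s) smax[tid] = fmaxf(smax[tid], smax[tid + s]);
        __syncthreads();
    }
    if (tid == 0) blockMax[blockIdx.y * gridDim.x + blockIdx.x] = smax[0];
}

// Host side: check the norm every CHECK_EVERY sweeps rather than every sweep,
// since the extra launch and copy are not free.
int    nBlocks    = dimGrid.x * dimGrid.y;
float* h_blockMax = (float*)malloc(nBlocks * sizeof(float));
float* d_blockMax;
cudaMalloc((void**)&d_blockMax, nBlocks * sizeof(float));

for (i = 1; i <= MAX_ITER; i++)
{
    JacobiIteration<<<dimGrid, dimBlock>>>(du, nSize);

    if (i % CHECK_EVERY == 0)
    {
        MaxResidual<<<dimGrid, dimBlock>>>(du, d_blockMax, nSize);
        cudaMemcpy(h_blockMax, d_blockMax, nBlocks * sizeof(float),
                   cudaMemcpyDeviceToHost);

        float norm = 0.0f;
        for (int b = 0; b < nBlocks; b++)
            norm = fmaxf(norm, h_blockMax[b]);

        if (norm < TOL) break;   // close enough to steady state: stop
    }
}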

Thanks a lot! I never thought of putting another kernel in the loop…thanks again!

Instead of terminating the loop in host code, I tried to terminate the loop inside the kernel instead:

__global__ void JacobiIteration(float* u, int n)
{
    int x = threadIdx.x + blockIdx.x * BLOCK_SIZE;
    int y = threadIdx.y + blockIdx.y * BLOCK_SIZE;
    int i;

    for (i = 1; i <= 21777; i++)
    {
        if (x > 0 && x < n - 1 && y > 0 && y < n - 1)
        {
            u[y + x * n] = 0.25 * (u[y + (x + 1) * n] + u[(x - 1) * n + y]
                                 + u[(x * n) + y + 1] + u[(x * n) + y - 1]);
        }
    }
}

However, the answer I obtain now is different. Why is that the case?

In a Jacobi solver, you are relying on characteristics to propagate or diffuse across the grid until the whole grid “relaxes” to a steady-state solution. Your second kernel (combined with the CUDA execution model) doesn’t allow that to happen, because you have completely removed the implicit synchronisation between sweeps that the host-side loop provided. In fact, even your first kernel has potential read-after-write hazards, because threads read neighbouring values that other threads may already have overwritten during the same sweep. Ideally, you should not update the solution in place for that reason.
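For example (just a sketch, with du_old and du_new as placeholder names, and assuming both buffers are initialised with the same boundary values), a double-buffered version would read from one array, write to the other, and let the host swap the pointers between launches:

// Sketch of a double-buffered (ping-pong) Jacobi sweep: read only from uOld,
// write only to uNew, so no thread ever reads a value another thread is
// updating in the same sweep.
__global__ void JacobiIteration(const float* uOld, float* uNew, int n)
{
    int x = threadIdx.x + blockIdx.x * BLOCK_SIZE;
    int y = threadIdx.y + blockIdx.y * BLOCK_SIZE;

    if (x > 0 && x < n - 1 && y > 0 && y < n - 1)
    {
        uNew[y + x * n] = 0.25f * (uOld[y + (x + 1) * n] + uOld[(x - 1) * n + y]
                                 + uOld[(x * n) + y + 1] + uOld[(x * n) + y - 1]);
    }
}

// Host side: exchange the two device pointers after each launch.
for (i = 1; i <= 21777; i++)
{
    JacobiIteration<<<dimGrid, dimBlock>>>(du_old, du_new, nSize);
    float* tmp = du_old;
    du_old = du_new;
    du_new = tmp;
}

The swap costs essentially nothing because only the pointers are exchanged, and the convergence check from the earlier post can be run on whichever buffer holds the most recent sweep.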