run twice, but different results

Hi all,
I have been stuck on this for several days. When I run the following kernel twice without changing anything, I get different results.
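lapFinalStep1<<<blockGrid, threadBlock>>>(
	rect_img,
	color_mean,
	invcovar,
	lap_mat,
	d_constarea,
	img_w,
	img_h);
cutilSafeCall(cudaGetLastError());       // catch launch errors (e.g. an invalid grid/block configuration)
cutilSafeCall(cudaDeviceSynchronize());  // wait for the kernel; cudaThreadSynchronize() is deprecated

const int n = ELEM_SIZE * 25;            // number of doubles in lap_mat
double *h_ResultGPU = (double *)malloc(n * sizeof(double));
// cudaMemcpy blocks until the copy has finished, so no extra synchronize is needed afterwards
cutilSafeCall(cudaMemcpy(h_ResultGPU, lap_mat, n * sizeof(double), cudaMemcpyDeviceToHost));

FILE *fid = fopen("gpuyz2.txt", "w+");
for (int j = 0; j < n; j++)              // iterate over exactly the elements that were copied
{
	fprintf(fid, "%d %.20lf\n", j, h_ResultGPU[j]);
}
fclose(fid);
free(h_ResultGPU);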
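To see where two runs disagree without diffing text files, one option is to keep both host copies of lap_mat in memory and compare them directly. Below is a minimal plain-C sketch; `compare_results` is a hypothetical helper (not part of CUDA or cutil), and it assumes both buffers were filled by a cudaMemcpy like the one above, each holding ELEM_SIZE*25 doubles. For a kernel with no atomics or order-dependent reductions, two runs on the same input would typically be bitwise identical, so the sketch compares bits rather than using a tolerance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of CUDA or cutil): compares two host copies of
   lap_mat from two runs. Returns how many elements differ bitwise, writes the
   first differing index to *first_diff (n if none) and the largest non-NaN
   absolute difference to *max_abs. */
size_t compare_results(const double *run1, const double *run2, size_t n,
                       size_t *first_diff, double *max_abs)
{
    assert(run1 != NULL && run2 != NULL && first_diff != NULL && max_abs != NULL);
    size_t count = 0;
    *first_diff = n;
    *max_abs = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Bitwise comparison, so two identical NaNs count as equal. */
        if (memcmp(&run1[i], &run2[i], sizeof(double)) != 0) {
            double d = run1[i] > run2[i] ? run1[i] - run2[i] : run2[i] - run1[i];
            if (count == 0) *first_diff = i;
            if (d == d && d > *max_abs) *max_abs = d;   /* d != d only when d is NaN */
            count++;
        }
    }
    return count;
}
```

If only a few indices differ, those positions point to the threads worth inspecting; if almost every element differs, the problem is likely broader, such as a launch that failed without being checked.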

Perhaps your kernel is using some uninitialized variables, or never writes some elements of lap_mat?

N.
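One way to test the uninitialized-memory theory: memory returned by cudaMalloc is not cleared, so any element of lap_mat that the kernel never writes keeps whatever was there before, and that can change from run to run. Before the launch, fill the buffer with a sentinel, for example `cudaMemset(lap_mat, 0xFF, ELEM_SIZE*25*sizeof(double))`, then after copying back count the elements that still hold the all-0xFF pattern. The helper below is a hypothetical plain-C sketch and assumes the kernel never legitimately produces that exact bit pattern (as a double it is a NaN that ordinary arithmetic does not normally yield).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if every byte of *x is 0xFF, i.e. the pattern cudaMemset(ptr, 0xFF, bytes)
   leaves behind. Takes a pointer so the NaN bit pattern is inspected without copying
   it through a floating-point register. */
static int is_sentinel(const double *x)
{
    unsigned char bytes[sizeof(double)];
    memcpy(bytes, x, sizeof bytes);
    for (size_t k = 0; k < sizeof bytes; k++)
        if (bytes[k] != 0xFF) return 0;
    return 1;
}

/* Hypothetical helper: counts elements of the copied-back buffer that the kernel
   never overwrote, and stores the first such index in *first_unwritten (n if none). */
size_t count_unwritten(const double *h, size_t n, size_t *first_unwritten)
{
    assert(h != NULL && first_unwritten != NULL);
    size_t count = 0;
    *first_unwritten = n;
    for (size_t i = 0; i < n; i++) {
        if (is_sentinel(&h[i])) {
            if (count == 0) *first_unwritten = i;
            count++;
        }
    }
    return count;
}
```

This only catches output elements that are never written. Reads of uninitialized device memory inside the kernel are easier to find with `compute-sanitizer --tool initcheck` in current CUDA toolkits.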