I ran into a strange case where I use a GeForce 8800 GT to compute the inner product of two vectors. If I add a printf() right before I use the result, it works fine. Without the printf(), the result is wrong.
    float xTy (float *x, float *y, int n)
    {
        float gpu = GPU_inner (x, y, n);
        printf ("gpu = %g\n", gpu);  // removing this line makes the value used by f(gpu) below wrong
        f (gpu);                     // gpu is correct here only if the printf above is present
        ...
    }
At first I suspected that I wasn't synchronizing before reading the result back from the GPU. Then I realized that the value of gpu has already been stored before the printf statement runs. I also tried replacing the printf with a busy loop, but no matter how much delay I added, it made no difference. The only thing that matters is whether the printf statement is there, which makes me suspect it has something to do with the graphics card.