I have a strange and urgent problem with CUDA. I’m writing a CFD simulation program using double-precision computations (sm_13). I tested several functions both in emulation mode and in “normal” mode (running on the GPU), and everything worked fine: the values were exactly the same as in the program’s CPU version.
The problem comes with the iterative solver. In emulation mode the program produces exactly the same results as the CPU version, but when I switch to the GPU, every value has overflowed by the 1000th iteration. That makes no sense, because it should work fine.
I have no idea why this happens, because even in single precision it should not overflow. I went through the code, and if I disable just one line (a division), it won’t crash (the only problem is that then I no longer solve the problem).
I really have no idea what causes this, because in emulation mode this division works fine, and the GPU kernels I tested contain divisions with even smaller numbers that are 100% precise.
GPU_FLOAT Flag3 = 2.0 * (Alpha + (Gamma + Sigma) / 3.0);
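To narrow down the iteration at which the values first go wrong, a host-side check along the following lines might help. This is only a minimal sketch in plain C, assuming the field is copied back to a host array of doubles after each iteration and that "overflow" shows up as Inf or NaN; the name first_nonfinite is hypothetical and not part of my program.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not from the original program): returns the index of
   the first NaN or Inf in values[0..n-1], or -1 if every entry is finite. */
long first_nonfinite(const double *values, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (!isfinite(values[i])) {
            return (long)i;
        }
    }
    return -1;
}
```

Calling this on the copied-back array after every iteration, and stopping at the first result other than -1, would show which iteration and which cell break first, so that cell's inputs to the division could then be compared with the emulation run.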
Has anybody seen something like this?