Difference in double precision results?

Hi everyone,

I have a question about double-precision calculations and differences between the GPU and the CPU. The thing is, when I run a program on both the GPU and the CPU, I receive the following results:


30 1.5372E-02 1.5372E-02
35 1.5441E-02 1.5441E-02
40 6.5295E-02 6.5295E-02
45 1.5270E-02 1.5270E-02
50 1.5169E-02 1.5169E-02
55 1.5003E-02 1.5003E-02
60 1.5029E-02 1.5029E-02
65 1.5196E-02 1.5196E-02


30 1.5377E-02 1.5377E-02
35 1.6565E-02 1.6565E-02
40 1.5115E-02 1.5115E-02
45 1.5050E-02 1.5050E-02
50 1.5396E-02 1.5396E-02
55 1.5078E-02 1.5078E-02
60 1.5012E-02 1.5012E-02
65 1.5310E-02 1.5310E-02

As you can see, there are discrepancies between the two sets of results even though I compiled with the flags -r8, -i8, and -O3. Does anyone know why this might be the case?

Thank you for your time.



Hi Chris,

Some accuracy differences are expected, especially if your program uses trigonometric intrinsics (such as sin, cos, etc.) or imprecise divides (-Mcuda=fastmath). Whether the results are ‘close enough’ or ‘wrong’ is really program dependent, though. If the answers are ‘wrong’, you’ll need to investigate why.

First, are you using CUDA Fortran or the PGI Accelerator Model directives?
What are the differences between your CPU and GPU versions of the code?
How does vectorization (-Mvect=sse) or parallelization (-Mconcur or OpenMP directives) affect the accuracy of your CPU version?
Are there any reductions?

Note: when I’ve investigated wrong answers, I typically use ‘check-point’ arrays to store intermediate values and then compare the GPU values with the CPU values. Eventually you’re able to identify where the divergence occurs, and hopefully why.

Hope this helps,