Double Precision errors

Hi, I am Arun. I am working with CUDA Fortran. In one of my codes I am using double precision variables of type real(8), but unfortunately the results computed by the CPU and the GPU are not exactly the same. Why does double precision computation differ on GPUs compared to CPUs? Is there any solution to this?

Hi Arun,

There could be several reasons. For example, running in parallel can cause ordering differences, which in turn cause rounding differences.
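As a minimal illustration (plain Fortran, not taken from your code), summing the same values in two different orders can already change the last few bits of a double precision result, which is what happens when a parallel reduction reorders the additions:

program sum_order
  implicit none
  integer, parameter :: n = 1000000
  real(8) :: x(n), s_fwd, s_bwd
  integer :: i

  ! values spanning several orders of magnitude
  do i = 1, n
     x(i) = 1.0d0 / real(i,8)**2
  end do

  ! same data, summed forward...
  s_fwd = 0.0d0
  do i = 1, n
     s_fwd = s_fwd + x(i)
  end do

  ! ...and summed backward
  s_bwd = 0.0d0
  do i = n, 1, -1
     s_bwd = s_bwd + x(i)
  end do

  print *, 'forward  sum =', s_fwd
  print *, 'backward sum =', s_bwd
  print *, 'difference   =', s_fwd - s_bwd
end program sum_order

Typically the two sums agree to about 15 significant digits but differ in the last bits, even though mathematically they are identical.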

Also, what CPU are you using? The GPU uses FMAs (fused multiply-adds) by default, and an FMA computes a*b+c with a single rounding, so if your CPU is not using FMAs this can cause differences in the last bits. Try adding “-Mcuda=nofma” to your compile flags to disable FMA on the GPU.

Also try adding “-Kieee” to have the compiler enforce IEEE 754 arithmetic. Though, precision differences due to parallel operations and FMA won't be affected by this flag.
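For reference, the two flags can be combined on one compile line, something like this (the file name is just a placeholder):

% pgf90 -Mcuda=nofma -Kieee mycode.cuf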

-Mat

Hi Mat.
Regarding the CPU, my laptop has an Intel Core i5 processor:
Intel® Core™ i5-4210U CPU @ 1.70GHz.

The GPU installed is a GeForce 820M with compute capability 2.1.

Regarding the computation difference with double precision: for example, a function does some computations and writes its results to four different 2D arrays. In one of the arrays the maximum error between the CPU and GPU arrays is of the order of 10^(-14), but the rest of the arrays show exactly zero error.
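Just to be clear about the error measure: it is the maximum absolute difference between the host result and the device result copied back to the host, along these lines (a simplified sketch with placeholder names and sizes, not the real code):

program compare_arrays
  use cudafor
  implicit none
  integer, parameter :: nx = 128, ny = 128
  real(8)         :: a_cpu(nx,ny), a_gpu(nx,ny)
  real(8), device :: a_dev(nx,ny)
  real(8) :: errf

  a_cpu = 1.0d0     ! stand-in for the CPU result
  a_dev = 1.0d0     ! stand-in for the GPU result

  a_gpu = a_dev                       ! copy the device result back to the host
  errf  = maxval(abs(a_cpu - a_gpu))  ! maximum absolute difference
  print *, 'max abs error =', errf
end program compare_arrays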

I think my CPU does use FMAs, because when I disabled FMAs using -Mcuda=nofma the results were worse: all the arrays had some finite error of the order of 10^(-8).

Adding -Kieee also worsened the results. I checked the function's algorithm and it has no issues; it is correct. What could be the problem? Any other suggestions?

Intel® Core™ i5-4210U CPU @ 1.70GHz.

This is a Haswell architecture, so it does have FMA.

Adding Kieee also worsened the results

This changed your CPU results? -Kieee disables optimizations that would cause precision differences, so the results should be more accurate. Hence my guess is that your algorithm is numerically sensitive. Are you doing any accumulation, such as a summation? Are there any uninitialized variables?
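If it does turn out to be a summation, one standard way to make an accumulation less sensitive to evaluation order is compensated (Kahan) summation. A general sketch, not specific to your code (it relies on -Kieee or similar so the compiler does not optimize the correction terms away):

function kahan_sum(x, n) result(s)
  implicit none
  integer, intent(in) :: n
  real(8), intent(in) :: x(n)
  real(8) :: s, c, y, t
  integer :: i

  s = 0.0d0
  c = 0.0d0              ! running compensation for lost low-order bits
  do i = 1, n
     y = x(i) - c        ! apply the stored correction to the next term
     t = s + y           ! add it to the running sum
     c = (t - s) - y     ! recover the bits that the addition dropped
     s = t
  end do
end function kahan_sum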

Can you post or send to PGI customer service (trs@pgroup.com) the code so I can take a look?

-Mat

Hi. I have mailed you the code at the given email address; please find it attached to the mail. The code is a shortened version of a very lengthy program and contains two functions, a serial one and a parallel one, and their results are compared.

I have explained as much as possible about the program in the mail and comments in the code.

You can see that on execution, even using FMAs, there is still a loss of accuracy.

Also, adding -Kieee in this part does not affect the solution, as you stated. I made a mistake in concluding that -Kieee worsened the solution; I am sorry for that.

Hi Arun,

Thanks for the example. It looks like it is an FMA issue, but just the opposite of what I guessed: here FMA is being generated for the host code but not for the device. Hence, you can add “-Mnofma” to disable all FMA code generation:

% pgf90 -Mcuda precision.cuf general.cuf -Mnofma ; a.out
precision.cuf:
general.cuf:
Device name:Tesla K80
Compute capability : 3.7
 errf in zm(surfzmgradz_cudaf)     =     0.000000000000000
 errf in surf(surfzmgradz_cudaf)   =     0.000000000000000
 errf in gradz(surfzmgradz_cudaf)  =     0.000000000000000
 errf in gradz2(surfzmgradz_cudaf) =     0.000000000000000

-Mat