noticeable numerical error

I have this simple test code

  REAL(KIND=dp) :: Ca_myo, Ca_nsr
  REAL(KIND=dp) :: old, dtmp
  INTEGER :: hidx, NSFU

  NSFU = 100  ! 1, 100, 1000, 20000
  Ca_myo = 0.1d0
  Ca_nsr = 1.d0

  ALLOCATE(Ca_ds(NSFU), Ca_jsr(NSFU))
  Ca_ds = Ca_myo
  Ca_jsr = Ca_nsr
  old = (Ca_myo+Ca_nsr) * (NSFU) 

  dtmp = 0.d0
  DO hidx = 1, NSFU
     dtmp = dtmp + Ca_ds(hidx) + Ca_jsr(hidx)
  PRINT *, "old", old, "new", dtmp
  PRINT *, "the difference", (old-dtmp)/dtmp, ABS(old-dtmp)


Basically, there is no numerical error if I use NSFU=1. However, when I increase NSFU, significant difference is observed
NSFU = 10000, then diff = -1.8586764450774207E-013

This kind of error become larger, in my simulation system, which make the species in a closed system not conserved. Is this a normal (acceptable) error or is there a way to enhance the accuracy of the system?

I’m using X86_64bit system.


I’m not a master at numerical computing, but I’m sure some people at PGI are so believe them more than me if/when they chime in.

I think what you are seeing there is just accumulation of roundoff error. IIRC, every arithmetical operation will introduce an error of machine epsilon, e_m, which for DP is around 1e-16 (I think, probably got that wrong) and the best you can ever do with N arithmetical operations is around sqrt(N)e_m which is assuming the errors are random. So, for 10000 operations, your cumulative error in the best case is ~1e-14. (For some algorithms, the cumulative error is more like Ne_m which is ~1e-12 when N=10000.)

To get better than this, I think you’d have to move to quad precision or even to something like ARPREC or FMLIB which do arbitrary-precision arithmetic.

Thank TheMatt. I guess the error is acceptable or unavoidable with DP.


Hi Tuan,

Matt’s analysis is correct. The only thing I’ll add is a bit about binary representation of floating point numbers. When I taught Intro to Computer Science back in Grad school, I’d usually take a full lecture explaining it. I’ll limit myself here.

“0.1” can’t be represented exactly in floating point representation. Hence, you’re using an approximate value. The more bits used in the floating point representation (i.e the precision Single, Double, Quad) the closer the approximation but it’s still ‘wrong’. The more operation performed on the value, the larger the amount of error.

The all knowing Wikipedia has a good write-up on Binary that may be useful:

  • Mat

Hi Tuan,

I am absolutely new to PGI Fortran; I just installed the trial version (32 bit) but have not had a chance to use it, so I cannot speak from first hand experience of what the results will be for what I am going to suggest.

You may be interested to see if the results you get are different if you replace every instance of “d0” with “_dp”. I don’t know if PGI builder would select a different KIND type parameter for REAL based on your definition of “dp” versus what it would select for “default” double precision (“d0”).

Also, the interval between machine representable numbers increases the larger the ‘target’ number gets, that is the larger the numerical value the larger the “spacing” between machine representable numbers (however the relative spacing is maximum at 1). Therefore, you may be interested in what the result is of evaluating SPACING(old) and SPACING(dtmp). SPACING(1_dp) should give you machine epsilon based on the PGI-builder chosen model for “dp”. SPACING is the absolute spacing of model numbers near the argument. Perhaps even more informative would be the result of evalutating RRSPACING on “old” and “dtmp” (reciprocal of relative spacing of model numbers near argument).

-Brian L.

Thanks all for the helpful information. I’m just curious if there is a good way to reduce this kind of error.


Hi Tuan,

There is no way to work around the error, though adding more precision does give a better approximation which does reduce it. In current processors 64-bit is the largest data type supported in the hardware. Well, this isn’t entirely true. Most processors still support 80-bit precision but only when using the x87 floating point unit which we only support in 32-bits using the “-tp piii” target processor flag. If you’re curious read this old FAQ about 80-bit precision:

Given that current SSE uses 64-bit data types, to use extended precision you need to use software based approaches (like the ones Matt mentions) which can use arbitrary precision. The caveat being that a software approach will be much slower than using hardware.

If precision is the main requirement of your work, then please use the software based approaches. If performance is more important, then you will need to give-up some accuracy.

  • Mat

Thanks a lot Mat.


If precision really matters,there is also this trick:

In numerical analysis, the Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).