Floating-point precision and fortran write statement


I’m a little bit of confused about these codes and output.

program fl
  implicit none
  real(kind=8) :: a,b,c
  write(*,'(f25.22)') a
end program

pgf90 -Mnofprelaxed  -Kieee -O0 fl.f90



A double has only 15.95 decimal digits of precision. It asks for outputing 22 decimal digits.

My questions are

  1. For my understanding, it should be 0.6999999999999999 in memory. So where the last 6 decimal digits come from?

  2. It sounds like -Mnofprelaxed -Kieee didn’t give any help in this case. So is there any compliation option help for this case?

  3. Is there any way to get correct result of 22 decimal digits output ?

    Appreciating any comments


For my understanding, it should be 0.6999999999999999 in memory.

The in-memory representation is binary IEEE, not decimal. Just as the number 0.1 in base-3 becomes 0.333333333… in base-10, there are numbers with short representations in binary that have infinitely long decimal versions and vice versa.

The 64-bit IEEE representation of 10.0/3.0, expressed in hex, is 400AAAAAAAAAAAAA (truncated) or 400AAAAAAAAAAAAB (rounded).

Fortran output routines can produce long strings of digits following conversion to decimal. Not all the digits are to be taken as correct.

Here is code to print out the internal representations that I mentioned:

program decout
implicit none
integer ix(2)
real*8 x
equivalence (ix,x)
end program

Thanks for reply.

decimal 0.7 in-memory representation in IEEE is

0 01111111110 0110011001100110011001100110011001100110011001100110

And 0.6999999999999999 in decimal vice versa.

I tried extended 80-bit precision with gfortran

program fp_ext
  real*10 :: a, b, c
  b = 7d0
  c = 10d0
  a = b/c
  write(*,'(f25.22)') a
end program fp_ext

gfortran -O0 fp_ext.f90

The output is 0.6999999999999999999892.

For double, the “noise” digits are 555911
For extended 80-bit precision, the “noise” digits are 892.

Just wondering why is this?

Thanks for any comments

You may find the NEAREST and SPACING intrinsics helpful in obtaining better understanding of the issues that you raised. For example, run this program:

program nrx
double precision x
10 format(3F25.22)
end program

Thanks for further comments.

See if I understood this correctly

It’s the processor-representable number. In my case, the FP registers are 128-bit registers, it can out up to 34 decimal digits of precision.

The Write statement stream is

in-memory ( 64-bits ) -> round to nearest > Registers( 128-bits ) - > round to nearest -> I/O ( f25.22 )

Appreciating your time

What 128-bit registers? PGI does not support quadruple precision floating point (see https://forums.developer.nvidia.com/t/real-16/134046/1 ). If you are referring to the 128, 256 and 512 bit floating point vector registers, note that they contain packed 32 or 64 bit floating point numbers.

Thanks for further comments

FP registers are 128-bit XMM registers, extended to 256-bit YMM registers with AVX - > http://en.wikipedia.org/wiki/Processor_register

The same code with Write(*,’(f64.60)’) on Intel Xeon X5675 CPU to see how many decimal digits it will output

program fl
  implicit none
  real(kind=8) :: a,b,c
  write(*,'(f64.60)') a
end program

It got these outputs

gfortran -O0 -ffloat-store -fno-fast-math -o gnu.out fl.f90
-----------------------------------^ 24 Significands

ifort -fp-model precise -O0 -o intel.out fl.f90
--------------------------------------------------------^ 40 Significands

pgf90 -Kieee -O0 -o pgi.out fl.f90
------------------------------------------------------------------------------^ 53 Significands

According the wiki page, the processor-representable number should be 34 decimal digits of precision.

The output should be only 34 decimal digits. But now both of Intel and PGI get more significands than 34 decimal digits, except GNU compiler.

I’m still not clear for this

in-memory ( 64-bit ) -> where it goes ? -> more than 34 decimal digits output

Could someone kindly give me more comments on how it works ?

Appreciating your time



You continue to confuse the number of decimal digits in the external representation (printed out with a formatted output statement) with the size of the same number in the internal representation. Likewise, you confuse the bit size of vector registers (XMM, YMM, etc.) with the bit size of the floating point numbers contained in those registers.

The article http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format contains this pertinent statement:

Quadruple-precision (128-bit) hardware implementation should not be confused with “128-bit FPUs” that implement SIMD instructions, such as Streaming SIMD Extensions or AltiVec, which refers to 128-bit vectors of four 32-bit single-precision or two 64-bit double-precision values that are operated on simultaneously.

The following example program computes 22.0/7.0 using (i) 32-bit floating point, and (ii) long division using integers. You may note that the long division could be programmed on a four-bit computer (4 rather than 32 or 64) and could produce hundreds of digits of output, although the digit stream has a recognizable cycle length of 6 beyond the initial string.

program fpx
implicit none
real NotPI
call divl(22,7)
end program

subroutine divl(d,v)
implicit none
integer :: d,v,i,r,q
character(len=50) :: s
q=d/v; r=d-v*q; s='3.'
do i=3,22
   q=r/v; r=r-q*v; s(i:i)=char(q+48)
end do
end subroutine

Only 16 of the expected first characters appear like In the older processors. The other ones look like a random representation.