pgfortran single precision comparison with ifort

Hi.

I have been struggling with porting some fortran to pgfortran cuda. I have solved my problem, but in so doing have discovered something that may be of interest. pgfortran single precision breaks down more or less where you’d expect it to, however intel ifort somehow manages to pull off higher precision than you’d expect! Here is a test program that helped me identify my problem - all it does is compare the expressions 1-cos(a) and a^2/2 (which should tend to eachother for small a) and sin(a) and a (which again for small a, tend to eachother).

     program testprec

      real*4 :: a

      print*,' real precision : ',precision(a)

      do i=1,precision(a)

         a = 0.1**i

         write(6,'(A3,F20.16)') 'a: ', a
         write(6,'(A10,F16.12,A9,F16.12)') '1-cos(a): ', 1.0-cos(a),
     -                                      ' a^2/2 : ',   a**2/2.0
         write(6,'(A10,F16.12,A9,F16.12)') '  sin(a): ', sin(a),
     -                                      ' a     : ',   a

      enddo

      end

Here is the pgfortran result:

  real precision :             6
a:   0.1000000014901161
1-cos(a):   0.004995822906 a^2/2 :   0.005000000354
  sin(a):   0.099833421409 a     :   0.100000001490
a:   0.0100000007078052
1-cos(a):   0.000050008297 a^2/2 :   0.000050000006
  sin(a):   0.009999834001 a     :   0.010000000708
a:   0.0010000000474975
1-cos(a):   0.000000476837 a^2/2 :   0.000000500000
  sin(a):   0.000999999931 a     :   0.001000000047
a:   0.0001000000120257
1-cos(a):   0.000000000000 a^2/2 :   0.000000005000
  sin(a):   0.000100000012 a     :   0.000100000012
a:   0.0000100000015664
1-cos(a):   0.000000000000 a^2/2 :   0.000000000050
  sin(a):   0.000010000002 a     :   0.000010000002
a:   0.0000010000002248
1-cos(a):   0.000000000000 a^2/2 :   0.000000000001
  sin(a):   0.000001000000 a     :   0.000001000000

As you can see, the evaluation of 1-cos(a) fails for a=0.0001 - you might expect this for decimal place #4 (i.e. precision(a)/2 +1). The approximation a^2/2 is still OK though - so my code now uses the approximation for small a.

The intel compiler though gives this:

     real precision :            6
a:   0.1000000014901161
1-cos(a):   0.004995835014 a^2/2 :   0.005000000354
  sin(a):   0.099833421409 a     :   0.100000001490
a:   0.0100000007078052
1-cos(a):   0.000049999591 a^2/2 :   0.000050000006
  sin(a):   0.009999834001 a     :   0.010000000708
a:   0.0010000000474975
1-cos(a):   0.000000500000 a^2/2 :   0.000000500000
  sin(a):   0.000999999931 a     :   0.001000000047
a:   0.0001000000047497
1-cos(a):   0.000000005000 a^2/2 :   0.000000005000
  sin(a):   0.000100000005 a     :   0.000100000005
a:   0.0000100000006569
1-cos(a):   0.000000000050 a^2/2 :   0.000000000050
  sin(a):   0.000010000001 a     :   0.000010000001
a:   0.0000010000001112
1-cos(a):   0.000000000001 a^2/2 :   0.000000000001
  sin(a):   0.000001000000 a     :   0.000001000000

As you can see, the expression for 1-cos(a) continues to be valid for smaller and smaller a. I suspect that the intel compiler deploys versions of sine and cosine that identify that a is small and fall back on the approximations should this be the case.

If GPUs continue to be much faster in single precision for the foreseeable future, it might be a good idea to improve the trig functions in pgfortran so they are at least as good as the intel versions. Just a thought - I know you have an infinite list of ideas/requests!!!

Rob.

I just used your test program with the Intel 11.1 compilers on a Intel processor based machine. I compiled as follows:

ifort testprec.f

The Intel results that I got do not match those that you posted. They are more inline with the PGI results:

real precision : 6
a: 0.1000000014901161
1-cos(a): 0.004995822906 a^2/2 : 0.005000000354
sin(a): 0.099833421409 a : 0.100000001490
a: 0.0100000007078052
1-cos(a): 0.000050008297 a^2/2 : 0.000050000006
sin(a): 0.009999834001 a : 0.010000000708
a: 0.0010000000474975
1-cos(a): 0.000000476837 a^2/2 : 0.000000500000
sin(a): 0.000999999931 a : 0.001000000047
a: 0.0001000000047497
1-cos(a): 0.000000000000 a^2/2 : 0.000000005000
sin(a): 0.000100000005 a : 0.000100000005
a: 0.0000100000006569
1-cos(a): 0.000000000000 a^2/2 : 0.000000000050
sin(a): 0.000010000001 a : 0.000010000001
a: 0.0000010000001112
1-cos(a): 0.000000000000 a^2/2 : 0.000000000001
sin(a): 0.000001000000 a : 0.000001000000

Which version of the Intel compiler and what compiler options did you use?

Hi. I will do some further tests - the results I showed were on an AMD machine so can’t be due to clever intel tech… I need to install ifort on our latest intel cuda machine…will get back to you.

Rob.

Hi Rob,

Are you running in 32-bits? My best guess is the difference is due to Intel using the 80-bit x87 FPU while PGI is using the 64-bit SSE FPU. If I compile your code with “-tp=piii”, the PGI compiler will use the x87 FPU and match your Intel result. Note that Craig was compiling in 64-bits and why he was unable to recreate your Intel results.

% pgfortran -tp=piii testprec.f ; a.out
  real precision :             6
a:   0.1000000014901161
1-cos(a):   0.004995835014 a^2/2 :   0.005000000354
  sin(a):   0.099833421409 a     :   0.100000001490
a:   0.0100000007078052
1-cos(a):   0.000049999591 a^2/2 :   0.000050000006
  sin(a):   0.009999834001 a     :   0.010000000708
a:   0.0010000000474975
1-cos(a):   0.000000500000 a^2/2 :   0.000000500000
  sin(a):   0.000999999931 a     :   0.001000000047
a:   0.0001000000047497
1-cos(a):   0.000000005000 a^2/2 :   0.000000005000
  sin(a):   0.000100000005 a     :   0.000100000005
a:   0.0000100000006569
1-cos(a):   0.000000000050 a^2/2 :   0.000000000050
  sin(a):   0.000010000001 a     :   0.000010000001
a:   0.0000010000001112
1-cos(a):   0.000000000001 a^2/2 :   0.000000000001
  sin(a):   0.000001000000 a     :   0.000001000000
  • Mat