Extremely slow subroutine calls

Dear all,

I have a small test program (see below) that compares different ways of calling procedures that operate on a derived type: via generic interfaces, via type-bound procedures on a type instance, and via type-bound procedures on a class (polymorphic) instance. With most other compilers I have checked, the first two call types are about an order of magnitude faster (ca. 0.3 s) than the third one (ca. 3 s). The current pgfortran (17.10) produces a binary in which all three are equally slow (ca. 3 s). I am wondering whether some compiler flag can speed it up (I've tried -O3) or whether this is an internal optimisation problem.

module testmod
  implicit none

  type :: TStatic
    private
    integer :: val = 1
  end type TStatic

  type :: TPoly
    private
    integer :: val = 1
  contains
    procedure :: incValue => TPoly_incValue
    procedure :: getValue => TPoly_getValue
  end type TPoly

  interface incValue
    module procedure TStatic_incValue
  end interface incValue

  interface getValue
    module procedure TStatic_getValue
  end interface getValue

contains

  subroutine TStatic_incValue(this, increment)
    type(TStatic), intent(inout) :: this
    integer, intent(in) :: increment

    this%val = this%val + increment

  end subroutine TStatic_incValue


  function TStatic_getValue(this) result(val)
    type(TStatic), intent(in) :: this
    integer :: val

    val = this%val

  end function TStatic_getValue


  subroutine TPoly_incValue(this, increment)
    class(TPoly), intent(inout) :: this
    integer, intent(in) :: increment

    this%val = this%val + increment

  end subroutine TPoly_incValue


  function TPoly_getValue(this) result(val)
    class(TPoly), intent(in) :: this
    integer :: val

    val = this%val

  end function TPoly_getValue

end module testmod
    
    
program test
  use testmod
  implicit none

  type(TStatic) :: staticInst
  type(TPoly) :: polyInst, polyInst2
  class(TPoly), allocatable :: classInst

  integer :: nCycles
  integer :: ii
  real :: t1, t2

  nCycles = 1000000000 ! 1e9
  print '(A,I0)', 'Nr. of iterations:', nCycles
  call cpu_time(t1)
  do ii = 1, nCycles
    call incValue(staticInst, ii)
  end do
  call cpu_time(t2)
  print '(A,T30,I0,F6.2)', 'Static:', getValue(staticInst), t2 - t1

  call cpu_time(t1)
  do ii = 1, nCycles
    call polyInst%incValue(ii)
  end do
  call cpu_time(t2)
  print '(A,T30,I0,F6.2)', 'Polymorhic via type:', polyInst%getValue(), t2 - t1

  allocate(classInst, source=polyInst2)
  call cpu_time(t1)
  do ii = 1, nCycles
    call classInst%incValue(ii)
  end do
  call cpu_time(t2)
  print '(A,T30,I0,F6.2)', 'Polymorphic via class:', classInst%getValue(), t2 - t1
  
end program test
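The program above times each loop with cpu_time, which reports processor time; the replies below use an external dclock_64.s timer (not included in the thread) or the shell's time. If a portable wall-clock timer is preferred, a minimal sketch using only the intrinsic system_clock with 64-bit arguments could look like this. The module name walltimer and the function wall_seconds are hypothetical, not part of the original program.

```fortran
! Hypothetical helper module: portable wall-clock timing via system_clock.
module walltimer
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: wall_seconds

contains

  ! Returns wall-clock time in seconds since an arbitrary origin,
  ! or -1 if the processor provides no clock (count_rate == 0).
  function wall_seconds() result(t)
    real(real64) :: t
    integer(int64) :: count, rate

    ! 64-bit arguments give a fine resolution and a long wrap-around period
    call system_clock(count, rate)
    if (rate <= 0_int64) then
      t = -1.0_real64
    else
      t = real(count, real64) / real(rate, real64)
    end if
  end function wall_seconds

end module walltimer
```

In the test program one would then replace `call cpu_time(t1)` with `t1 = wall_seconds()` (and likewise for t2), declaring t1 and t2 as `real(real64)`. For a single-threaded loop like this, wall time and CPU time should be close; the difference matters mainly when the machine is loaded.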

I tried your test with the Intel, PGI, and GNU compilers.


% ifort -o test_intel test.f90 dclock_64.s
% gfortran -o test_gfort test.f90 dclock_64.s
% pgfortran -o test_pgi test.f90 dclock_64.s

% test_pgi ; test_gfort ; test_intel

PGI

Nr. of iterations:1000000000
Static:                      -243309311  2.53
Polymorhic via type:         -243309311  2.31
Polymorphic via class:       -243309311  2.47

GFORTRAN

Nr. of iterations:1000000000
Static:                      -243309311  3.05
Polymorhic via type:         -243309311  4.11
Polymorphic via class:       -243309311  4.19

INTEL

Nr. of iterations:1000000000
Static:                      -243309311  0.46
Polymorhic via type:         -243309311  0.47
Polymorphic via class:       -243309311  2.28

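All three compilers print the same checksum, -243309311, but that value is a wrapped-around default (32-bit) integer, not the true sum. The exact result is 1 + 10^9 (10^9 + 1) / 2 = 500000000500000001, and 500000000500000001 - 116415322 * 2^32 = -243309311. Integer overflow is not permitted by the Fortran standard, so the printed value is processor-dependent even though all three compilers agree here. A minimal sketch of the static variant with a 64-bit accumulator is shown below; the names testmod64 and TStatic64 are hypothetical, and timings taken with it are not directly comparable to the 32-bit runs above.

```fortran
! Hypothetical 64-bit variant of TStatic so the running sum cannot overflow.
module testmod64
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: TStatic64, incValue, getValue

  type :: TStatic64
    private
    integer(int64) :: val = 1_int64
  end type TStatic64

  interface incValue
    module procedure TStatic64_incValue
  end interface incValue

  interface getValue
    module procedure TStatic64_getValue
  end interface getValue

contains

  subroutine TStatic64_incValue(this, increment)
    type(TStatic64), intent(inout) :: this
    integer, intent(in) :: increment

    ! Widen the default-integer increment before adding it
    this%val = this%val + int(increment, int64)
  end subroutine TStatic64_incValue

  function TStatic64_getValue(this) result(val)
    type(TStatic64), intent(in) :: this
    integer(int64) :: val

    val = this%val
  end function TStatic64_getValue

end module testmod64
```

With nCycles = 1000000000, `getValue` on a TStatic64 instance should then return 500000000500000001 instead of the wrapped value.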


So we beat gfortran, but there is room for improvement compared to Intel.

We have filed this issue as TPR 25007.

dave

The difference may be due to the optimisation level: your compile lines use no -O flag, whereas with a recent gfortran and -O2 I get times similar to Intel's:

[642]> gfortran --version
GNU Fortran (GCC) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[643]> gfortran -O2 adtcalls.f90 

[644]> time ./a.out 
Nr. of iterations:1000000000
Static:                      -243309311  0.29
Polymorhic via type:         -243309311  0.30
Polymorphic via class:       -243309311  2.01

real    0m2.649s
user    0m2.596s
sys     0m0.004s

while doing the same with the PGI compiler, I get

[503]> pgfortran --version

pgfortran 17.10-0 64-bit target on x86-64 Linux -tp haswell 
PGI Compilers and Tools
Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.

[504]> pgfortran -O2 adtcalls.f90 

[505]> time ./a.out 
Nr. of iterations:1000000000
Static:                      -243309311  2.01
Polymorhic via type:         -243309311  1.97
Polymorphic via class:       -243309311  1.99

real    0m6.003s
user    0m5.960s
sys     0m0.004s

What perplexes me is that with PGI the statically bound calls are just as slow as the polymorphic call through the class instance. This could become an unfortunate show-stopper when modernizing Fortran codes by collecting related data into derived types.
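For the polymorphic case specifically, one source-level option is to declare the bindings non_overridable. No extension of TPoly can then replace them, so the target of `call classInst%incValue(ii)` is fixed at compile time and a compiler may call (or inline) TPoly_incValue directly instead of going through the dispatch table. This is a sketch under that assumption; whether gfortran, ifort or pgfortran 17.10 actually exploit it has not been measured here, and since the PGI static calls are equally slow, it would not by itself address the PGI result. The module name testmod_nonoverridable is hypothetical.

```fortran
! Hypothetical variant of TPoly whose bindings cannot be overridden,
! which makes calls through class(TPoly) statically resolvable.
module testmod_nonoverridable
  implicit none
  private
  public :: TPoly, TPoly_incValue

  type :: TPoly
    private
    integer :: val = 1
  contains
    ! non_overridable: extensions of TPoly may not rebind these names
    procedure, non_overridable :: incValue => TPoly_incValue
    procedure, non_overridable :: getValue => TPoly_getValue
  end type TPoly

contains

  subroutine TPoly_incValue(this, increment)
    class(TPoly), intent(inout) :: this
    integer, intent(in) :: increment

    this%val = this%val + increment
  end subroutine TPoly_incValue

  function TPoly_getValue(this) result(val)
    class(TPoly), intent(in) :: this
    integer :: val

    val = this%val
  end function TPoly_getValue

end module testmod_nonoverridable
```

The timing loop itself is unchanged (`call classInst%incValue(ii)`). Because TPoly_incValue is public here, as it is in the original testmod, calling it directly with `call TPoly_incValue(classInst, ii)` is another way to bypass the type-bound dispatch entirely.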

I have added your comments to TPR 52007.

dave