Syntax for inlining a Fortran module subroutine

If I have something like:

module foo
   contains
   pure function bar()
   end function bar
end module foo

How can I tell nvfortran to inline bar? I’ve tried -Minline=bar and -Minline=foo_bar, but it doesn’t seem like the inliner is being used. bar is trivial, so it shouldn’t be failing inlining criteria.

Thanks,
Paul

Just to clarify, I would like a call to bar from outside of foo to be inlined.

Assuming this is cross-file, and until we get IPA up and running again (it’s in progress), you’ll need to do a two-pass compilation. In the first pass, add “-Mextract=lib:” to extract the inlining information; in the second pass, use “-Minline=lib:” to inline using that information.

While this makes the inline info available to the compiler, it doesn’t guarantee that the routine will be inlined. Other factors, such as the call depth, the routine size, and whether any array args need to be reshaped, can limit inlining. You may need to adjust other settings via the -Minline sub-options (see “nvfortran -help -Minline” for the complete list).
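As a rough sketch of the two passes for the module in the question: the file names foo.f90 (defining module foo) and main.f90 (the caller), and the inline library name foo_il, are made up for illustration, since the thread does not give them. The last compile adds -Minfo=inline so the compiler reports what it inlined. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Two-pass inlining sketch for separately compiled files.
# Hypothetical names: foo.f90, main.f90, inline library "foo_il".
DRY_RUN=${DRY_RUN:-1}      # 1 = print commands only, 0 = execute them
INLINE_LIB=foo_il

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

two_pass() {
    # Pass 1: extract inlining information from the file that defines bar.
    run nvfortran -c foo.f90 -Mextract=lib:"$INLINE_LIB"
    # Pass 2: compile against the inline library. foo.f90 goes first so that
    # its module file exists when main.f90 is compiled.
    run nvfortran -c foo.f90 -Minline=lib:"$INLINE_LIB"
    run nvfortran -c main.f90 -Minline=lib:"$INLINE_LIB" -Minfo=inline
    run nvfortran -o main main.o foo.o
}

two_pass
```

If the call to bar is not mentioned in the -Minfo=inline messages for main.f90, the inline information was available but one of the limits above probably stopped the inliner.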

Hope this helps,
Mat

Hi,

We’re running into a similar issue at the moment and were wondering if it is possible to do a single pass for extracting and inlining?

We are using a parallel make environment, which is probably why -Mipa=inline is not working.
It would be great to have confirmation if there really is no other option.

Cheers,

Okke

Hi Okke,

Do you mean the two pass -Mextract/-Minline? -Mipa has been disabled for a few years now due to complications of integrating it into the LLVM back-end. The flag is still there for makefile compatibility but should be giving you a warning that it’s been deprecated.

> if it is possible to do a single pass for extracting and inlining?

If all the source files are on the same compile command, then a single pass can be used. But if each object is compiled separately, you need to use the two-pass method.

Hope this helps,
Mat

Hi Mat,

To make the question more general: is there any other way to force inlining of subroutines in nvfortran besides the two-pass routine with -Mextract/-Minline? Something like a forceinline pragma, or any other option?

I’m asking because inlining of subroutines seems to be mandatory (or at least highly desirable) in GPU code, so I’m wondering whether the compiler design offers a straightforward solution, or a choice of solutions, for this rather general problem.

No, there’s no way to force inlining. You can give hints such as the “inline” keyword in C/C++ or “-Minline=”. However inlining is performed, though, the definition of the routine to be inlined must be visible when compiling the routine into which it is to be inlined.

In other words, “-Mextract” is the way to gather the information needed about routines that are not in the same compilation unit, so the compiler can attempt to inline them. It doesn’t force inlining.

The size of a routine often affects whether it can be inlined. The option “-Minline=maxsize:” can be used to increase the allowable size of an individual routine, and “-Minline=totalsize:” to increase the total size when inlining multiple levels of routines.
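A small sketch of how these sub-options might be combined with an inline library on one -Minline flag, separated by commas. The numbers 200 and 4000, the library name foo_il, and the file name main.f90 are placeholders chosen for illustration, not recommended values; the script only prints the resulting command line.

```shell
#!/bin/sh
# Build a single -Minline flag from several sub-options.
# All values below are hypothetical placeholders.
INLINE_LIB=foo_il
MAXSIZE=200
TOTALSIZE=4000

# inline_flag LIB MAXSIZE TOTALSIZE -> comma-separated -Minline sub-options
inline_flag() {
    printf '%s\n' "-Minline=lib:$1,maxsize:$2,totalsize:$3"
}

FLAG=$(inline_flag "$INLINE_LIB" "$MAXSIZE" "$TOTALSIZE")

# Print the compile line; -Minfo=inline asks the compiler to report inlining.
echo "nvfortran -c main.f90 $FLAG -Minfo=inline"
```

Raising the limits step by step while watching the -Minfo=inline output is one way to find out whether size is what is blocking a particular routine.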

Hi Mat,

Thanks for the explanations.
OK, I’m thinking of a different solution that will in fact force inlining of some target procedures. Since I’m interested only in the specific case of generating GPU code using OpenMP offloading directives, this combination seems to work well:

File aaa.f90:

MODULE AAA
   use iso_fortran_env
   IMPLICIT NONE
   PUBLIC BBB

   INTERFACE BBB
       MODULE PROCEDURE CCC
   END INTERFACE
   CONTAINS

   FUNCTION CCC( pa, pb )
      !$omp declare target
      REAL(real64) :: pa,pb          ! input
      REAL(real64) :: CCC    ! result
      !!-----------------------------------------------------------------------
      IF ( pb >= 0.e0) THEN   ;   CCC = ABS(pa)
      ELSE                    ;   CCC =-ABS(pa)
      ENDIF
   END FUNCTION CCC
END MODULE AAA

File testinl.f90:

module testinl
  use omp_lib 
  use iso_fortran_env
  use AAA
  implicit none

contains 
  subroutine testinl()
    real (real64) :: X
    real (real64) :: Y 
    integer :: j, i

    !$omp target teams distribute collapse ( 2 ) &
    !$omp map ( to: Y ) map ( from: X )
    do j = 1, 2000
      do i = 1, 1000
        X = Y * BBB(1.0_real64, 1.0_real64)
      end do
    end do 
    !$omp end target teams distribute 
  end subroutine testinl
end module testinl
  
program Test
  use omp_lib
  use iso_fortran_env
  use testinl
  implicit none

  call testinl() 

end program Test

File build.sh:

nvfortran -fopenmp -mp=gpu -Minfo=mp -gpu=cc80 -o aaa.o -c aaa.f90
nvfortran -fopenmp -mp=gpu -Minfo=mp -gpu=cc80 -I. -o testinl testinl.f90 aaa.o

The aaa.f90 file contains the function to inline. We use !$omp declare target in its body. If we remove this directive, the function is not inlined, and a linker error occurs for the device code.

Would you agree that this seems to be a (special-case) solution for inlining device functions? Can you recommend anything else in this context?

The link fails because there’s no device routine for “ccc”. “declare target” is needed so the compiler knows it has to create a device-callable version of the routine, but it does not implicitly inline it. To inline, the definition of the callee must be visible when compiling the caller, which is not the case here.

To illustrate, I removed the “declare target” from “ccc”, so when attempting to compile, we get a link error because there’s no device version of “ccc”:

% nvfortran -fast -mp=gpu aaa.f90 testinl.f90
aaa.f90:
testinl.f90:
nvlink error   : Undefined reference to 'aaa_ccc_' in '/tmp/nvfortranfxqOkpr0jKDpZ.o'
pgacclnk: child process exit status 2: /proj/nv/Linux_x86_64/23.9/compilers/bin/tools/nvdd

While it’s best practice to use “declare target”, we can instead inline the routine using the “-Minline” flag:

% nvfortran -fast -mp=gpu aaa.f90 testinl.f90 -Minline
aaa.f90:
testinl.f90:
aaa.f90:
testinl.f90:

Notice that the file names are listed twice. This is because the compiler is doing two passes: first it extracts the information from the source that’s needed for inlining, then it compiles the object using this information. While it’s too much to post, if you use the verbose (-v) flag, you can see the different phases in detail. “fort2ex” is the extract utility, while “fort1” is the front-end compiler and “fort2” the back-end compiler.

When the source files are compiled separately, the compiler can’t implicitly perform this extraction. Instead, the user must add an extract step to their build, storing the results of the first pass in an inline library.

% sh -x bld.sh
+ nvfortran -fast -c aaa.f90 -Mextract=lib:foo
+ nvfortran -fast -c testinl.f90 -Mextract=lib:foo
+ nvfortran -fast -c aaa.f90 -Minline=lib:foo
+ nvfortran -fast -c testinl.f90 -Minline=except:testinl,lib:foo
+ nvfortran -fast -o testinl testinl.o aaa.o
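The same steps could be generalized to a list of sources roughly as follows. This is only a sketch: the helper names are made up, the “except:testinl” sub-option from the commands above is left out for brevity, and extraction is done serially because I am assuming, without having checked, that concurrent writes to one inline library are not safe. The point for a parallel make is that every extract step has to finish before the first inline step starts. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Generalized extract / inline / link sketch for separately compiled sources.
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only, 0 = execute them
INLINE_LIB=foo                 # inline library name, as in the commands above
SRCS="aaa.f90 testinl.f90"     # dependency order: modules before their users
FFLAGS="-fast"

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Pass 1: populate the inline library from every source file (serial).
extract_all() {
    for src in $SRCS; do
        run nvfortran $FFLAGS -c "$src" -Mextract=lib:"$INLINE_LIB"
    done
}

# Pass 2: compile every source file against the completed inline library.
compile_all() {
    for src in $SRCS; do
        run nvfortran $FFLAGS -c "$src" -Minline=lib:"$INLINE_LIB"
    done
}

# Link the objects produced by pass 2.
link_all() {
    objs=""
    for src in $SRCS; do
        objs="$objs ${src%.f90}.o"
    done
    run nvfortran $FFLAGS -o testinl $objs
}

extract_all
compile_all
link_all
```

In a makefile, the equivalent would be making every object target depend on a single extraction target, so the inline pass cannot start early.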

LTO does make this easier (and I hope our engineers will be able to support it again in the future), but it’s also doing two passes. It first gathers the inline information during the first compilation and then, at link time, recompiles all the source using this information.

-Mat
