Hi Mat
I prepared a small example to reproduce the behavior I observed; you can find it on my GitHub:
There is a simple main program with a do loop, and inside this loop there is a call to a function defined in the file utils.f90. The Makefile is copied from the PGI documentation: it first extracts an inline library from utils.f90 and then inlines the function into main.f90. Here is what I observed:
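Before the output, for anyone who cannot open the repository, the setup has roughly the following shape. This is only a simplified sketch and not the exact repository code: the names pow2, main, utils.f90 and main.f90 are the ones that appear in the compiler messages, while the array size, the loop bodies and the body of pow2 are assumptions made for illustration, so the line numbers in the messages will not match it.

```fortran
! ---- utils.f90 (sketch) ----
! Small external function: a natural candidate for inlining.
! The x*x body is an assumption; only the name pow2 is from the build log.
function pow2(x) result(y)
  implicit none
  real, intent(in) :: x
  real :: y
  y = x * x
end function pow2

! ---- main.f90 (sketch) ----
program main
  implicit none
  integer, parameter :: n = 16   ! hypothetical size
  integer :: i
  real :: a(n)
  real, external :: pow2         ! defined in utils.f90

  ! Zero-initialization loop (the kind of loop reported as a "memory set idiom")
  do i = 1, n
    a(i) = 0.0
  end do

  ! Loop containing the call that should be inlined
  do i = 1, n
    a(i) = pow2(real(i))
  end do

  print *, sum(a)
end program main
```

With the call inlined, the second loop contains no function call. Without inlining, the call stays in the loop body, which is what the "contains call" message reports.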
When compiling on PGI 18.5, I get the following output:
$ make
pgfortran -O2 -Minfo=all -Mllvm -Mextract=15 -o utils.il utils.f90
pgfortran -O2 -Minfo=all -Mllvm -Minline=utils.il -c main.f90
main:
17, Memory set idiom, loop replaced by call to __c_mset4
23, Loop not vectorized: may not be beneficial
Loop unrolled 4 times
24, pow2 inlined, size=2, file utils.f90 (12)
pgfortran -O2 -Minfo=all -Mllvm -c utils.f90
pgfortran -o myprog main.o utils.o
The function pow2 is thus correctly inlined. I can also remove the -Mllvm flag and get the same result.
When compiling with PGI 19.5, 19.7, or 19.9, I get the following output:
$ make
pgfortran -O2 -Minfo=all -Mllvm -Mextract=15 -o utils.il utils.f90
pgfortran -O2 -Minfo=all -Mllvm -Minline=utils.il -c main.f90
main:
17, Memory set idiom, loop replaced by call to __c_mset4
23, Loop not vectorized/parallelized: contains call
pgfortran -O2 -Minfo=all -Mllvm -c utils.f90
pgfortran -o myprog main.o utils.o
The function pow2 is not inlined here! However, once I set the flag -Mnollvm with PGI 19.x, I get the same behavior as with PGI 18.5, with the function pow2 inlined correctly:
$ make
pgfortran -O2 -Minfo=all -Mnollvm -Mextract=15 -o utils.il utils.f90
pgfortran -O2 -Minfo=all -Mnollvm -Minline=utils.il -c main.f90
main:
17, Memory set idiom, loop replaced by call to __c_mset4
23, Loop not vectorized: loop count too small
Loop unrolled 8 times
24, pow2 inlined, size=2, file utils.f90 (12)
pgfortran -O2 -Minfo=all -Mnollvm -c utils.f90
pgfortran -o myprog main.o utils.o
Can you confirm this? Or is there something wrong with our PGI 19.x installation? I tried on two different machines at CSCS (https://www.cscs.ch/) and got the same result on both.
Thanks
Yannick