NV 23.11 not in-lining with -Minline (works with 23.5)

caplanr · January 26, 2024, 9:58pm

Hi,

I am trying to compile a do concurrent code for offload.

I have several loops that call functions in them, and even though the functions are “pure”, NV does not support them yet.

Therefore, I inline the functions with “-Minline”, which inlines most of them, but for some I have to manually specify the functions with this flag:

-Minline=reshape,name:boost,interp,s2c,c2s,sv2cv

With NV 23.5 this works fine. However, for NV23.11 I get:

NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet

and cannot compile the code.

Have the in-lining flags been changed?

Thanks!

– Ron

MatColgrove · January 27, 2024, 12:04am

No, the inline flags haven’t changed. Though I’m not sure what did change to cause the routines not to be inlined.

If you add the flag “-Minfo=inline”, do the compiler feedback messages tell you anything, such as why it can’t inline a particular routine?

caplanr · January 29, 2024, 10:48pm

Hi,

I get:

mpif90 -Minline -O3 -march=native -stdpar=gpu -gpu=cc86 -I/opt/psi/nv/ext_deps/deps/hdf4/include -I/opt/psi/nv/ext_deps/deps/hdf5/include -c mas_sed_expmac.f -o mas.o
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 27954)
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 27991)
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 28035)
0 inform, 0 warnings, 3 severes, 0 fatal for load_matrix_t_solve
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 33890)
0 inform, 0 warnings, 1 severes, 0 fatal for advparticles
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 38050)
0 inform, 0 warnings, 1 severes, 0 fatal for advte
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 53248)
0 inform, 0 warnings, 1 severes, 0 fatal for initialize_heating
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 54284)
0 inform, 0 warnings, 1 severes, 0 fatal for heating
make: *** [Makefile:58: mas.o] Error 2

When I turn on -Minfo=inline, it shows a LOT of output about what it was and was not able to inline.
However, for the specific lines mentioned above (the ones in DC that matter), there is no report as to why it inlined or did not inline.
It seems it is hitting the error about a procedure in DC before it has a chance to try to inline?
Maybe the order of processing/error-checking has changed?

– Ron

scamp1 · January 30, 2024, 1:46am

A couple of questions that might help us identify what’s going on or ask the appropriate person what behavior may have changed:

Can you provide a pared-down, small reproducing example that behaves differently w/ 23.5 than it does w/ 23.11? You might be able to get this quickly by cutting down one of the files that fails to compile with 23.11 but works with 23.5. This will be vital for understanding the behavior you’re seeing. I played w/ a couple of examples I could think up, and I couldn’t reproduce the behavior or see any differences in 23.11 vs 23.5. However, that just means we need more guidance to identify the edge case you may be encountering.

When you had -Minfo=inline on, did you also have the “-Minline=reshape,name:boost,interp,s2c,c2s,sv2cv” turned on? I don’t see that in your latest comment above but adding that may get you more information on what’s going on with the inlining of those particular functions. As you described w/ 23.5 - it appears the compiler isn’t interested in inlining those unless you explicitly guide it to. I imagine this will be the same w/ 23.11. If the output does then include information about your functions - run it w/ 23.5 and 23.11 and compare the output messages for the pertinent functions. That might guide us on what has changed - and what the compiler sees differently between the two situations.

caplanr · January 30, 2024, 8:24pm

Hi,

Here is a reproducer:

module func_interface
  interface
    pure function func (a)
      implicit none
      real*8, intent(in) :: a
      real*8 :: func
    end function func
  end interface
end module

pure function func (a)
  implicit none
  real*8, intent(IN) :: a
  real*8 :: func
  func = SIN(a)*2.0
  return
end function func

program nv2311_stdpar_inline

  use func_interface
  implicit none
  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0  
  do concurrent (i=1:N)
    x(i) = func(x(i))
  enddo
  print*, x(:)

end program nv2311_stdpar_inline

With nvhpc 23.5:

$ nvfortran nv2311_stdpar_inline.f90 -o nv2311_stdpar_inline_cpu
$ ./nv2311_stdpar_inline_cpu
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793     

$ nvfortran nv2311_stdpar_inline.f90 -stdpar=gpu -o nv2311_stdpar_inline_gpu
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (nv2311_stdpar_inline.f90: 29)
  0 inform,   0 warnings,   1 severes, 0 fatal for nv2311_stdpar_inline  

$ nvfortran nv2311_stdpar_inline.f90 -Minline -Minfo=inline -stdpar=gpu -o nv2311_stdpar_inline_gpu
nv2311_stdpar_inline:
     29, func inlined, size=2, file nv2311_stdpar_inline.f90 (11)  

$ ./nv2311_stdpar_inline_gpu
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793

With nvhpc 23.11:

$ nvfortran nv2311_stdpar_inline.f90 -o nv2311_stdpar_inline_cpu
  
$ ./nv2311_stdpar_inline_cpu
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793      

$ nvfortran nv2311_stdpar_inline.f90 -stdpar=gpu -o nv2311_stdpar_inline_gpu
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (nv2311_stdpar_inline.f90: 29)
  0 inform,   0 warnings,   1 severes, 0 fatal for nv2311_stdpar_inline  

$ nvfortran nv2311_stdpar_inline.f90 -Minline -Minfo=inline -stdpar=gpu -o nv2311_stdpar_inline_gpu
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (nv2311_stdpar_inline.f90: 29)
  0 inform,   0 warnings,   1 severes, 0 fatal for nv2311_stdpar_inline  
  
$ nvfortran nv2311_stdpar_inline.f90 -Minline=func -Minfo=inline -stdpar=gpu -o nv2311_stdpar_inline_gpu
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (nv2311_stdpar_inline.f90: 29)
  0 inform,   0 warnings,   1 severes, 0 fatal for nv2311_stdpar_inline

The NV 23.11 is not in-lining.
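
For illustration, the loop that the 23.5 inliner effectively produces can also be written by hand. This is a minimal sketch, assuming the body of func is substituted directly into the loop; the program name manual_inline_sketch is hypothetical, and this variant was not compiled with -stdpar=gpu in this thread. Since no user procedure call remains inside the do concurrent, error 1074 should not apply, and no directives are needed:

```fortran
program manual_inline_sketch

  implicit none
  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0
  ! Body of func (SIN(a)*2.0) substituted by hand for the call func(x(i)),
  ! so the loop contains only an intrinsic and no user procedure call.
  do concurrent (i=1:N)
    x(i) = SIN(x(i))*2.0
  enddo
  print*, x(:)

end program manual_inline_sketch
```

Doing this by hand across a large code is impractical, which is why the -Minline behavior matters here.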

– Ron

scamp1 · January 30, 2024, 10:29pm

Hi Ron,

I played with your code and saw the issue you described. I also see that it starts between 23.7 and 23.9. I’m not sure why we stopped inlining the functions here. However, I was able to develop a workaround for you. You seem to only be interested in inlining those functions because it enables you to use ‘do concurrent’ with the function calls. If, instead of inlining these functions, you just make them acc routines, then you will get the behavior you’re hoping for. For me, it looks like this:

module func_interface
  interface
    pure function func (a)
      !$ACC ROUTINE SEQ
      implicit none
      real*8, intent(in) :: a
      real*8 :: func
    end function func
  end interface
end module

pure function func (a)
  !$ACC ROUTINE SEQ
  implicit none
  real*8, intent(IN) :: a
  real*8 :: func
  func = SIN(a)*2.0
  return
end function func

program nv2311_stdpar_inline

  use func_interface
  implicit none
  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0
  do concurrent (i=1:N)
    x(i) = func(x(i))
  enddo
  print*, x(:)
end program nv2311_stdpar_inline

Naming this file test.f90, I can successfully compile it with NVHPC 23.11:

nvfortran test.f90 -Minfo=all -stdpar=gpu -o test
func:
     12, Generating acc routine seq
         Generating NVIDIA GPU code
nv2311_stdpar_inline:
     30, Generating NVIDIA GPU code
         30, Loop parallelized across CUDA thread blocks, CUDA threads(32) blockidx%x threadidx%x
     30, Generating implicit copy(x(:)) [if not already present]

And I can successfully run it as well:

./test
1.682941969615793 1.682941969615793 1.682941969615793
1.682941969615793 1.682941969615793 1.682941969615793
1.682941969615793 1.682941969615793 1.682941969615793
1.682941969615793

Note that this approach is compatible with 23.5 compilers as well, and yields the same result. I made the function “!$ACC ROUTINE SEQ”, but for more interesting functions, you can also make it vector to possibly add more parallelism to your code.

I don’t know anything about changes in inlining behavior, but it’s possible that the GPU compiler team noticed that people were having to inline their procedures inside ‘do concurrent’ constructs to get them to work, and decided that the procedures should be made acc routines instead. I’m not sure, but hopefully this helps resolve your issue! Let me know if there are any other issues.

caplanr · January 30, 2024, 10:36pm

Hi,

Thanks, but I already have a version of the code that uses “acc routine”.

The purpose of this version is to demonstrate a large code that can offload to GPUs using only the Fortran standard, with ZERO directives (see https://ieeexplore.ieee.org/document/10196584 for details; this is “Code 5”).

I don’t know why they would stop in-lining as it allows the use of DC (especially since they are “pure” functions).
The compiler has a flag for inlining functions which it should do regardless of where they are.

Any chance this could get fixed for the next release?

– Ron

scamp1 · January 30, 2024, 10:57pm

Let me check internally if this was an unexpected change in behavior and I’ll get back to you on it. If it is, I’ll open an internal bug report on it and push to get it resolved for you.

scamp1 · January 30, 2024, 11:52pm

I just heard back from the GPU team about this. We believe this is a regression in behavior that was not expected. I’m going to open an internal bug report on this and try to get it resolved for you. Thanks for bringing this to our attention! If I get any updates, I’ll update this post with the information for you.

caplanr · February 1, 2024, 6:53pm

Note: This issue is in the new 24.1 release as well.
– Ron

scamp1 · February 1, 2024, 7:07pm

Thanks! That’s to be expected, since we just became aware of the issue. We hope to have a fix in 24.3.
