NV 23.11 not in-lining with -Minline (works with 23.5)

Hi,

I am trying to compile a do concurrent code for offload.

I have several loops that call functions in them, and even though the functions are “pure”, NV does not support them yet.

Therefore, I inline the functions with “-Minline”, which inlines most of them, but for some I have to name the functions explicitly with this flag:

-Minline=reshape,name:boost,interp,s2c,c2s,sv2cv

With NV 23.5 this works fine. However, with NV 23.11 I get:

NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet

and cannot compile the code.

Have the in-lining flags been changed?

Thanks!

– Ron

No, the inline flags haven’t changed. Though I’m not sure what did change to cause the routines not to be inlined.

If you add the flag “-Minfo=inline”, do the compiler feedback messages tell you anything, such as why it can’t inline a particular routine?

Hi,

I get:

mpif90 -Minline -O3 -march=native -stdpar=gpu -gpu=cc86 -I/opt/psi/nv/ext_deps/deps/hdf4/include -I/opt/psi/nv/ext_deps/deps/hdf5/include -c mas_sed_expmac.f -o mas.o
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 27954)
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 27991)
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 28035)
0 inform, 0 warnings, 3 severes, 0 fatal for load_matrix_t_solve
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 33890)
0 inform, 0 warnings, 1 severes, 0 fatal for advparticles
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 38050)
0 inform, 0 warnings, 1 severes, 0 fatal for advte
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 53248)
0 inform, 0 warnings, 1 severes, 0 fatal for initialize_heating
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 54284)
0 inform, 0 warnings, 1 severes, 0 fatal for heating
make: *** [Makefile:58: mas.o] Error 2

When I turn on -Minfo=inline it shows a LOT of output of what it was able and not able to inline.
However, for the specific lines mentioned above (the ones in DC that matter), there is no report as to why it inlined or did not inline.
Is it hitting the error about a procedure call in DC before it has a chance to try to inline?
Maybe the order of processing/error-checking has changed?

– Ron

A couple of questions that might help us identify what’s going on or ask the appropriate person what behavior may have changed:

Can you provide a pared-down, small reproducing example that behaves differently w/ 23.5 than it does w/ 23.11? You might be able to get this quickly by cutting down one of the files that fails to compile with 23.11 but works with 23.5. This will be vital for understanding the behavior you’re seeing. I played w/ a couple of examples I could think up, and I couldn’t reproduce the behavior or see any differences in 23.11 vs 23.5. However, that just means we need more guidance to identify the edge case you may be encountering.

When you had -Minfo=inline on, did you also have the “-Minline=reshape,name:boost,interp,s2c,c2s,sv2cv” turned on? I don’t see that in your latest comment above but adding that may get you more information on what’s going on with the inlining of those particular functions. As you described w/ 23.5 - it appears the compiler isn’t interested in inlining those unless you explicitly guide it to. I imagine this will be the same w/ 23.11. If the output does then include information about your functions - run it w/ 23.5 and 23.11 and compare the output messages for the pertinent functions. That might guide us on what has changed - and what the compiler sees differently between the two situations.

Hi,

Here is a reproducer:

module func_interface
  interface
    pure function func (a)
      implicit none
      real*8, intent(in) :: a
      real*8 :: func
    end function func
  end interface
end module

pure function func (a)
  implicit none
  real*8, intent(IN) :: a
  real*8 :: func
  func = SIN(a)*2.0
  return
end function func

program nv2311_stdpar_inline

  use func_interface
  implicit none
  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0  
  do concurrent (i=1:N)
    x(i) = func(x(i))
  enddo
  print*, x(:)

end program nv2311_stdpar_inline

With nvhpc 23.5:

$ nvfortran nv2311_stdpar_inline.f90 -o nv2311_stdpar_inline_cpu
$ ./nv2311_stdpar_inline_cpu
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793     

$ nvfortran nv2311_stdpar_inline.f90 -stdpar=gpu -o nv2311_stdpar_inline_gpu
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (nv2311_stdpar_inline.f90: 29)
  0 inform,   0 warnings,   1 severes, 0 fatal for nv2311_stdpar_inline  

$ nvfortran nv2311_stdpar_inline.f90 -Minline -Minfo=inline -stdpar=gpu -o nv2311_stdpar_inline_gpu
nv2311_stdpar_inline:
     29, func inlined, size=2, file nv2311_stdpar_inline.f90 (11)  

$ ./nv2311_stdpar_inline_gpu
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793     

With nvhpc 23.11:

$ nvfortran nv2311_stdpar_inline.f90 -o nv2311_stdpar_inline_cpu
  
$ ./nv2311_stdpar_inline_cpu
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793         1.682941969615793         1.682941969615793      
    1.682941969615793      

$ nvfortran nv2311_stdpar_inline.f90 -stdpar=gpu -o nv2311_stdpar_inline_gpu
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (nv2311_stdpar_inline.f90: 29)
  0 inform,   0 warnings,   1 severes, 0 fatal for nv2311_stdpar_inline  

$ nvfortran nv2311_stdpar_inline.f90 -Minline -Minfo=inline -stdpar=gpu -o nv2311_stdpar_inline_gpu
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (nv2311_stdpar_inline.f90: 29)
  0 inform,   0 warnings,   1 severes, 0 fatal for nv2311_stdpar_inline  
  
$ nvfortran nv2311_stdpar_inline.f90 -Minline=func -Minfo=inline -stdpar=gpu -o nv2311_stdpar_inline_gpu
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (nv2311_stdpar_inline.f90: 29)
  0 inform,   0 warnings,   1 severes, 0 fatal for nv2311_stdpar_inline

So NV 23.11 is not inlining, even when the function is named explicitly with -Minline=func.

– Ron

Hi Ron,

I played with your code and saw the issue you described. I also see that it starts between 23.7 and 23.9. I’m not sure why we stopped inlining the functions here. However, I was able to develop a workaround for you. You seem to only be interested in inlining those functions because it enables you to use ‘do concurrent’ with the function calls. If, instead of inlining these functions, you just make them acc routines, then you will get the behavior you’re hoping for. For me, it looks like this:

module func_interface
  interface
    pure function func (a)
      !$ACC ROUTINE SEQ
      implicit none
      real*8, intent(in) :: a
      real*8 :: func
    end function func
  end interface
end module

pure function func (a)
  !$ACC ROUTINE SEQ
  implicit none
  real*8, intent(IN) :: a
  real*8 :: func
  func = SIN(a)*2.0
  return
end function func

program nv2311_stdpar_inline

  use func_interface
  implicit none
  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0
  do concurrent (i=1:N)
    x(i) = func(x(i))
  enddo
  print*, x(:)
end program nv2311_stdpar_inline

Naming this file test.f90, I can successfully compile it with NVHPC 23.11:

nvfortran test.f90 -Minfo=all -stdpar=gpu -o test
func:
12, Generating acc routine seq
Generating NVIDIA GPU code
nv2311_stdpar_inline:
30, Generating NVIDIA GPU code
30, Loop parallelized across CUDA thread blocks, CUDA threads(32) blockidx%x threadidx%x
30, Generating implicit copy(x(:)) [if not already present]

And I can successfully run it as well:

./test
1.682941969615793 1.682941969615793 1.682941969615793
1.682941969615793 1.682941969615793 1.682941969615793
1.682941969615793 1.682941969615793 1.682941969615793
1.682941969615793

Note that this approach is compatible with 23.5 compilers as well, and yields the same result. I made the function “!$ACC ROUTINE SEQ”, but for more interesting functions, you can also make it vector to possibly add more parallelism to your code.

I don’t know anything about changes in inlining behavior - but it’s possible that the gpu compiler team realized that people were having to try to inline their procedures inside ‘do concurrent’ constructs to get them to work, and realized that they should instead just force the procedures to be acc routines instead to solve the problem. I’m not sure - but hopefully this helps resolve your issue! Let me know if there are any other issues.

Hi,

Thanks, but I already have a version of the code that uses “acc routine”.

The purpose of this version is to demonstrate a large code that can offload to the GPUs with only using the Fortran standard with ZERO directives (see https://ieeexplore.ieee.org/document/10196584 for details - this is “Code 5”).

I don’t know why they would stop in-lining as it allows the use of DC (especially since they are “pure” functions).
The compiler has a flag for inlining functions which it should do regardless of where they are.

Any chance this could get fixed for the next release?

– Ron
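
For readers who need a directive-free stopgap while the compiler does not inline, one option is to inline by hand: write the function body directly in the loop so that the do concurrent construct contains no procedure call at all. The following is only a sketch based on the reproducer above; the program name dc_manual_inline is made up for illustration, it has not been checked against 23.11 here, and it is only practical for small functions like func.

```fortran
program dc_manual_inline
  ! Hypothetical variant of the reproducer: the body of func is written
  ! directly in the loop, so there is no user procedure call inside the
  ! do concurrent construct (SIN is an intrinsic, not a user procedure).
  implicit none
  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0
  do concurrent (i=1:N)
    x(i) = SIN(x(i))*2.0
  enddo
  print*, x(:)
end program dc_manual_inline
```

This keeps the source free of directives but gives up the modularity that -Minline was preserving, so it does not replace a compiler fix.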

Let me check internally if this was an unexpected change in behavior and I’ll get back to you on it. If it is, I’ll open an internal bug report on it and push to get it resolved for you.

I just heard back from the GPU team about this. We believe this is a regression in behavior that was not expected. I’m going to open an internal bug report on this and try to get it resolved for you. Thanks for bringing this to our attention! If I get any updates, I’ll update this post with the information for you.

Note: This issue is in the new 24.1 release as well.
– Ron

Thanks! That’s to be expected since we just became aware of the issue. We hope to have a fix in 24.3.