I am trying to compile a do concurrent code for offload.
I have several loops that call functions in them, and even though the functions are “pure”, NV does not support them yet.
Therefore, I inline the functions with “-Minline”, which inlines most of them, but for some I have to name the functions explicitly with this flag:
-Minline=reshape,name:boost,interp,s2c,c2s,sv2cv
With NV 23.5 this works fine. However, with NV 23.11 I get:
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet
mpif90 -Minline -O3 -march=native -stdpar=gpu -gpu=cc86 -I/opt/psi/nv/ext_deps/deps/hdf4/include -I/opt/psi/nv/ext_deps/deps/hdf5/include -c mas_sed_expmac.f -o mas.o
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 27954)
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 27991)
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 28035)
0 inform, 0 warnings, 3 severes, 0 fatal for load_matrix_t_solve
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 33890)
0 inform, 0 warnings, 1 severes, 0 fatal for advparticles
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 38050)
0 inform, 0 warnings, 1 severes, 0 fatal for advte
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 53248)
0 inform, 0 warnings, 1 severes, 0 fatal for initialize_heating
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (mas_sed_expmac.f: 54284)
0 inform, 0 warnings, 1 severes, 0 fatal for heating
make: *** [Makefile:58: mas.o] Error 2
When I turn on -Minfo=inline it shows a LOT of output of what it was able and not able to inline.
However, for the specific lines mentioned above (the ones in DC that matter), there is no report as to why it inlined or did not inline.
It seems it is hitting the error about a procedure in DC before it has a chance to try to inline?
Maybe the order of processing/error-checking has changed?
A couple of questions that might help us identify what’s going on or ask the appropriate person what behavior may have changed:
Can you provide a pared-down, small reproducing example that behaves differently w/ 23.5 than it does w/ 23.11? You might be able to get this quickly by cutting down one of the files that fails to compile with 23.11 but works with 23.5. This will be vital for understanding the behavior you’re seeing. I played w/ a couple of examples I could think up - and I couldn’t reproduce the behavior or see any differences in 23.11 vs 23.5. However, that just means we need more guidance to identify the edge case you may be encountering.
When you had -Minfo=inline on, did you also have the “-Minline=reshape,name:boost,interp,s2c,c2s,sv2cv” turned on? I don’t see that in your latest comment above but adding that may get you more information on what’s going on with the inlining of those particular functions. As you described w/ 23.5 - it appears the compiler isn’t interested in inlining those unless you explicitly guide it to. I imagine this will be the same w/ 23.11. If the output does then include information about your functions - run it w/ 23.5 and 23.11 and compare the output messages for the pertinent functions. That might guide us on what has changed - and what the compiler sees differently between the two situations.
module func_interface
  interface
    pure function func (a)
      implicit none
      real*8, intent(in) :: a
      real*8 :: func
    end function func
  end interface
end module

pure function func (a)
  implicit none
  real*8, intent(IN) :: a
  real*8 :: func
  func = SIN(a)*2.0
  return
end function func

program nv2311_stdpar_inline
  use func_interface
  implicit none
  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0
  do concurrent (i=1:N)
    x(i) = func(x(i))
  enddo
  print*, x(:)
end program nv2311_stdpar_inline
I played with your code and saw the issue you described. I also see that it starts between 23.7 and 23.9. I’m not sure why we stopped inlining the functions here. However, I was able to develop a workaround for you. You seem to only be interested in inlining those functions because it enables you to use ‘do concurrent’ with the function calls. If, instead of inlining these functions, you just make them acc routines, then you will get the behavior you’re hoping for. For me, it looks like this:
module func_interface
  interface
    pure function func (a)
      !$ACC ROUTINE SEQ
      implicit none
      real*8, intent(in) :: a
      real*8 :: func
    end function func
  end interface
end module

pure function func (a)
  !$ACC ROUTINE SEQ
  implicit none
  real*8, intent(IN) :: a
  real*8 :: func
  func = SIN(a)*2.0
  return
end function func

program nv2311_stdpar_inline
  use func_interface
  implicit none

  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0
  do concurrent (i=1:N)
    x(i) = func(x(i))
  enddo
  print*, x(:)
end program nv2311_stdpar_inline
Naming this file test.f90, I can successfully compile it with NVHPC 23.11:
nvfortran test.f90 -Minfo=all -stdpar=gpu -o test
func:
     12, Generating acc routine seq
         Generating NVIDIA GPU code
nv2311_stdpar_inline:
     30, Generating NVIDIA GPU code
         30, Loop parallelized across CUDA thread blocks, CUDA threads(32) blockidx%x threadidx%x
     30, Generating implicit copy(x(:)) [if not already present]
Note that this approach is compatible with the 23.5 compilers as well, and yields the same result. I made the function “!$ACC ROUTINE SEQ”, but for more interesting functions, you can also make it a vector routine to possibly add more parallelism to your code.
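As a possible variation on the listing above (a sketch only, not tested against 23.5 or 23.11): if the function can live in a module, a module procedure gets an explicit interface automatically, so the routine directive only needs to appear once instead of in both the interface block and the function body. The names func_mod and stdpar_module_routine are made up for this illustration.

```fortran
! Sketch: same reproducer, with func as a module procedure.
! func_mod and stdpar_module_routine are hypothetical names.
module func_mod
  implicit none
contains
  pure function func (a)
    ! Directive appears once; the module carries the interface.
    !$ACC ROUTINE SEQ
    real*8, intent(in) :: a
    real*8 :: func
    func = SIN(a)*2.0
  end function func
end module func_mod

program stdpar_module_routine
  use func_mod
  implicit none
  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0
  do concurrent (i=1:N)
    x(i) = func(x(i))
  enddo
  print*, x(:)
end program stdpar_module_routine
```

This still relies on a directive, so it does not help a code that must stay directive-free.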
I don’t know anything about changes in inlining behavior - but it’s possible that the gpu compiler team realized that people were having to try to inline their procedures inside ‘do concurrent’ constructs to get them to work, and realized that they should instead just force the procedures to be acc routines instead to solve the problem. I’m not sure - but hopefully this helps resolve your issue! Let me know if there are any other issues.
Thanks, but I already have a version of the code that uses “acc routine”.
The purpose of this version is to demonstrate a large code that can offload to GPUs using only the Fortran standard, with ZERO directives (see https://ieeexplore.ieee.org/document/10196584 for details - this is “Code 5”).
I don’t know why they would stop in-lining as it allows the use of DC (especially since they are “pure” functions).
The compiler has a flag for inlining functions which it should do regardless of where they are.
Any chance this could get fixed for the next release?
Let me check internally if this was an unexpected change in behavior and I’ll get back to you on it. If it is, I’ll open an internal bug report on it and push to get it resolved for you.
I just heard back from the GPU team about this. We believe this is a regression in behavior that was not expected. I’m going to open an internal bug report on this and try to get it resolved for you. Thanks for bringing this to our attention! If I get any updates, I’ll update this post with the information for you.
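For anyone who needs to stay directive-free while the regression is open, one stopgap is to substitute the function body into the loop by hand, which is what -Minline was doing automatically. Below is a minimal sketch using the small reproducer from earlier in the thread. It assumes hand substitution is acceptable, which is realistic only for small functions. The program name stdpar_manual_inline is made up for this illustration.

```fortran
! Sketch: reproducer with func inlined by hand, no directives.
! stdpar_manual_inline is a hypothetical name.
program stdpar_manual_inline
  implicit none
  integer :: i
  integer, parameter :: N = 10
  real*8, dimension(:), allocatable :: x
  allocate (x(N))
  x(:) = 1.0
  do concurrent (i=1:N)
    ! Body of func(a) = SIN(a)*2.0 written in place of the call
    x(i) = SIN(x(i))*2.0
  enddo
  print*, x(:)
end program stdpar_manual_inline
```

Because the loop body no longer contains a procedure call, the check behind NVFORTRAN-S-1074 should have nothing to flag. The cost is duplicated code wherever the function was called.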