[OpenACC] nvfortran minloc/maxloc no longer compile after update to SDK 22.7 (CUDA 11.7), plus libcudaforwraprand.so error at run time

Hello.
Before I updated to NVIDIA HPC SDK 22.7 (CUDA 11.7), the code below worked without any bugs or errors.

!$acc parallel loop independent private(tmp_real1) present(grpLIKEorCDForCVres, maxloc_like_grp, grpESS)
do k=1,grpamp
     maxloc_like_grp(k)=minloc(grpLIKEorCDForCVres(:,k),dim=1)

     tmp_real1 = grpLIKEorCDForCVres(maxloc_like_grp(k),k)

     grpESS(k) = tmp_real1  ! minimum cost func -> largest likelihood member's value

enddo
!$acc end parallel loop

However, after updating to SDK 22.7, every line containing minloc or maxloc causes a compile error like the following:

0 inform, 0 warnings, 3 severes, 0 fatal for hmcpf_das_daid1
NVFORTRAN-S-1061-Procedures called in a compute region must have acc routine information - __nvf_minlocdimr_32f1d (driver/mod_letkf.f90: 6127)
NVFORTRAN-S-1061-Procedures called in a compute region must have acc routine information - __nvf_minlocdimr_32f1d (driver/mod_letkf.f90: 6298)
NVFORTRAN-S-1061-Procedures called in a compute region must have acc routine information - __nvf_minlocdimr_32f1d (driver/mod_letkf.f90: 6481)

Is there any other way to substitute for these calls?

I normally use C++, but I wrote this code in Fortran to take advantage of the minloc/maxloc intrinsics, which C/C++ do not have as built-ins. Is there a good way to overcome this compile error with the current compiler version?

Many thanks in advance.
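Another way to sidestep the error, not raised in the replies, is to keep the intrinsic call out of the compute region and write the search by hand. The sketch below is a minimal illustration under stated assumptions: `cost` plays the role of `grpLIKEorCDForCVres` (one column per group), and the module and routine names (`manual_loc`, `column_minloc`) are hypothetical. It only covers the `minloc(..., dim=1)` form used above (no `mask`, `back`, or `kind` arguments), and NaN handling may differ from the intrinsic.

```fortran
module manual_loc
  implicit none
contains
  ! Hypothetical helper: for each column k of cost, store the row index of the
  ! first minimum in idx(k) (0 for an empty column, as minloc returns) and the
  ! minimum value in best(k). No intrinsic is called inside the region.
  subroutine column_minloc(cost, idx, best)
    real,    intent(in)  :: cost(:, :)
    integer, intent(out) :: idx(:)
    real,    intent(out) :: best(:)
    integer :: i, k, imin
    real    :: vmin

    ! Since OpenACC 2.5, copyin/copyout reuse data that is already present on
    ! the device; present(...) could be used instead, as in the original loop.
    !$acc parallel loop gang vector private(imin, vmin) copyin(cost) copyout(idx, best)
    do k = 1, size(cost, 2)
      imin = 0
      vmin = huge(vmin)
      ! Sequential scan of one column. imin == 0 accepts the first element;
      ! strict "<" then keeps the first minimum, matching minloc's tie rule.
      !$acc loop seq
      do i = 1, size(cost, 1)
        if (imin == 0 .or. cost(i, k) < vmin) then
          imin = i
          vmin = cost(i, k)
        end if
      end do
      idx(k)  = imin
      best(k) = vmin
    end do
  end subroutine column_minloc
end module manual_loc
```

Compiled with `nvfortran -acc`, each column is handled in parallel and scanned sequentially; a compiler without OpenACC enabled treats the `!$acc` lines as comments and runs the loop serially. A maxloc variant only needs the comparison flipped to `>`.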

Hi halo1,

__nvf_minlocdimr_32f1d is the CUDA Fortran device routine for minloc. I don’t think the change in behavior from 22.5 to 22.7 is intentional, since OpenACC should recognize CUDA Fortran device routines, so I’ve filed an issue report, TPR #32298, and sent it to engineering for investigation.

I presume you have a “use cudafor” statement in this routine or module? If you don’t need CUDA Fortran support for other reasons, you can comment it out. The minloc/maxloc calls should then be inlined, and the CUDA Fortran device routines will not be used.

For example:

% cat test.F90
module bar
contains
subroutine foo (arr,ml,sze)
#ifndef NO_CUDA
   use cudafor
#endif
   integer, dimension(:) :: ml
   real, dimension(:,:) :: arr
   integer :: sze, i
!$acc parallel loop present(arr,ml)
   do i=1,32
     ml(i) = minloc(arr(:,i),dim=1)
   enddo
end subroutine foo
end module bar

% nvfortran -c test.F90 -acc -V22.7
NVFORTRAN-S-1061-Procedures called in a compute region must have acc routine information - __nvf_minlocdimr_32f1d (test.F90: 12)
  0 inform,   0 warnings,   1 severes, 0 fatal for foo
% nvfortran -c test.F90 -acc -V22.7 -DNO_CUDA
%

Hi halo1,

I talked with engineering, and this was an intentional change in CUDA Fortran. We’ve been adding support for calling Fortran intrinsics on CUDA Fortran device arrays from the host, and “minloc” and “maxloc” were added in 22.7. It’s a good change, but I’ve asked them to look into ways to avoid using these host routines within OpenACC compute regions, since it hurts interoperability between the two models.

Again, if you don’t need CUDA Fortran, the workaround is to remove “use cudafor”. If you do need CUDA Fortran, the workaround is to rename the CUDA Fortran routines on the use statement. For example:

use cudafor, cuf_minloc => minloc, cuf_maxloc => maxloc

-Mat
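The rename works because a Fortran use statement of the form `local => remote` makes the module entity visible only under the local name, so a bare `minloc` refers to the intrinsic again. Because `cudafor` requires nvfortran, the sketch below assumes a hypothetical stand-in module, `shadow_mod`, that simply exports its own procedure named `minloc`. It reproduces only the name clash, not what cudafor's minloc actually does; the routine name `first_min_index` is also hypothetical.

```fortran
! Hypothetical stand-in for cudafor: it exports its own procedure named
! "minloc", which hides the intrinsic in any scope that uses the module.
module shadow_mod
  implicit none
contains
  integer function minloc(x)
    real, intent(in) :: x(:)
    minloc = -1          ! marker value so callers can tell which one ran
  end function minloc
end module shadow_mod

module uses_intrinsic
  ! Same pattern as: use cudafor, cuf_minloc => minloc, cuf_maxloc => maxloc
  use shadow_mod, cuf_minloc => minloc
  implicit none
contains
  subroutine first_min_index(arr, ml)
    real,    intent(in)  :: arr(:, :)
    integer, intent(out) :: ml(:)
    integer :: i
    !$acc parallel loop copyin(arr) copyout(ml)
    do i = 1, size(arr, 2)
      ! The bare name now resolves to the Fortran intrinsic minloc.
      ml(i) = minloc(arr(:, i), dim=1)
    end do
  end subroutine first_min_index
end module uses_intrinsic
```

With nvfortran, the stand-in `use` line would be replaced by the real one quoted above; the module's own procedure stays reachable as `cuf_minloc`.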

Dear Mat,

Thank you so much for your very prompt support!

Yes, commenting out “use cudafor” made the code compile successfully! I sincerely appreciate your support.

However, I now get the following error at run time. Could this be caused by another SDK update issue?

./letkf_driver: error while loading shared libraries: libcudaforwraprand.so: cannot open shared object file: No such file or directory

Again, this code ran without errors before the SDK update. I cannot find libcudaforwraprand… anywhere.

P.S.
I still need cuBLAS, cuSOLVER, and cuRAND for this code, so I am still passing -Mcudalib=curand,cusolver,cublas to the compiler. I cannot drop the -Mcudalib option. Do I need to?

I still need cuBLAS, cuSOLVER, and cuRAND for this code, so I am still passing -Mcudalib=curand,cusolver,cublas to the compiler. I cannot drop the -Mcudalib option. Do I need to?

No. Removing the use of the CUDA Fortran interface module shouldn’t affect your ability to use the CUDA math libraries. The only issue would be if you’re using CUDA Fortran in the code itself, and if you were, I’d expect you to get compilation errors. Note that you can still use the module; you’d just need the renaming trick I showed above. The only problem there is if you want to use both minloc in the OpenACC region and the CUDA Fortran minloc in the same compute unit.

./letkf_driver: error while loading shared libraries: libcudaforwraprand.so: cannot open shared object file: No such file or directory
Again, this code ran without errors before the SDK update. I cannot find libcudaforwraprand… anywhere.

The libcudaforwraprand.so library is in the “<base_dir>/Linux_x86_64/22.7/compilers/lib” directory.

I’m not sure what “letkf_driver” is, but if it’s your program, try setting the environment variable LD_LIBRARY_PATH to include this directory. If it’s a wrapper script, check its environment variables and update LD_LIBRARY_PATH there.

-Mat

Thank you very much! I found a typo in that path in my bashrc. Now all the problems seem to be solved.