Hi,
I have run into a strange problem with nvfortran (tested with nvhpc/23.7 and nvhpc/24.1). Using a present
clause on a kernel breaks the resolution of the ieee_is_nan
call at link time.
This code compiles but fails to link:
program bide
  use, intrinsic :: ieee_arithmetic
  implicit none
  double precision, allocatable, dimension(:) :: A
  integer :: i
  logical :: ok = .true.
  allocate(A(1024))
  A = -99.0d0
  print*,ieee_is_nan(A(1))
  !$acc enter data copyin(A)
  !$ACC parallel loop present(A) reduction(.and.:ok)
  do i=1,1024
    ok=ok .and. ieee_is_nan(A(i))
  enddo
end program bide
nvfortran -acc bide.f90
bide.f90:19: undefined reference to `__pgi_ieee_is_nan_dev_r8'
whereas this code (with implicit data movement) works:
program bide
  use, intrinsic :: ieee_arithmetic
  implicit none
  double precision, allocatable, dimension(:) :: A
  integer :: i
  logical :: ok = .true.
  allocate(A(1024))
  A = -99.0d0
  print*,ieee_is_nan(A(1))
  !$ACC parallel loop reduction(.and.:ok)
  do i=1,1024
    ok=ok .and. ieee_is_nan(A(i))
  enddo
end program bide
(This small piece of code is inspired by a previous thread, which does not show the same problem.)
Patrick
In order to use the intrinsics on the device with OpenACC, the compiler needs to inline the intrinsic. However, different presentations of the array can affect whether the inlining succeeds or not, which is likely the case here. I have an open issue report for MINLOC, which shows similar behavior.
The workaround is to add the flag “-cuda” to enable CUDA Fortran, which has interfaces for these intrinsics, thus allowing the intrinsic to be inlined correctly.
% nvfortran -acc test.F90
/usr/bin/ld: /tmp/nvfortranR-Kkeb2cW5v50.o: in function `MAIN_':
/local/home/mcolgrove/test.F90:19: undefined reference to `__pgi_ieee_is_nan_dev_r8'
pgacclnk: child process exit status 1: /usr/bin/ld
% nvfortran -acc test.F90 -cuda
%
Hi Mat,
OK, I understand the problem better now. Yes, adding the -cuda option works, but it does not seem to work in combination with additional options. On my laptop:
(base) bash-4.4$ mpifort -acc -cuda bide.f90
(base) bash-4.4$ mpifort -O2 -g -acc=noautopar,gpu,host -gpu=cc75,lineinfo \
-Minfo=accel -cuda bide.f90
bide:
14, Generating enter data copyin(a(:))
17, Generating present(a(:))
Generating NVIDIA GPU code
18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
Generating reduction(.and.:ok)
17, Generating implicit copy(ok) [if not already present]
/tmp/nvfortranGSmcGJC5Z9gG.o: in function `MAIN_':
/home/begou/BOULOT/YALES2/aqat-gpu/R1_ARRAYS/ENLARGE-NOKEEP/bide.f90:19: undefined reference to `__pgi_ieee_is_nan_dev_r8'
pgacclnk: child process exit status 1: /usr/bin/ld
Any idea?
For people who run into the same problem with the ieee_is_nan
intrinsic in a kernel: the workaround relies on the fact that, by definition, a NaN is not equal to itself. So it is possible to replace:
ok=ok .and. ieee_is_nan(A(i))
by
ok=ok .and. (A(i) /= A(i))
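For reference, here is a minimal sketch of the first program with this substitution applied. The program name bide_workaround, the initialization of ok by assignment, the final print, and the exit data directive are additions for illustration and were not in the original test case. Since the kernel no longer calls ieee_is_nan, it should no longer need the device routine that failed to link. One caveat: this assumes the compiler keeps IEEE comparison semantics; aggressive or relaxed floating-point optimization options may fold A(i) /= A(i) to .false., so it is worth checking with the flags you actually use.

```fortran
program bide_workaround
  implicit none
  double precision, allocatable, dimension(:) :: A
  integer :: i
  logical :: ok

  allocate(A(1024))
  A = -99.0d0
  ok = .true.

  !$acc enter data copyin(A)
  !$acc parallel loop present(A) reduction(.and.:ok)
  do i = 1, 1024
    ! Self-comparison replaces ieee_is_nan(A(i)):
    ! under IEEE semantics only a NaN compares unequal to itself.
    ok = ok .and. (A(i) /= A(i))
  enddo
  !$acc exit data delete(A)

  ! ok is .true. only if every element of A is a NaN
  print *, ok
  deallocate(A)
end program bide_workaround
```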
Patrick
It’s the “host” sub-option that’s interfering, given that CUDA Fortran can’t be applied to host code.
You’ll need to compile for the GPU only (that is, drop “host” from the -acc sub-options, for example -acc=noautopar,gpu) or use your workaround.