I’m using pgf90 15.7 on Linux. Print statements inside a device routine show that a do loop in it is not executed correctly: printing inside the loop happens only once instead of 5 times, and printing the loop iterator after “end do” shows a value that looks uninitialized. I’m afraid my example isn’t minimal yet, but maybe you’ve seen this behavior before. I remember being forced to manually inline such routines in the past to get them to work. Am I doing something wrong, or is this a compiler bug that still needs to be fixed? Let me know if you need a standalone minimal example to reproduce it. Thanks in advance.
…code
subroutine rad_jma1206_zenith_run(nx, ny, ishrt, clat, clon, totalsec, sindel, cosdel, etime, zmean, ztemp)
  use openacc
  use cudafor
  use rad_const, only: timestep
  use rad_parm, only: dtrads
  use pp_vardef
  integer(4), intent(in) :: nx, ny
  integer(4), intent(in) :: ishrt
  real(r_size), intent(in) :: clat(nx,ny)
  real(r_size), intent(in) :: clon(nx,ny)
  real(r_size), intent(in) :: totalsec
  real(r_size), intent(in) :: sindel
  real(r_size), intent(in) :: cosdel
  real(r_size), intent(in) :: etime
  real(r_size), intent(inout) :: zmean(nx,ny)
  real(r_size), intent(inout) :: ztemp(nx,ny)
  real(8) :: hf_output_temp
  integer(4) :: i, j
  !$acc kernels present(zmean) present(clon) present(ztemp) present(clat)
  !$acc loop independent vector(16)
  do j=1,ny
    !$acc loop independent vector(16)
    do i=1,nx
      if (ishrt > 0) then
        if (i == 1 .and. j == 1) then
          print *, "rad_jma1206_zenith_run print", ishrt, totalsec, sindel, cosdel, etime
        end if
        call rad_zenith_update_zmean(i, j, cpie, dtrads, timestep, pai12, pai432, hour_ini, clat(i,j), clon(i,j), totalsec, sindel, cosdel, &
          & etime, zmean(i,j))
      end if
      call rad_zenith_everystep(cpie, pai12, pai432, hour_ini, clat(i,j), clon(i,j), totalsec, sindel, cosdel, etime, ztemp(i,j))
    end do
  end do
  !$acc end kernels
end subroutine rad_jma1206_zenith_run
!$acc routine seq
subroutine rad_zenith_update_zmean(i, j, cpie, dtrads, timestep, pai12, pai432, hour_ini, clat, clon, totalsec, sindel, cosdel, etime, &
  & zmean)
  use openacc
  use cudafor
  use pp_vardef
  implicit none
  integer(4), intent(in) :: i, j
  real(r_size), intent(in) :: cpie, dtrads, timestep, pai12, pai432
  integer(4), intent(in) :: hour_ini
  real(r_size), intent(in) :: clat
  real(r_size), intent(in) :: clon
  real(r_size), intent(in) :: totalsec
  real(r_size), intent(in) :: sindel
  real(r_size), intent(in) :: cosdel
  real(r_size), intent(in) :: etime
  real(r_size), intent(out) :: zmean
  integer(4) :: kt0
  integer(4) :: nrdstp
  integer(4) :: nstp
  real(r_size) :: cosclt
  real(r_size) :: sinclt
  real(r_size) :: sumn
  real(r_size) :: sumcos
  real(r_size) :: sc
  real(r_size) :: cs
  real(r_size) :: ctime
  real(r_size) :: btime
  real(r_size) :: atime
  real(r_size) :: tcosz
  cosclt = sin(clat * cpie)
  sinclt = cos(clat * cpie)
  kt0 = nint(totalsec / dtrads + 0.01)
  nrdstp = int((dtrads * (kt0 + 1) - totalsec) / timestep - 0.001) + 1
  sumn = 0.d0
  sumcos = 0.d0
  ctime = etime + pai12 * (hour_ini - 12) + pai432 * totalsec
  do nstp = 1, nrdstp
    btime = pai432 * timestep * float(nstp - 1)
    sc = sindel * cosclt
    cs = cosdel * sinclt
    atime = ctime + clon * cpie + btime
    tcosz = sc + cs * cos(atime)
    if (tcosz > 0.01) then
      sumcos = sumcos + tcosz
      sumn = sumn + 1.0
    end if
  end do
  if (i == 1 .and. j == 1) then
    print *, "rad_zenith_update_zmean print", sumcos, sumn, ctime, nrdstp, sc, cs, nstp
  end if
  if (sumn >= 1.0) then
    zmean = max(0.01_r_size, sumcos / sumn)
  else
    zmean = 0.0
  end if
  return
end subroutine rad_zenith_update_zmean
…compiling
..........compiling rad_zenith.f90 in /home0/usr4/mueller-m-ab/physlib/hybrid/pp/build/gpu/src
pgf90 -g -O0 -Mchkptr -Mbounds -Kieee -Minfo=accel,inline,ipa -Mneginfo -Minform=inform -Mmpi=mpich -acc -Mcuda=6.5,cc3x -ta=tesla:cc3x,keepgpu,keepbin,time -Minline=levels:5,reshape -DGPU -DGPU -c rad_zenith.f90 -o rad_zenith.o
pgf90-Warning-CUDA Fortran or OpenACC GPU targets disables -Mbounds
rad_zenith_run:
142, rad_jma1206_zenith_run inlined, size=50, file rad_zenith.f90 (161)
142, Loop is parallelizable
Generating present(..inline(:,:))
Accelerator kernel generated
Generating Tesla code
142, !$acc loop gang, vector(16) ! blockidx%y threadidx%y
!$acc loop gang, vector(16) ! blockidx%x threadidx%x
191, rad_zenith_update_zmean inlined, size=38, file rad_zenith.f90 (201)
142, Accelerator restriction: induction variable live-out from loop: ..inline
Scalar last value needed after loop for sindel*e,cosdel*e at line 142
194, rad_zenith_everystep inlined, size=9, file rad_zenith.f90 (262)
146, Generating update host(clon(:1,:1),zmean(:1,:1),clat(:1,:1),ztemp(:1,:1))
rad_jma1206_zenith_run:
182, Generating present(zmean(:,:),clon(:,:),ztemp(:,:),clat(:,:))
184, Loop is parallelizable
186, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
184, !$acc loop gang, vector(16) ! blockidx%y threadidx%y
186, !$acc loop gang, vector(16) ! blockidx%x threadidx%x
191, rad_zenith_update_zmean inlined, size=38, file rad_zenith.f90 (201)
191, Accelerator restriction: induction variable live-out from loop: ..inline
Scalar last value needed after loop for sindel*e,cosdel*e at line 191
194, rad_zenith_everystep inlined, size=9, file rad_zenith.f90 (262)
…running
rad_jma1206_zenith_run print 1 0.000000000000000
-0.3913847319351452 0.9202271413124341 -1.3131663878269148E-002
rad_zenith_update_zmean print 14357494.04123566 42190967.00000000
-3.154724317468062 5 -0.2244890597855700
0.7538059440162953 103660017
Please note the last printed variable, nstp. After the loop it should hold a defined value (nrdstp + 1, i.e. 6 here, since a Fortran do variable is incremented once more on normal loop exit), but it looks uninitialized. On the CPU the same code works fine.
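A standalone reproducer could look roughly like the sketch below. It is only a reduction under assumptions, not the original code: the names repro_mod, count_steps and repro are hypothetical, the trip count 5 is taken from the nrdstp value printed above, and the routine directive is placed inside the subroutine's specification part rather than in front of it. It has not been confirmed to trigger the problem. On the CPU it should report a sum of 5 and a final iterator value of 6.

```fortran
! Hypothetical standalone reduction; none of these names come from rad_zenith.f90.
module repro_mod
  implicit none
contains
  subroutine count_steps(nrdstp, sumn, last_nstp)
    !$acc routine seq
    integer(4), intent(in)  :: nrdstp
    real(8),    intent(out) :: sumn
    integer(4), intent(out) :: last_nstp
    integer(4) :: nstp

    sumn = 0.0d0
    ! Same loop shape as in rad_zenith_update_zmean: a sequential do loop
    ! whose trip count is only known at run time.
    do nstp = 1, nrdstp
      sumn = sumn + 1.0d0
    end do
    ! After normal completion the do variable holds nrdstp + 1.
    last_nstp = nstp
  end subroutine count_steps
end module repro_mod

program repro
  use repro_mod
  implicit none
  integer(4), parameter :: n = 16
  real(8)    :: sumn(n)
  integer(4) :: last(n)
  integer(4) :: i

  ! Call the device routine from a kernels region, one call per iteration.
  !$acc kernels copyout(sumn, last)
  !$acc loop independent
  do i = 1, n
    call count_steps(5, sumn(i), last(i))
  end do
  !$acc end kernels

  print *, "sumn(1) =", sumn(1), " last(1) =", last(1)
end program repro
```

Building this once with the same flags as above and once without -acc and -Mcuda would show whether the device result differs from the host result for the same source.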