Hello again,
I’m working on a fairly complicated piece of code and trying to make it GPU-enabled. I’ve already asked some questions about it. Right now I have a problem with Not a Number (NaN) results.
I have a loop that I want to compile and execute on the GPU:
!$acc region do local(ijk,i,j,k), copy(vvect(:,igfy:igfyp1))
      do ijk=imoj4,imoj5
         if(iffs.eq.0 .and. nf(ijk).ne.0) cycle
         i=i_str(ijk)
         j=j_str(ijk)
         k=k_str(ijk)
c
         include '../comdeck/mijk.f'
         include '../comdeck/pijk.f'
         if(wl.eq.4 .and. i.eq.iprr .and. imax.gt.4) then
            i2jk=ijk_str2unstr(ii2*(k-1)+ii1*(j-1)+2+ii5)
            uhalfp=-dudp(ijk)*(vvect(i2jk,igfy)-vvect(ijk,igfy))
         else
            uhalfp=-dudp(ijk)*(vvect(ipjk,igfy)-vvect(ijk,igfy))
         endif
c
         if(wl.eq.4 .and. i.eq.iprl .and. imax.gt.4) then
            im2jk=ijk_str2unstr(ii2*(k-1)+ii1*(j-1)+im2+ii5)
            uhalfm=-dudp(imjk)*(vvect(ijk,igfy)-vvect(im2jk,igfy))
         else
            uhalfm=-dudp(imjk)*(vvect(ijk,igfy)-vvect(imjk,igfy))
         endif
c
         if(wf.eq.4 .and. j.eq.jprbk .and. jmax.gt.4) then
            ij2k=ijk_str2unstr(ii2*(k-1)+ii1+i+ii5)
            vhalfp=-dvdp(ijk)*(vvect(ij2k,igfy)-vvect(ijk,igfy))
         else
            vhalfp=-dvdp(ijk)*(vvect(ijpk,igfy)-vvect(ijk,igfy))
         endif
c
         if(wf.eq.4 .and. j.eq.jprf .and. jmax.gt.4) then
            ijm2k=ijk_str2unstr(ii2*(k-1)+ii1*(jm2-1)+i+ii5)
            vhalfm=-dvdp(ijmk)*(vvect(ijk,igfy)-vvect(ijm2k,igfy))
         else
            vhalfm=-dvdp(ijmk)*(vvect(ijk,igfy)-vvect(ijmk,igfy))
         endif
c
         if(wb.eq.4 .and. k.eq.kprt .and. kmax.gt.4) then
            ijk2=ijk_str2unstr(ii2+ii1*(j-1)+i+ii5)
            whalfp=-dwdp(ijk)*(vvect(ijk2,igfy)-vvect(ijk,igfy))
         else
            whalfp=-dwdp(ijk)*(vvect(ijkp,igfy)-vvect(ijk,igfy))
         endif
c
         if(wb.eq.4 .and. k.eq.kprb .and. kmax.gt.4) then
            ijkm2=ijk_str2unstr(ii2*(km2-1)+ii1*(j-1)+i+ii5)
            whalfm=-dwdp(ijkm)*(vvect(ijk,igfy)-vvect(ijkm2,igfy))
         else
            whalfm=-dwdp(ijkm)*(vvect(ijk,igfy)-vvect(ijkm,igfy))
         endif
         vvect(ijk,igfyp1)=rri(i)*(rdx(i)*(afr(ijk)*uhalfp/rr(i)-
     1    afr(imjk)*uhalfm/rr(i-1))+
     2    rdy(j)*(afb(ijk)*vhalfp-afb(ijmk)*vhalfm))+
     3    rdz(k)*(aft(ijk)*whalfp-aft(ijkm)*whalfm)
     4    +vf(ijk)*rcsqf(ijk)*rdelt*vvect(ijk,igfy)
         vvect(ijk,igfyp1)=vvect(ijk,igfyp1)*beta(ijk)
      enddo ! (ijk)
!$acc end region
I compile it with:
$ pgf95 -DP4 -DWIN32 -c -O3 -mp -Mpreprocess -Bstatic -Mcuda -ta=nvidia -Minfo -Mfixed -V10.9 -Kieee -Ktrap-fp program.F
(...)
Generating copy(vvect(:,igfy:igfyp1))
(...)
After executing it on the GPU, some elements of the vvect array are NaN. They are not NaN when the same code is executed on the CPU.
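For reference, a host-side check that flags NaN elements after the region can be sketched as below. This is only an illustration under assumed names: the program name nanscan, the array size n, and the deliberately injected NaN are made up for the demo and are not taken from my real code.

```fortran
      program nanscan
      implicit none
c     Hypothetical sizes; the real vvect is much larger.
      integer, parameter :: n = 8
      real vvect(n,2)
      real zero
      integer ijk, nbad
c     Fill with ordinary values, then inject one NaN for the demo.
c     Note: if floating-point trapping is enabled, the 0/0 below
c     traps instead of producing a NaN.
      vvect = 1.0
      zero = 0.0
      vvect(5,2) = zero/zero
c     A NaN is the only value that compares unequal to itself.
c     This relies on IEEE-conforming comparisons (e.g. -Kieee).
      nbad = 0
      do ijk = 1, n
         if (vvect(ijk,2) .ne. vvect(ijk,2)) then
            nbad = nbad + 1
            write(*,*) 'NaN at ijk =', ijk
         endif
      enddo
      write(*,*) 'NaN count =', nbad
      end program nanscan
```

Running the same scan over vvect(:,igfyp1) after the accelerator region, and again after the CPU-only run, would show which ijk indices differ between the two.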
The funny thing is that when I remove the copy() clause from the directive and leave only:
!$acc region do local(ijk,i,j,k)
The resulting array contains only zeros. This is strange, because the compiler adds the clause
Generating copy(vvect(:,igfy:igfyp1))
on its own, so there should not be any difference.
So, any ideas where the NaNs are coming from, and why these two versions of the directive give different results?
I thought about emulating the GPU and writing out all the variables in each iteration, but I understand that I cannot emulate the GPU using the PGI Accelerator model, right? If I could, I would check all the variables that are used to compute the vvect elements. So, are there other ways to check this than moving from the PGI Accelerator model to CUDA Fortran?