Hi,
The CPU results and the OpenACC GPU results do not match. Both executables are compiled with the PGI 15.4 Fortran compiler using the -Kieee option. The vimdiff output of the two runs is copied below:
L, CFL[xyz]max: 1 0.5683299913571619 | L, CFL[xyz]max: 1 0.5683299913571619
0.1273464758709**242** 0.000000000000000 | 0.1273464758709**578** 0.000000000000000
L, CFL[xyz]max: 2 0.6531875907627893 | L, CFL[xyz]max: 2 0.6531875907627893
0.1595491883002**556** 9.1393459670539370E-002 | 0.1595491883002**926** 9.1393459670539370E-002
L, CFL[xyz]max: 3 0.6136218439579227 | L, CFL[xyz]max: 3 0.6136218439579227
0.13731536255**29625** 0.1561959406772873 | 0.13731536255**30031** 0.1561959406772873
L, CFL[xyz]max: 4 0.5379380549130**823** | L, CFL[xyz]max: 4 0.5379380549130**339**
0.1341207044526441 0.1989663002452234 | 0.1341207044526441 0.1989663002452234
L, CFL[xyz]max: 5 0.57548155648**40166** | L, CFL[xyz]max: 5 0.57548155648**39401**
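To give a sense of scale, here is a minimal sketch that computes the relative difference for the first mismatching pair in the diff above. The program name `reldiff` and the variables `a`, `b` and `rel` are only illustrative and are not part of my application; the two constants are copied from the left and right columns of the second diff line.

```fortran
program reldiff
  implicit none
  ! Double precision kind (illustrative; the application may declare it differently).
  integer, parameter :: dp = selected_real_kind(15, 307)
  ! First mismatching pair from the diff (left column vs. right column).
  real(dp), parameter :: a = 0.1273464758709242_dp
  real(dp), parameter :: b = 0.1273464758709578_dp
  real(dp) :: rel

  ! Relative difference, normalised by the larger magnitude.
  rel = abs(a - b) / max(abs(a), abs(b))

  print '(a, es10.3)', 'relative difference: ', rel
  print '(a, es10.3)', 'machine epsilon:     ', epsilon(1.0_dp)
end program reldiff
```

For this pair the relative difference is on the order of 1e-13, which is well above double-precision machine epsilon, and the mismatch is visible from the very first step.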
The code with the OpenACC directives applied is pasted below:
DO j = 1, numgbr
  inoutf = cldon*(j-1)
  !$acc parallel firstprivate(inoutf)
  !$acc loop
  DO i = 1, cldon
    zebfrcre(i) = frcre(inoutf+i)
    ! zebfrcre(i) = 1. ! test MPL 19052010
    zerm0(i) = rm0(inoutf+i)
    PHODI(i,1) = alumin1(inoutf+i)
    PHODI(i,2) = alumin2(inoutf+i)
    !
    PHODI_NEW(i,1) = alumin1(inoutf+i) ! TO REVIEW (MPL): PHODI_NEW as a function of the SW bands
    do kk=2,NSW
      PHODI_NEW(i,kk) = alumin2(inoutf+i)
    enddo
    PBLPRE(i,1) = alumin1(inoutf+i)
    PBLPRE(i,2) = alumin2(inoutf+i)
    !
    PBLPRE_NEW(i,1) = alumin1(inoutf+i) ! TO REVIEW (MPL): PBLPRE_NEW as a function of the SW bands
    do kk=2,NSW
      PBLPRE_NEW(i,kk) = alumin2(inoutf+i)
    enddo
    PASSIM(i) = 1.0 ! TO REVIEW (MPL)
    PRLVW(i) = 1.66
    PPSOL(i) = PAHALE(inoutf+i,1)
    zeroxa1 = (PAHALE(inoutf+i,1)-pplay(inoutf+i,2))/(pplay(inoutf+i,1)-pplay(inoutf+i,2))
    zeroxa2 = 1.0 - zeroxa1
    PRLTAI(i,1) = t(inoutf+i,1) * zeroxa1 + t(inoutf+i,2) * zeroxa2
    PRLTAI(i,KLEV+1) = t(inoutf+i,KLEV)
    PDT0(i) = tsol(inoutf+i) - PRLTAI(i,1)
  ENDDO
  !$acc end loop
  !$acc end parallel
I’ve checked the PGI User's Guide (PGIUG 15.4) for further options that would produce the same floating-point results on both the CPU and the GPU, but didn’t find anything relevant. Could you please advise how to get matching results?