Note: what I have shown is a representative example; the real code is at the end of this reply.
Though I’m assuming you’re passing i, j, and k by reference?
The compiler doesn’t give any information about these variables.
What do the compiler feedback messages show? (i.e., add “-Minfo=accel” to your compile options.)
With -Minfo=all:
802, Generating NVIDIA GPU code
803, !$acc loop gang(maxnblocks), vector(nthreads) ! blockidx%x threadidx%x
Does adding “private(i,j,k)” to the parallel compute construct fix the issue?
No, it doesn’t fix the issue.
Regarding the reproducing example: in the same program there is another, similarly structured routine that seems to work, although the compiler reports:
35, Generating NVIDIA GPU code
36, !$acc loop gang(maxnblocks) ! blockidx%x
38, !$acc loop seq
39, !$acc loop seq
38, Loop carried dependence of zp,yp prevents parallelization
Loop carried backward dependence of zp,yp prevents vectorization
Loop carried dependence of xp prevents parallelization
Loop carried backward dependence of xp prevents vectorization
39, Loop carried dependence of zp,yp prevents parallelization
Loop carried backward dependence of zp,yp prevents vectorization
Loop carried dependence of xp prevents parallelization
Loop carried backward dependence of xp prevents vectorization
Loop carried dependence of conc prevents parallelization
Loop carried backward dependence of conc prevents vectorization
41, Reference argument passing prevents parallelization: ierrel
Reference argument passing prevents parallelization: restos
Reference argument passing prevents parallelization: restoy
Reference argument passing prevents parallelization: restox
Reference argument passing prevents parallelization: k
Reference argument passing prevents parallelization: j
Reference argument passing prevents parallelization: i
Reference argument passing prevents parallelization: snorm
Reference argument passing prevents parallelization: ynorm
Reference argument passing prevents parallelization: xnorm
Reference argument passing prevents parallelization: saves
Reference argument passing prevents parallelization: zint
And the routine is the following:
!$acc data present(xp, yp, zp, istatu, partyp, amass, volijk2, cgridp, &
!$acc& emip, outp, terain2, xrel2, yrel2, sgridh2, &
!$acc& ipter, nmed_calcon, ndputi,conc,maxnblocks,nthreads)
!$acc parallel loop gang vector num_gangs(maxnblocks) vector_length(1)
do p = 1,ipter
if (istatu(p).eq.1.or.istatu(p).eq.-1.or.istatu(p).eq.-2) THEN
do mat = 1,outp%nummat
do n = 1,outp%numsorg(mat)
if (partyp(p).eq.outp%vetsou(mat,n)) then
call relo3d &
(xp(p) ,yp(p) ,zp(p) ,sgridh2,cgridp%nliv,cgridp%nliv , &
terain2,cgridp%nx ,cgridp%ny ,cgridp%nx ,cgridp%ny ,cgridp%top , &
xrel2 ,yrel2 ,cgridp%dx ,cgridp%dy , &
ZINT ,SAVES ,XNORM ,YNORM ,SNORM , &
I ,J ,K , &
RESTOX,RESTOY,RESTOS, &
IERREL)
if (ierrel.eq.0.or.ierrel.eq.4) then
!$acc atomic update
conc(i,j,k,mat) = conc(i,j,k,mat) + &
amass(p,outp%specie(mat))/(volijk2(i,j,k)*nmed_calcon)
endif
endif
enddo
enddo
endif
enddo
!$acc end parallel loop
!$acc end data
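A plausible reason this routine gives repeatable results is vector_length(1): the feedback for line 36 shows only gang parallelism, so each gang processes one p at a time and its own copies of i, j, k and ierrel are never shared with another iteration running at the same time. The reduced model below is a sketch, not the actual code: locate_cell is a hypothetical stand-in for relo3d, ptype/mattype replace partyp/outp%vetsou (one source type per material, for brevity), and the scalars written through the call are listed in a private clause so that the loop could also use vector lanes.

```fortran
! Reduced model of the material-binning routine. All names are hypothetical:
! locate_cell stands in for relo3d, ptype/mattype stand in for partyp/outp%vetsou.
module binning_sketch
  implicit none
contains

  ! Hypothetical locator: 1-based cell indices on a uniform grid of spacing dx.
  ! ierr = 0 when the point is inside the grid, 1 otherwise.
  subroutine locate_cell(x, y, z, dx, nx, ny, nz, i, j, k, ierr)
    real, intent(in) :: x, y, z, dx
    integer, intent(in) :: nx, ny, nz
    integer, intent(out) :: i, j, k, ierr
    !$acc routine seq
    i = floor(x / dx) + 1
    j = floor(y / dx) + 1
    k = floor(z / dx) + 1
    ierr = 0
    if (i < 1 .or. i > nx .or. j < 1 .or. j > ny .or. k < 1 .or. k > nz) ierr = 1
  end subroutine locate_cell

  subroutine bin_by_material(np, xp, yp, zp, ptype, amass, nmat, mattype, &
                             dx, nx, ny, nz, conc)
    integer, intent(in) :: np, nmat, nx, ny, nz
    real, intent(in) :: xp(np), yp(np), zp(np), amass(np), dx
    integer, intent(in) :: ptype(np), mattype(nmat)
    real, intent(inout) :: conc(nx, ny, nz, nmat)
    integer :: p, mat, i, j, k, ierr

    ! i, j, k and ierr are written through the call, so they are made private:
    ! each gang/vector lane gets its own copy, and vector_length(1) is not needed.
    !$acc parallel loop gang vector private(i, j, k, ierr) &
    !$acc& copyin(xp, yp, zp, ptype, amass, mattype) copy(conc)
    do p = 1, np
      !$acc loop seq
      do mat = 1, nmat
        if (ptype(p) == mattype(mat)) then
          call locate_cell(xp(p), yp(p), zp(p), dx, nx, ny, nz, i, j, k, ierr)
          if (ierr == 0) then
            ! Several particles can land in the same cell, so the sum stays atomic.
            !$acc atomic update
            conc(i, j, k, mat) = conc(i, j, k, mat) + amass(p)
          end if
        end if
      end do
    end do
  end subroutine bin_by_material
end module binning_sketch
```

Compiled without OpenACC the directives are plain comments and the code runs serially, which gives a reference result to compare the GPU run against.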
I don’t have an example that reproduces the problem exactly, but the real code is the following:
!$acc data present(velm(1:ipter),dtempm(1:ipter))
!$acc data copy(velc,dtempc,velcu,dtempcu,numpar) &
!$acc& present(xp,yp,zp,istatu,prisep,volijkpr,zgridpr,sgridh,terain,xrel,yrel, &
!$acc& sgridhm,xrelm,yrelm,terainm,zgrid1)
!$acc parallel loop gang vector num_gangs(maxnblocks) vector_length(nthreads)
do n=1,ipter
if (istatu(n).eq.1.or.istatu(n).eq.-1.or.istatu(n).eq.-2) then
call relo3d(xp(n) ,yp(n) ,zp(n) ,sgridh,prisep%nliv,prisep%nliv , &
terain,prisep%nx,prisep%ny,prisep%nx,prisep%ny,prisep%top, &
xrel,yrel ,prisep%dx,prisep%dy, &
adummy,bdummy,cdummy,ddummy,edummy , &
ig, jg, kg, &
fdummy,gdummy,hdummy , &
ierrel)
! outputs of relo3d are ig,jg,kg,ierrel, other variables are the inputs
if (ierrel .eq. 0 .or. ierrel .eq. 4) then
!$acc atomic update
dtempc(ig,jg,kg) = dtempc(ig,jg,kg) + dtempm(n)/volijkpr(ig,jg,kg)
!$acc end atomic
!$acc atomic update
velc(ig,jg,kg) = velc(ig,jg,kg) + velm(n)/volijkpr(ig,jg,kg)
!$acc end atomic
!$acc atomic update
numpar(ig,jg,kg) = numpar(ig,jg,kg) + 1
!$acc end atomic
endif
endif
enddo
!$acc end data
The results are not repeatable from run to run, probably because of a race condition.
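One likely source of the race, given the vector(nthreads) schedule reported at line 803: ig, jg, kg, ierrel and the *dummy outputs are scalars passed by reference to relo3d, so the vector lanes of a gang can end up sharing one copy, and a lane may use indices written by another lane before its atomic update runs. The atomics protect the accumulations, not those scalars. Note also that private(i,j,k) would not cover this loop, which uses ig, jg, kg, and it would leave ierrel shared. Assuming the dummy variables are scalars, the corresponding change would be adding private(ig, jg, kg, ierrel, adummy, bdummy, cdummy, ddummy, edummy, fdummy, gdummy, hdummy) to the parallel loop directive. A reduced sketch of that pattern, with a hypothetical locate_cell in place of relo3d and the volume division dropped:

```fortran
! Reduced model of the deposition loop. All names are hypothetical:
! locate_cell stands in for relo3d, velc/numpar for the real accumulators.
module deposit_sketch
  implicit none
contains

  ! Hypothetical locator: 1-based cell indices on a uniform grid of spacing dx.
  ! ierr = 0 when the point is inside the grid, 1 otherwise.
  subroutine locate_cell(x, y, z, dx, nx, ny, nz, ig, jg, kg, ierr)
    real, intent(in) :: x, y, z, dx
    integer, intent(in) :: nx, ny, nz
    integer, intent(out) :: ig, jg, kg, ierr
    !$acc routine seq
    ig = floor(x / dx) + 1
    jg = floor(y / dx) + 1
    kg = floor(z / dx) + 1
    ierr = 0
    if (ig < 1 .or. ig > nx .or. jg < 1 .or. jg > ny .or. kg < 1 .or. kg > nz) ierr = 1
  end subroutine locate_cell

  subroutine deposit(np, xp, yp, zp, velm, dx, nx, ny, nz, velc, numpar)
    integer, intent(in) :: np, nx, ny, nz
    real, intent(in) :: xp(np), yp(np), zp(np), velm(np), dx
    real, intent(inout) :: velc(nx, ny, nz)
    integer, intent(inout) :: numpar(nx, ny, nz)
    integer :: n, ig, jg, kg, ierr

    ! Every scalar written by the called routine is private, so each
    ! gang/vector lane uses its own ig, jg, kg and ierr.
    !$acc parallel loop gang vector private(ig, jg, kg, ierr) &
    !$acc& copyin(xp, yp, zp, velm) copy(velc, numpar)
    do n = 1, np
      call locate_cell(xp(n), yp(n), zp(n), dx, nx, ny, nz, ig, jg, kg, ierr)
      if (ierr == 0) then
        ! Different particles can hit the same cell: keep the updates atomic.
        !$acc atomic update
        velc(ig, jg, kg) = velc(ig, jg, kg) + velm(n)
        !$acc atomic update
        numpar(ig, jg, kg) = numpar(ig, jg, kg) + 1
      end if
    end do
  end subroutine deposit
end module deposit_sketch
```

The integer counts in numpar should then match between runs; the floating-point sums can still differ in the last bits, because atomic additions happen in a nondeterministic order.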
Many thanks,
Massimiliano