Hi,
By following the blog about “Deep Copy in OpenACC” at
I am able to compile and run my OpenACC code, which looks like this:
!$acc enter data copyin(var,m%detJ, m%dxidx)
!$acc enter data copyin(m) attach(m%detJ, m%dxidx)
!$acc enter data create(vartmp)
!$acc parallel loop vector gang default(present)
do k=1,nk
  do j=1,nj
    do i=1,ni
      vartmp(i,j,k,1) = var(i,j,k)*m%detJ(i,j,k)*m%dxidx(i,j,k)
    enddo
  enddo
enddo
call acc_detach(m%detJ);
call acc_detach(m%dxidx);
!$acc exit data delete(m%detJ,m%dxidx)
!$acc exit data delete(m)
!$acc exit data copyout(vartmp)
However, the compiler prints these warning messages:
Complex loop carried dependence of m%detj$p,vartmp,m%dxidx$p prevents parallelization
1545, Generating enter data copyin(m%detj(:,:,:),m%dxidx(:,:,:),var(:,:,:))
1546, Generating enter data attach(m%detj)
Generating enter data copyin(m)
Generating enter data attach(m%dxidx)
1547, Generating enter data create(vartmp(:,:,:,:))
1548, Accelerator kernel generated
Generating Tesla code
1549, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
1550, !$acc loop seq
1551, !$acc loop seq
1548, Generating implicit present(m,vartmp(1:ni,1:nj,1:nk,1),var(:ni,:nj,:nk))
1550, Complex loop carried dependence of m%detj$p,vartmp,m%dxidx$p prevents parallelization
1551, Complex loop carried dependence of m%detj$p,vartmp,m%dxidx$p prevents parallelization
1560, Generating exit data delete(m%detj(:,:,:),m%dxidx(:,:,:))
1561, Generating exit data delete(m)
1562, Generating exit data copyout(vartmp(:,:,:,:))
The do-loop indeed runs serially, as the timing output shows:
1548: compute region reached 2505 times
1548: kernel launched 2505 times
grid: [1] block: [128]
elapsed time(us): total=8,593,687 max=3,831 min=3,274 avg=3,430
1548: data region reached 5010 times
Each launch of the do-loop takes about 3.4 ms (avg=3,430 us above) with ni=65, nj=195 and nk=1.
How can we make the do-loop run in parallel with such a deep copy?
Thanks. /JG
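
A possible direction for the question above, offered as an untested sketch rather than a confirmed fix. The compiler output shows gang, vector(128) applied only to the outermost k loop (line 1549), while the j and i loops are marked seq. With nk=1 that outer loop has a single iteration, which is consistent with the reported grid: [1] block: [128]. Adding collapse(3) to the parallel loop directive asks the compiler to treat the three loops as one iteration space of ni*nj*nk. Since a loop directive inside a parallel construct asserts that the iterations are independent, this should also take precedence over the "complex loop carried dependence" analysis. The program below is self-contained. The type name metric_t, the fill values and the final check are hypothetical. The data directives follow the original post, except that the acc_detach calls are written as a detach clause, which is an assumption about what the compiler in use accepts.

```fortran
program collapse_sketch
  implicit none

  ! Hypothetical stand-in for the derived type behind "m" in the post.
  type metric_t
    real, allocatable :: detJ(:,:,:)
    real, allocatable :: dxidx(:,:,:)
  end type metric_t

  integer, parameter :: ni = 65, nj = 195, nk = 1
  type(metric_t) :: m
  real, allocatable :: var(:,:,:), vartmp(:,:,:,:)
  integer :: i, j, k

  allocate(m%detJ(ni,nj,nk), m%dxidx(ni,nj,nk))
  allocate(var(ni,nj,nk), vartmp(ni,nj,nk,1))

  ! Arbitrary fill values so the result is easy to check: 1 * 2 * 3 = 6.
  var     = 1.0
  m%detJ  = 2.0
  m%dxidx = 3.0

  ! Same deep-copy sequence as in the post: members first, then the
  ! parent, then attach the device pointers inside the device copy of m.
  !$acc enter data copyin(var, m%detJ, m%dxidx)
  !$acc enter data copyin(m) attach(m%detJ, m%dxidx)
  !$acc enter data create(vartmp)

  ! collapse(3) merges the k, j and i loops into one iteration space,
  ! so gang/vector parallelism is no longer limited to the nk=1 iterations
  ! of the outer loop.
  !$acc parallel loop gang vector collapse(3) default(present)
  do k = 1, nk
    do j = 1, nj
      do i = 1, ni
        vartmp(i,j,k,1) = var(i,j,k) * m%detJ(i,j,k) * m%dxidx(i,j,k)
      end do
    end do
  end do

  ! Assumption: the detach clause is used here in place of the
  ! acc_detach calls from the post.
  !$acc exit data detach(m%detJ, m%dxidx)
  !$acc exit data delete(m%detJ, m%dxidx)
  !$acc exit data delete(m)
  !$acc exit data copyout(vartmp) delete(var)

  ! Host-side sanity check of the copied-back result.
  if (all(abs(vartmp - 6.0) < 1.0e-5)) then
    print *, 'result matches expected value'
  else
    print *, 'result mismatch'
  end if
end program collapse_sketch
```

Without OpenACC enabled the directives are plain comments, so the same source also builds and runs on the host, which makes it easy to compare results. The timing output also reports the data region being reached 5010 times. If the enter data and exit data directives can be hoisted out of the routine that is called 2505 times, the per-call transfers might shrink too, but that depends on code not shown in the post.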