Hello Mat.
I have some questions about OpenACC.
I have included the code, the compiler messages, and the results below.
I usually compile with these options:
“ACC = -fast -acc -Minfo=accel -ta=tesla:cc80”
But when I want to compare the results between the CPU and the GPU, I use these options instead:
“ACC = -fast -acc -Minfo=accel -ta=tesla:autocompare”
Q1.
I don’t understand why I have to use the present clause, since I already use !$acc enter data copyin for RHS1.
In my opinion, I shouldn’t need the present clause on the parallel loop. Is that right?
When I don’t use the present clause, it looks like RHS1 is not copied at the parallel loop.
The compiler reports a copy for the enter data region, but none at the parallel loop:
Generating enter data copyin(rhs1(0:m1,0:m2,0:m3,3))
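For reference, the situation in Q1 can be reduced to a sketch like the one below. The names present_sketch, n, a, and b are hypothetical and are not taken from the real code. As far as I understand the OpenACC 2.5 and later rules, an array referenced in a parallel loop without a data clause gets an implicit copy clause, and copy only allocates and transfers when the array is not already present, so the device copy created by enter data should be reused in both loops. The present clause would then only change what happens when the array is missing (a runtime error instead of a new copy).

```fortran
program present_sketch
  implicit none
  integer, parameter :: n = 8          ! hypothetical size, not from the real code
  real(8) :: a(n), b(n)                ! hypothetical arrays standing in for RHS1, GK
  integer :: i

  a = 1.0d0
  b = 0.0d0

  ! Unstructured data region: a and b stay on the device until exit data
  !$acc enter data copyin(a(1:n), b(1:n))

  ! No data clause: a and b get an implicit copy, which reuses the present data
  !$acc parallel loop
  do i = 1, n
    b(i) = 2.0d0 * a(i)
  end do

  ! Explicit present: same data, but a runtime error if a or b were missing
  !$acc parallel loop present(a, b)
  do i = 1, n
    b(i) = b(i) + a(i)
  end do

  ! Bring the result back and release the device copies
  !$acc exit data copyout(b(1:n)) delete(a(1:n))

  print *, b
end program present_sketch
```

Without -acc the directives are treated as comments, so the same source also runs on the host for comparison.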
Q2.
As you can see, the Sample 1 and Sample 2 code differ only in how RHS1 is handled.
Sample 1
!$acc parallel loop collapse(2) present(RHS1, AK, AKUV, TNUY, FIXKL, CK, CKUV, FIXKU, BK, GK)
Sample 2
!$acc parallel loop collapse(2) copy(RHS1) present(AK, AKUV, TNUY, FIXKL, CK, CKUV, FIXKU,
Why are the results different?
Sample 1 - code
!$acc enter data copyin ( AK(1:M1, 1:M3), BK(1:M1, 1:M3), CK(1:M1, 1:M3), GK(1:M1, 1:M3) )
!$acc enter data copyin ( AI(1:M2, 1:M1), BI(1:M2, 1:M1), CI(1:M2, 1:M1), GI(1:M2, 1:M1) )
!$acc enter data copyin ( AJ(1:M1, 1:M2), BJ(1:M1, 1:M2), CJ(1:M1, 1:M2), GJ(1:M1, 1:M2) )
!$acc enter data copyin ( RHS1(0:M1, 0:M2, 0:M3, 3) )
!$acc enter data copyin ( FIXIL(1:M1M), FIXIU(1:M1M), FIXJL(1:M2M), FIXJU(1:M2M), FIXKL(1:M3M), FIXKU(1:M3M) )
!$acc enter data copyin ( TNU (0:M1, 0:M2, 0:M3), TNUX(0:M1, 0:M2, 0:M3), TNUY(0:M1, 0:M2, 0:M3), TNUZ(0:M1, 0:M2, 0:M3) )
!$acc enter data copyin ( AKW(1:M3), CKW(1:M3), AKUV(1:M3), CKUV(1:M3) )
!$acc enter data copyin ( XMP(0:M1), YMP(0:M2), ZMP(0:M3) )
!$acc enter data copyin ( X(M1), Y(M2), Z(M3) )
!=====ADI STARTS
if (N3M /= 1) then
  !-----Z-DIRECTION
  do J=1,N2M
    !$acc parallel loop collapse(2) present(RHS1, AK, AKUV, TNUY, FIXKL, CK, CKUV, FIXKU, BK, GK)
    do K=1,N3M
      do I=I_BGPX,N1M
        AK(I,K)=AKUV(K)*(1.+CRE*TNUY(I,J,K))  *(1.-FIXKL(K)*FLOAT(KUB))
        CK(I,K)=CKUV(K)*(1.+CRE*TNUY(I,J,K+1))*(1.-FIXKU(K)*FLOAT(KUT))
        IF (IVELSRC .EQ. 1) THEN
          PRIK=0.
          BK(I,K)=ACOEFI*(1+ACOEF*PRIK)-AK(I,K)-CK(I,K)
          GK(I,K)=ACOEFI*(1+ACOEF*PRIK)*RHS1(I,J,K,1)
        ELSE
          BK(I,K)=ACOEFI-AK(I,K)-CK(I,K)
          GK(I,K)=ACOEFI*RHS1(I,J,K,1)
        ENDIF
      enddo
    enddo
    !$acc end parallel loop
  enddo
endif
Sample 1 - compile message
lhsu:
1783, Generating enter data copyin(ck(1:m1,1:m3),bk(1:m1,1:m3),ak(1:m1,1:m3),gk(1:m1,1:m3))
1784, Generating enter data copyin(ci(1:m2,1:m1),bi(1:m2,1:m1),ai(1:m2,1:m1),gi(1:m2,1:m1))
1785, Generating enter data copyin(cj(1:m1,1:m2),bj(1:m1,1:m2),aj(1:m1,1:m2),gj(1:m1,1:m2))
1788, Generating enter data copyin(rhs1(0:m1,0:m2,0:m3,3))
1789, Generating enter data copyin(fixkl(1:m3m),fixju(1:m2m),fixjl(1:m2m),fixiu(1:m1m),fixil(1:m1m),fixku(1:m3m))
1790, Generating enter data copyin(tnuy(0:m1,0:m2,0:m3),tnux(0:m1,0:m2,0:m3),tnu(0:m1,0:m2,0:m3),tnuz(0:m1,0:m2,0:m3))
1791, Generating enter data copyin(akw(1:m3),ckuv(1:m3),akuv(1:m3),ckw(1:m3))
1792, Generating enter data copyin(ymp(0:m2),xmp(0:m1),zmp(0:m3))
1793, Generating enter data copyin(z(m3),y(m2),x(m1))
1799, Generating present(gk(:,:),ak(:,:),tnuy(:,:,:),rhs1(:,:,:,:),ckuv(:),bk(:,:),ck(:,:),fixkl(:),akuv(:),fixku(:))
Generating NVIDIA GPU code
1800, !$acc loop gang, vector(128) collapse(2) ! blockidx%x threadidx%x
1801, ! blockidx%x threadidx%x collapsed
1898, Generating exit data delete(ck(1:m1,1:m3),bk(1:m1,1:m3),ak(1:m1,1:m3),gk(1:m1,1:m3))
1899, Generating exit data delete(ci(1:m2,1:m1),bi(1:m2,1:m1),ai(1:m2,1:m1),gi(1:m2,1:m1))
1900, Generating exit data delete(cj(1:m1,1:m2),bj(1:m1,1:m2),aj(1:m1,1:m2),gj(1:m1,1:m2))
1902, Generating exit data copyout(rhs1(0:m1,0:m2,0:m3,3))
1903, Generating exit data copyout(fixkl(1:m3m),fixju(1:m2m),fixjl(1:m2m),fixiu(1:m1m),fixil(1:m1m),fixku(1:m3m))
1904, Generating exit data copyout(tnuy(0:m1,0:m2,0:m3),tnux(0:m1,0:m2,0:m3),tnu(0:m1,0:m2,0:m3),tnuz(0:m1,0:m2,0:m3))
1905, Generating exit data copyout(akw(1:m3),ckuv(1:m3),akuv(1:m3),ckw(1:m3))
1906, Generating exit data copyout(ymp(0:m2),xmp(0:m1),zmp(0:m3))
1907, Generating exit data copyout(z(m3),y(m2),x(m1))
Sample 1 - result
rhs1 lives at 0x145f6d956c80 size 156558528 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 8.0, threadid=1
host:0x768410 device:0x145ea39fb200 size:128 presentcount:0+1 line:1793 name:descriptor
host:0x7684a0 device:0x145ea39fb600 size:128 presentcount:0+1 line:1793 name:descriptor
host:0x768530 device:0x145ea39fba00 size:128 presentcount:0+1 line:1793 name:descriptor
host:0x768920 device:0x145ea33fce00 size:128 presentcount:0+1 line:1789 name:descriptor
host:0x7689b0 device:0x145ea33fd800 size:128 presentcount:0+1 line:1789 name:descriptor
host:0x768a40 device:0x145ea33fde00 size:128 presentcount:0+1 line:1789 name:descriptor
host:0x768ad0 device:0x145ea33fe400 size:128 presentcount:0+1 line:1789 name:descriptor
host:0x768b60 device:0x145ea33fee00 size:128 presentcount:0+1 line:1789 name:descriptor
host:0x768bf0 device:0x145ea33ff800 size:128 presentcount:0+1 line:1789 name:descriptor
host:0x769340 device:0x145ea39f9c00 size:128 presentcount:0+1 line:1792 name:descriptor
host:0x7693d0 device:0x145ea39fa200 size:128 presentcount:0+1 line:1792 name:descriptor
host:0x769460 device:0x145ea39fae00 size:128 presentcount:0+1 line:1792 name:descriptor
host:0x76a150 device:0x145ea39f6c00 size:128 presentcount:0+1 line:1791 name:descriptor
host:0x76a1e0 device:0x145ea39f7800 size:128 presentcount:0+1 line:1791 name:descriptor
host:0x76a270 device:0x145ea39f8400 size:128 presentcount:0+1 line:1791 name:descriptor
host:0x76a300 device:0x145ea39f9000 size:128 presentcount:0+1 line:1791 name:descriptor
host:0x76e4d0 device:0x145ea33fc400 size:272 presentcount:0+1 line:1788 name:descriptor
host:0x76f4f0 device:0x145ea33ffa00 size:224 presentcount:0+1 line:1790 name:descriptor
host:0x76f5e0 device:0x145ea33ffc00 size:224 presentcount:0+1 line:1790 name:descriptor
host:0x76f6d0 device:0x145ea33ffe00 size:224 presentcount:0+1 line:1790 name:descriptor
host:0x76f7c0 device:0x145ea39f6000 size:224 presentcount:0+1 line:1790 name:descriptor
host:0x1f0df60 device:0x145ea39fb000 size:8 presentcount:0+1 line:1793 name:x
host:0x1f0e290 device:0x145ea39fb400 size:8 presentcount:0+1 line:1793 name:y
host:0x1f0eac0 device:0x145ea39fb800 size:8 presentcount:0+1 line:1793 name:z
host:0x1f11210 device:0x145ea33fc600 size:2048 presentcount:0+1 line:1789 name:fixil
host:0x1f11a40 device:0x145ea33fd000 size:2048 presentcount:0+1 line:1789 name:fixiu
host:0x1f12270 device:0x145ea33fda00 size:768 presentcount:0+1 line:1789 name:fixjl
host:0x1f125a0 device:0x145ea33fe000 size:768 presentcount:0+1 line:1789 name:fixju
host:0x1f128d0 device:0x145ea33fe600 size:2048 presentcount:0+1 line:1789 name:fixkl
host:0x1f13100 device:0x145ea33ff000 size:2048 presentcount:0+1 line:1789 name:fixku
host:0x1f187d0 device:0x145ea39f9200 size:2064 presentcount:0+1 line:1792 name:xmp
host:0x1f19010 device:0x145ea39f9e00 size:784 presentcount:0+1 line:1792 name:ymp
host:0x1f19350 device:0x145ea39fa400 size:2064 presentcount:0+1 line:1792 name:zmp
host:0x1f1c910 device:0x145ea39f6200 size:2056 presentcount:0+1 line:1791 name:akw
host:0x1f1d140 device:0x145ea39f6e00 size:2056 presentcount:0+1 line:1791 name:ckw
host:0x1f1d970 device:0x145ea39f7a00 size:2056 presentcount:0+1 line:1791 name:akuv
host:0x1f1e1a0 device:0x145ea39f8600 size:2056 presentcount:0+1 line:1791 name:ckuv
host:0x1f31ee0 device:0x145ea32fa000 size:528392 presentcount:0+1 line:1783 name:ak
host:0x1fb89d0 device:0x145ea337b200 size:528392 presentcount:0+1 line:1783 name:bk
host:0x203f6d0 device:0x145ea3800000 size:528392 presentcount:0+1 line:1783 name:ck
host:0x20c65e0 device:0x145ea3881200 size:528392 presentcount:0+1 line:1783 name:gk
host:0x2147610 device:0x145ea3902400 size:199432 presentcount:0+1 line:1784 name:ai
host:0x2178140 device:0x145ea3933000 size:199432 presentcount:0+1 line:1784 name:bi
host:0x21a8c70 device:0x145ea3963c00 size:199432 presentcount:0+1 line:1784 name:ci
host:0x21d97a0 device:0x145ea3994800 size:199432 presentcount:0+1 line:1784 name:gi
host:0x220a2d0 device:0x145ea39c5400 size:199432 presentcount:0+1 line:1785 name:aj
host:0x223ae00 device:0x145ea3a00000 size:199432 presentcount:0+1 line:1785 name:bj
host:0x226b930 device:0x145ea3a30c00 size:199432 presentcount:0+1 line:1785 name:cj
host:0x229c460 device:0x145ea3a61800 size:199432 presentcount:0+1 line:1785 name:gj
host:0x145f6123e4c0 device:0x145e76000000 size:52186176 presentcount:0+1 line:1790 name:tnuz
host:0x145f644052b0 device:0x145e7a000000 size:52186176 presentcount:0+1 line:1790 name:tnuy
host:0x145f675cb0a0 device:0x145e7e000000 size:52186176 presentcount:0+1 line:1790 name:tnux
host:0x145f6a790e90 device:0x145e82000000 size:52186176 presentcount:0+1 line:1790 name:tnu
host:0x145f73ce0500 device:0x145e86000000 size:52186176 presentcount:0+1 line:1788 name:rhs1
allocated block device:0x145e76000000 size:52186624 thread:1
allocated block device:0x145e7a000000 size:52186624 thread:1
allocated block device:0x145e7e000000 size:52186624 thread:1
allocated block device:0x145e82000000 size:52186624 thread:1
allocated block device:0x145e86000000 size:52186624 thread:1
allocated block device:0x145ea32fa000 size:528896 thread:1
allocated block device:0x145ea337b200 size:528896 thread:1
allocated block device:0x145ea33fc400 size:512 thread:1
allocated block device:0x145ea33fc600 size:2048 thread:1
allocated block device:0x145ea33fce00 size:512 thread:1
allocated block device:0x145ea33fd000 size:2048 thread:1
allocated block device:0x145ea33fd800 size:512 thread:1
allocated block device:0x145ea33fda00 size:1024 thread:1
allocated block device:0x145ea33fde00 size:512 thread:1
allocated block device:0x145ea33fe000 size:1024 thread:1
allocated block device:0x145ea33fe400 size:512 thread:1
allocated block device:0x145ea33fe600 size:2048 thread:1
allocated block device:0x145ea33fee00 size:512 thread:1
allocated block device:0x145ea33ff000 size:2048 thread:1
allocated block device:0x145ea33ff800 size:512 thread:1
allocated block device:0x145ea33ffa00 size:512 thread:1
allocated block device:0x145ea33ffc00 size:512 thread:1
allocated block device:0x145ea33ffe00 size:512 thread:1
allocated block device:0x145ea3800000 size:528896 thread:1
allocated block device:0x145ea3881200 size:528896 thread:1
allocated block device:0x145ea3902400 size:199680 thread:1
allocated block device:0x145ea3933000 size:199680 thread:1
allocated block device:0x145ea3963c00 size:199680 thread:1
allocated block device:0x145ea3994800 size:199680 thread:1
allocated block device:0x145ea39c5400 size:199680 thread:1
allocated block device:0x145ea39f6000 size:512 thread:1
allocated block device:0x145ea39f6200 size:2560 thread:1
allocated block device:0x145ea39f6c00 size:512 thread:1
allocated block device:0x145ea39f6e00 size:2560 thread:1
allocated block device:0x145ea39f7800 size:512 thread:1
allocated block device:0x145ea39f7a00 size:2560 thread:1
allocated block device:0x145ea39f8400 size:512 thread:1
allocated block device:0x145ea39f8600 size:2560 thread:1
allocated block device:0x145ea39f9000 size:512 thread:1
allocated block device:0x145ea39f9200 size:2560 thread:1
allocated block device:0x145ea39f9c00 size:512 thread:1
allocated block device:0x145ea39f9e00 size:1024 thread:1
allocated block device:0x145ea39fa200 size:512 thread:1
allocated block device:0x145ea39fa400 size:2560 thread:1
allocated block device:0x145ea39fae00 size:512 thread:1
allocated block device:0x145ea39fb000 size:512 thread:1
allocated block device:0x145ea39fb200 size:512 thread:1
allocated block device:0x145ea39fb400 size:512 thread:1
allocated block device:0x145ea39fb600 size:512 thread:1
allocated block device:0x145ea39fb800 size:512 thread:1
allocated block device:0x145ea39fba00 size:512 thread:1
allocated block device:0x145ea3a00000 size:199680 thread:1
allocated block device:0x145ea3a30c00 size:199680 thread:1
allocated block device:0x145ea3a61800 size:199680 thread:1
FATAL ERROR: variable in data clause is partially present on the device: name=rhs1
file:/home/jsera.lee/lica/LICA/Canopy/3_Main/_modi_main/src/lica.f90 lhsu line:1799
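One detail in the dump above may be worth checking. The runtime reports rhs1 as 156558528 bytes on the host, while the present table entry for rhs1 is 52186176 bytes, which is exactly one third. Assuming RHS1 is declared with a last dimension of extent 3 (the declaration is not shown here), the section RHS1(0:M1, 0:M2, 0:M3, 3) names only the single slab at index 3, whereas the loop reads RHS1(I,J,K,1). The sketch below uses hypothetical small extents in place of M1, M2, and M3 to show how the element counts of the two section forms differ in plain Fortran.

```fortran
program section_size
  implicit none
  ! Hypothetical small extents standing in for M1, M2, M3 (not the real sizes)
  integer, parameter :: m1 = 4, m2 = 3, m3 = 2
  ! Assumption: the last dimension has extent 3, as the 3x size ratio suggests
  real(8) :: rhs1(0:m1, 0:m2, 0:m3, 3)

  rhs1 = 0.0d0

  ! Element count of the whole array
  print *, 'whole array       :', size(rhs1)
  ! A scalar last index selects one slab only (rank-3 section)
  print *, 'rhs1(..., 3)      :', size(rhs1(0:m1, 0:m2, 0:m3, 3))
  ! A range in the last index selects all three slabs
  print *, 'rhs1(..., 1:3)    :', size(rhs1(0:m1, 0:m2, 0:m3, 1:3))
end program section_size
```

If that assumption holds, the scalar-index section should come out at one third of the whole array, which would match the "partially present" message. The same pattern would apply to X(M1), Y(M2), and Z(M3), which show up in the present table with a size of 8 bytes each.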
Sample 2 - code
!$acc enter data copyin ( AK(1:M1, 1:M3), BK(1:M1, 1:M3), CK(1:M1, 1:M3), GK(1:M1, 1:M3) )
!$acc enter data copyin ( AI(1:M2, 1:M1), BI(1:M2, 1:M1), CI(1:M2, 1:M1), GI(1:M2, 1:M1) )
!$acc enter data copyin ( AJ(1:M1, 1:M2), BJ(1:M1, 1:M2), CJ(1:M1, 1:M2), GJ(1:M1, 1:M2) )
!!$acc enter data copyin ( RHS1(0:M1, 0:M2, 0:M3, 3) )
!$acc enter data copyin ( FIXIL(1:M1M), FIXIU(1:M1M), FIXJL(1:M2M), FIXJU(1:M2M), FIXKL(1:M3M), FIXKU(1:M3M) )
!$acc enter data copyin ( TNU (0:M1, 0:M2, 0:M3), TNUX(0:M1, 0:M2, 0:M3), TNUY(0:M1, 0:M2, 0:M3), TNUZ(0:M1, 0:M2, 0:M3) )
!$acc enter data copyin ( AKW(1:M3), CKW(1:M3), AKUV(1:M3), CKUV(1:M3) )
!$acc enter data copyin ( XMP(0:M1), YMP(0:M2), ZMP(0:M3) )
!$acc enter data copyin ( X(M1), Y(M2), Z(M3) )
!=====ADI STARTS
if (N3M /= 1) then
  !-----Z-DIRECTION
  do J=1,N2M
    !$acc parallel loop collapse(2) copy(RHS1) present(AK, AKUV, TNUY, FIXKL, CK, CKUV, FIXKU, BK, GK)
    do K=1,N3M
      do I=I_BGPX,N1M
        AK(I,K)=AKUV(K)*(1.+CRE*TNUY(I,J,K))  *(1.-FIXKL(K)*FLOAT(KUB))
        CK(I,K)=CKUV(K)*(1.+CRE*TNUY(I,J,K+1))*(1.-FIXKU(K)*FLOAT(KUT))
        IF (IVELSRC .EQ. 1) THEN
          !PRIK=PERI(X(I),YMP(J),ZMP(K))
          PRIK=0.
          BK(I,K)=ACOEFI*(1+ACOEF*PRIK)-AK(I,K)-CK(I,K)
          GK(I,K)=ACOEFI*(1+ACOEF*PRIK)*RHS1(I,J,K,1)
        ELSE
          BK(I,K)=ACOEFI-AK(I,K)-CK(I,K)
          GK(I,K)=ACOEFI*RHS1(I,J,K,1)
        ENDIF
      enddo
    enddo
    !$acc end parallel loop
  enddo
endif
Sample 2 - compile message
lhsu:
1783, Generating enter data copyin(ck(1:m1,1:m3),bk(1:m1,1:m3),ak(1:m1,1:m3),gk(1:m1,1:m3))
1784, Generating enter data copyin(ci(1:m2,1:m1),bi(1:m2,1:m1),ai(1:m2,1:m1),gi(1:m2,1:m1))
1785, Generating enter data copyin(cj(1:m1,1:m2),bj(1:m1,1:m2),aj(1:m1,1:m2),gj(1:m1,1:m2))
1789, Generating enter data copyin(fixkl(1:m3m),fixju(1:m2m),fixjl(1:m2m),fixiu(1:m1m),fixil(1:m1m),fixku(1:m3m))
1790, Generating enter data copyin(tnuy(0:m1,0:m2,0:m3),tnux(0:m1,0:m2,0:m3),tnu(0:m1,0:m2,0:m3),tnuz(0:m1,0:m2,0:m3))
1791, Generating enter data copyin(akw(1:m3),ckuv(1:m3),akuv(1:m3),ckw(1:m3))
1792, Generating enter data copyin(ymp(0:m2),xmp(0:m1),zmp(0:m3))
1793, Generating enter data copyin(y(m2),z(m3),x(m1))
1799, Generating copy(rhs1(:,:,:,:)) [if not already present]
Generating present(ak(:,:),tnuy(:,:,:),gk(:,:),ckuv(:),bk(:,:),ck(:,:),fixkl(:),akuv(:),fixku(:))
Generating NVIDIA GPU code
1800, !$acc loop gang, vector(128) collapse(2) ! blockidx%x threadidx%x
1801, ! blockidx%x threadidx%x collapsed
1898, Generating exit data delete(ck(1:m1,1:m3),bk(1:m1,1:m3),ak(1:m1,1:m3),gk(1:m1,1:m3))
1899, Generating exit data delete(ci(1:m2,1:m1),bi(1:m2,1:m1),ai(1:m2,1:m1),gi(1:m2,1:m1))
1900, Generating exit data delete(cj(1:m1,1:m2),bj(1:m1,1:m2),aj(1:m1,1:m2),gj(1:m1,1:m2))
1903, Generating exit data copyout(fixkl(1:m3m),fixju(1:m2m),fixjl(1:m2m),fixiu(1:m1m),fixil(1:m1m),fixku(1:m3m))
1904, Generating exit data copyout(tnuy(0:m1,0:m2,0:m3),tnux(0:m1,0:m2,0:m3),tnu(0:m1,0:m2,0:m3),tnuz(0:m1,0:m2,0:m3))
1905, Generating exit data copyout(akw(1:m3),ckuv(1:m3),akuv(1:m3),ckw(1:m3))
1906, Generating exit data copyout(ymp(0:m2),xmp(0:m1),zmp(0:m3))
1907, Generating exit data copyout(z(m3),y(m2),x(m1))
Sample 2 - result
No error; the run passes without issues.