Hi, everyone.
To get rid of the data dependency above, I tried using the multigrid method instead of ILU, but I still have problems that I cannot solve right now.
Part of the code looks like this:
!$acc kernels
do i = 1, NI*NJ
r(i) = c0
r0(i) = c0
p(i) = c0
yy(i) = c0
e(i) = c0
v(i) = c0
end do
c
c do i = 1, nn
c r(i) = B(i) - A(i,1) * X(i+JA1) - A(i,2) * X(i+JA2) &
c - A(i,3) * X(i) &
c - A(i,4) * X(i+JA4) - A(i,5) * X(i+JA5)
c end do
DO I=2,NIM
II=(I-1)*NJ+IJGR(L)
!$acc do private(r)
DO IJ=II+2,II+NJM
r(IJ)=QA(IJ)-APA(IJ)*FIA(IJ)-AEA(IJ)*FIA(IJ+NJ)-
END DO
END DO
c
c c1 = c0
c do i = 1, nn
c c1 = c1 + r(i) * r(i)
c end do
DO I=2,NIM
II=(I-1)*NJ+IJGR(L)
DO IJ=II+2,II+NJM
c1 = c1 + r(IJ) * r(IJ)
END DO
END DO
c bb = c0
c do i = 1, nn
c bb = bb + B(i) * B(i)
c end do
DO I=2,NIM
II=(I-1)*NJ+IJGR(L)
DO IJ=II+2,II+NJM
bb = bb + QA(IJ) * QA(IJ)
END DO
END DO
DO I=2,NIM
II=(I-1)*NJ+IJGR(L)
!$acc do private(p, r0)
DO IJ=II+2,II+NJM
p(IJ) = r(IJ)
r0(IJ) = r(IJ)
END DO
END DO
!$acc end kernels
The compiler messages show nothing abnormal, although I don't understand why I am supposed to add do private in some places and not in others.
bicgstabmg:
2085, Generating present_or_copyin(ijgr(l))
Generating present_or_copyin(ana(:))
Generating present_or_copyin(asa(:))
Generating present_or_copyin(awa(:))
Generating present_or_copyin(aea(:))
Generating present_or_copyin(apa(:))
Generating present_or_copyin(fia(:))
Generating present_or_copyin(qa(:))
Generating copyin(r(:))
Generating copyout(r(:ninj))
Generating present_or_copyout(r0(:ninj))
Generating present_or_copyout(p(:ninj))
Generating present_or_copyout(yy(:ninj))
Generating present_or_copyout(e(:ninj))
Generating present_or_copyout(v(:ninj))
Generating compute capability 2.0 binary
2086, Loop is parallelizable
Accelerator kernel generated
2086, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
CC 2.0 : 12 registers; 0 shared, 108 constant, 0 local memory bytes
2100, Loop is parallelizable
Accelerator kernel generated
2100, !$acc loop gang ! blockidx%x
CC 2.0 : 24 registers; 16 shared, 144 constant, 0 local memory bytes
2103, !$acc loop vector(128) ! threadidx%x
Loop is parallelizable
2114, Loop is parallelizable
Accelerator kernel generated
2114, !$acc loop gang ! blockidx%x
CC 2.0 : 16 registers; 16 shared, 92 constant, 0 local memory bytes
2116, !$acc loop vector(128) ! threadidx%x
2117, Sum reduction generated for c1
2116, Loop is parallelizable
2125, Loop is parallelizable
Accelerator kernel generated
2125, !$acc loop gang ! blockidx%x
CC 2.0 : 16 registers; 16 shared, 92 constant, 0 local memory bytes
2127, !$acc loop vector(128) ! threadidx%x
2128, Sum reduction generated for bb
2127, Loop is parallelizable
2132, Loop is parallelizable
Accelerator kernel generated
2132, !$acc loop gang ! blockidx%x
CC 2.0 : 18 registers; 16 shared, 108 constant, 0 local memory bytes
2135, !$acc loop vector(128) ! threadidx%x
Loop is parallelizable
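To make the question concrete, here is a stripped-down, standalone sketch of the residual and norm loops. The sizes, the QA values, the relations NIM = NI-1 and NJM = NJ-1, and a zero IJGR(L) offset are all made up for illustration. The residual expression is replaced by a stand-in, and I wrote it in free-form Fortran so that it compiles on its own. I left out the private clauses here: as far as I understand, a private array gives every thread its own scratch copy, so values written to it would not be visible to the following loop.

```fortran
program residual_norm_sketch
  implicit none
  ! Hypothetical sizes; the real NI, NJ come from the solver grid.
  integer, parameter :: ni = 6, nj = 5
  integer, parameter :: nim = ni - 1, njm = nj - 1   ! assumed relation
  integer :: i, ij, ii
  double precision :: r(ni*nj), qa(ni*nj)
  double precision :: c1

  qa = 2.0d0      ! made-up right-hand side
  r  = 0.0d0
  c1 = 0.0d0      ! accumulator is zeroed before the sum

  !$acc kernels
  ! Residual loop: every iteration writes a distinct r(ij), so r stays shared.
  do i = 2, nim
     ii = (i-1)*nj           ! IJGR(L) offset assumed to be 0 here
     do ij = ii+2, ii+njm
        r(ij) = qa(ij)       ! stand-in for the full residual expression
     end do
  end do

  ! Norm loop: c1 is a scalar sum over the same interior points.
  do i = 2, nim
     ii = (i-1)*nj
     do ij = ii+2, ii+njm
        c1 = c1 + r(ij)*r(ij)
     end do
  end do
  !$acc end kernels

  print *, c1
end program residual_norm_sketch
```

Compiled without accelerator support, the directives are treated as comments and the sum comes out as 48 (12 interior points, each contributing 4), which gives me a reference value to compare the accelerated run against.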
I am ashamed to admit that after a lot of practice I am still an amateur.