I am using pgfortran from the PGI 19.10 suite.
I have a very simple nested loop, which just counts the total number of iterations.
program main
  implicit none
  integer i,j,N,ans
  N = 1000
  ans = 0
  !$acc data copy(ans) copyin(N) create(i,j)
  !$acc parallel loop private(i,j) reduction(+:ans)
  do i=1,N
    do j=1,N
      ans=ans+1
    enddo
  enddo
  !$acc end parallel loop
  !$acc end data
  write(*,*) 'ans= ',ans
end program main
Now, each time I run the program, I get a different answer!
I see the same problem if I replace create(i,j) with copy(i,j).
But if I
- remove create(i,j) from the data construct, or
- remove private(i,j) from the compute construct, or
- remove both of the above,
I get the correct answer.
I am not sure why the complete specification (which I believe is correct, and as intended) gives the wrong answer.
Also, the compiler messages (-Minfo=all) are the SAME in both cases.
Please help.
Thanks,
arun
Hi arun,
The problem is that by putting “i” and “j” in a “create” or “copy” clause, you’ve overridden the default behavior and made these scalars shared, so every thread reads and writes the same copy. Hence you’ll get a race condition.
Note that loop index variables are unique in that they are implicitly private, hence the “private(i,j)” is ignored.
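As a minimal sketch of the fix, assuming the rest of the program stays as posted: leave “i” and “j” out of the data clause so the compiler can privatize the loop indices on its own. The explicit private(i,j) is dropped as well, since it is redundant for loop indices.

```fortran
program main
  implicit none
  integer :: i, j, N, ans

  N = 1000
  ans = 0

  ! Only ans and N appear in data clauses.
  ! The loop indices i and j are left out, so they stay implicitly private.
  !$acc data copy(ans) copyin(N)
  !$acc parallel loop reduction(+:ans)
  do i = 1, N
    do j = 1, N
      ans = ans + 1
    enddo
  enddo
  !$acc end parallel loop
  !$acc end data

  write(*,*) 'ans= ', ans
end program main
```

With N = 1000 this should report N*N = 1000000 on every run.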
You wrote: "Also the compiler messages (-Minfo=all) is SAME in both cases."
Are you sure? They seem different when I compile the code:
% diff arun.F90 arunNoCreate.F90
8c8
< !$acc data copy(ans) copyin(N) create(i,j)
---
> !$acc data copy(ans) copyin(N)
% pgfortran -Minfo=accel -acc arun.F90 -o arun.out
main:
8, Generating create(j) [if not already present]
Generating copyin(n) [if not already present]
Generating copy(ans) [if not already present]
Generating create(i) [if not already present]
10, Generating Tesla code
10, Generating reduction(+:ans)
11, !$acc loop gang ! blockidx%x
12, !$acc loop vector(128) ! threadidx%x
12, Loop is parallelizable
% pgfortran -Minfo=accel -acc arunNoCreate.F90 -o arun_nc.out
main:
8, Generating copy(ans) [if not already present]
Generating copyin(n) [if not already present]
10, Generating Tesla code
10, Generating reduction(+:ans)
11, !$acc loop gang ! blockidx%x
12, !$acc loop vector(128) ! threadidx%x
12, Loop is parallelizable
-Mat
I was wondering whether the copy+private combination would behave like firstprivate.
It looks like it does not.
OK, fine.
By SAME compiler messages, I meant the parallelization part (not the copy/create part), starting from: 10, Generating Tesla code
arun
Sorry, not sure why you got that impression. Variables should only be in either a copy clause or a private clause, not both.
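If the goal is for a scalar to start from its host value but still be private, one option is to name it in a firstprivate clause on the parallel construct rather than combining copy and private. A small sketch of that idea follows; the program name and the variables “offset” and “total” are made up for illustration and are not from the code above.

```fortran
program firstprivate_demo
  implicit none
  integer :: i, n, offset, total

  n = 1000
  offset = 2      ! hypothetical scalar that must carry its host value into the region
  total = 0

  ! offset appears only in firstprivate (not in any copy/create clause),
  ! so each gang gets its own copy initialized from the host value.
  !$acc parallel loop firstprivate(offset) reduction(+:total)
  do i = 1, n
    total = total + offset
  enddo
  !$acc end parallel loop

  write(*,*) 'total= ', total
end program firstprivate_demo
```

Here total should come out as n*offset = 2000. Note that scalars referenced in a parallel region are typically treated as firstprivate by default, so the explicit clause mainly documents the intent.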