I have run into confusing behavior when making a Fortran array private in OpenACC. Here is the code:
subroutine cal(a, N, M)
    implicit none
    integer :: N, M
    real(8) :: a(N, M)
    real(8) :: tmp(N)
    integer :: j, i
    !$acc data create(tmp), copy(a)
    !$acc parallel loop private(tmp)
    do j = 1, M
        !$acc loop private(tmp)
        do i = 1, N
            tmp(i) = 3
        end do
        !$acc loop private(tmp)
        do i = 1, N
            tmp(i) = 4
        end do
        a(:, j) = tmp
    end do
    !$acc end data
end subroutine
“tmp” is a temporary array, and I want it to be private to each iteration of the j loop. The example is simplified to show the problem more clearly; in the real code the second loop, “tmp(i) = 4”, may contain more complex calculations, e.g. “tmp(i) = tmp(i)+b(i, j)”. The results show that some elements of “a” are 4 but others are 3, which is incorrect. However, if “tmp” is declared as a fixed-size array, as follows, then all elements of “a” are 4 and the results are correct.
integer, parameter :: N1 = 30
real(8) :: tmp(N1)
By adding “-Minfo=accel” to the compiler options, I can see that for the fixed-size “tmp”, the following lines are shown:
Local memory used for tmp
CUDA shared memory used for tmp
It seems that in the variable-size case, “tmp” is not placed in local or shared memory? How can I ensure the results are correct if “tmp” cannot be a fixed-size array? The test code is attached. Thank you in advance.
test1.f90 (625 Bytes)
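One variant that may be worth trying, given here as an untested sketch rather than a confirmed fix: keep “private(tmp)” only on the outer j loop and drop it from the inner i loops, so that the inner loops write into the copy of “tmp” that belongs to the current j iteration instead of getting further private copies of their own. The sketch also leaves “tmp” out of the data clause and replaces “a(:, j) = tmp” with an explicit loop; both changes are assumptions on my side, as are the explicit “gang” and “vector” clauses, the module name “cal_mod” and the driver program “test_cal”, none of which appear in the attached test.

```fortran
! Sketch only: assumes the private(tmp) clauses on the inner loops
! are what causes the mix of 3s and 4s. Not verified on a GPU.
module cal_mod
    implicit none
contains
    subroutine cal(a, N, M)
        integer, intent(in) :: N, M
        real(8), intent(inout) :: a(N, M)
        real(8) :: tmp(N)   ! automatic array, size known only at run time
        integer :: j, i

        ! tmp is not listed in the data clause (assumption): the private
        ! clause below already requests one copy per j iteration.
        !$acc data copy(a)
        !$acc parallel loop gang private(tmp)
        do j = 1, M
            ! No private clause here, so this loop writes to the
            ! tmp that belongs to the current j iteration.
            !$acc loop vector
            do i = 1, N
                tmp(i) = 3
            end do
            !$acc loop vector
            do i = 1, N
                tmp(i) = 4
            end do
            ! Explicit loop instead of a(:, j) = tmp (my choice).
            !$acc loop vector
            do i = 1, N
                a(i, j) = tmp(i)
            end do
        end do
        !$acc end data
    end subroutine cal
end module cal_mod

! Hypothetical driver: fills a with zeros, calls cal, checks for 4s.
program test_cal
    use cal_mod
    implicit none
    integer, parameter :: N = 30, M = 8
    real(8) :: a(N, M)

    a = 0.0d0
    call cal(a, N, M)
    if (all(a == 4.0d0)) then
        print *, 'OK: every element of a is 4'
    else
        print *, 'MISMATCH: elements not equal to 4:', count(a /= 4.0d0)
    end if
end program test_cal
```

Built without OpenACC enabled, the directives are treated as comments and the program runs serially, which gives a reference result to compare against the accelerated build with “-Minfo=accel”.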