Wrong results from the OpenACC private clause on a Fortran array

I have found a confusing example involving the private clause applied to a Fortran array. Here is the code:

subroutine cal(a, N, M)
    implicit none
    integer :: N, M
    real(8) :: a(N, M)
    real(8) :: tmp(N)
    integer :: j, i

!$acc data create(tmp), copy(a)
!$acc parallel loop private(tmp)
    do j = 1, M
!$acc loop private(tmp)
        do i = 1, N
            tmp(i) = 3
        end do
!$acc loop private(tmp)
        do i = 1, N
            tmp(i) = 4
        end do
        a(:, j) = tmp
    end do
!$acc end data
end subroutine

“tmp” is a temporary array, and I want it to be private to each iteration of the j loop. The example is simplified to show the problem more clearly; in the real code the second loop “tmp(i) = 4” may contain a more complex calculation, e.g. “tmp(i) = tmp(i)+b(i, j)”. The results show that some elements of “a” are 4 but others are 3, which is incorrect. However, if “tmp” is declared as a fixed-size array, then all elements of “a” are 4 and the results are correct.

integer, parameter :: N1 = 30
real(8) :: tmp(N1)

By adding “-Minfo=accel” to the compiler flags, I can see that for the fixed-size “tmp” the compiler reports:

Local memory used for tmp
CUDA shared memory used for tmp

It seems that in the variable-size case, “tmp” is not placed in shared memory? How can I ensure the results are right if “tmp” cannot be a fixed-size array? The test code is attached. Thank you in advance.
test1.f90 (625 Bytes)

Correct. Given that shared memory is fixed size, the compiler can only use it if the size of the private array is known at compile time and if it fits. Otherwise, it could lead to run time errors.

Note that in your program, “tmp” appears in both a data directive and a private clause. A variable can’t be in both, since it can’t be both global (shared) and private. You should remove it from the data directive.

Also, you shouldn’t put the private clause on the inner loops. This says to make tmp private to those loops, but here, you want it shared within the inner loops and only private to the outer loop.
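For reference, here is a minimal sketch of the subroutine with both changes applied: “tmp” is removed from the data directive, and the private clause appears only on the outer loop. The module name “cal_fixed_mod” and the intent attributes are additions for illustration and are not part of the original test code.

```fortran
module cal_fixed_mod
    implicit none
contains
    subroutine cal(a, N, M)
        integer, intent(in) :: N, M
        real(8), intent(inout) :: a(N, M)
        real(8) :: tmp(N)          ! size known only at run time
        integer :: j, i

! tmp is no longer listed in the data directive; only a is copied.
!$acc data copy(a)
! private(tmp) on the outer loop only: one copy of tmp per j iteration.
!$acc parallel loop private(tmp)
        do j = 1, M
! No private clause here, so the inner loops share the outer loop's tmp.
!$acc loop
            do i = 1, N
                tmp(i) = 3
            end do
!$acc loop
            do i = 1, N
                tmp(i) = 4
            end do
            a(:, j) = tmp
        end do
!$acc end data
    end subroutine cal
end module cal_fixed_mod
```

With this layout every element of “a” should come out as 4 whether or not the size of “tmp” is known at compile time, since the two inner loops now write to the same private copy.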

Thanks. I have got the right results following your advice.