OpenACC directive "acc parallel"

Hello,
I’ve installed the new compiler version, 12.5.
I have a standard Jacobi iteration program that uses OpenACC directives.
The program hangs on every run. If I change the “acc parallel” directive to “acc region”, everything works.
Here is the Jacobi source code using the “acc parallel” directive:

! Jacobi 9-point stencil operation, simplest case
!
subroutine jacobi( a, newa, n, m, w0, w1, w2, tolerance, change, iters )
 real :: w0, w1, w2, tolerance
 integer :: n, m
 real, dimension(:,:) :: a, newa
 real, intent(out) :: change
 integer, intent(out) :: iters

 integer :: i,j

 change = tolerance + 1 ! get into the while loop

 iters = 0
 do while ( change > tolerance )
  iters = iters + 1
  change = 0
!$acc parallel
    do j = 2, n-1
      do i = 2, m-1
        newa(i,j) = w0 * a(i,j) + &
        w1 * (a(i-1,j) + a(i,j-1) + a(i+1,j) + a(i,j+1) ) + &
        w2 * (a(i-1,j-1) + a(i-1,j+1) + a(i+1,j-1) + a(i+1,j+1) )
        change = max( change, abs( newa(i,j) - a(i,j) ) )
      enddo
    enddo
    a(2:m-1,2:n-1) = newa(2:m-1,2:n-1)
!$acc end parallel
 enddo
end subroutine



pgfortran -acc -Minfo=all  -c J1.f90 -Minfo=accel
NOTE: your trial license will expire in 12 days, 6.32 hours.
NOTE: your trial license will expire in 12 days, 6.32 hours.
jacobi:
     18, Accelerator kernel generated
         19, CC 1.0 : 17 registers; 112 shared, 28 constant, 0 local memory bytes
             CC 2.0 : 22 registers; 0 shared, 132 constant, 0 local memory bytes
         20, !$acc loop vector(256) ! threadidx%x
         24, Max reduction generated for change
         27, !$acc loop vector(256) ! threadidx%x
     18, Generating copyout(newa(2:m-1,2:n-1))
         Generating copyin(a(:m,:n))
         Generating copyout(a(2:m-1,2:n-1))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     19, Loop is parallelizable
     20, Loop is parallelizable
     27, Loop is parallelizable
pgfortran -o J1.exe -acc -Minfo=all  Jmain.o J1.o

This version hangs every time I try to run it.

Here is the same source code using the “acc region” directive:

! Jacobi 9-point stencil operation, simplest case
!
subroutine jacobi( a, newa, n, m, w0, w1, w2, tolerance, change, iters )
 real :: w0, w1, w2, tolerance
 integer :: n, m
 real, dimension(:,:) :: a, newa
 real, intent(out) :: change
 integer, intent(out) :: iters

 integer :: i,j

 change = tolerance + 1 ! get into the while loop

 iters = 0
 do while ( change > tolerance )
  iters = iters + 1
  change = 0
!$acc region
    do j = 2, n-1
      do i = 2, m-1
        newa(i,j) = w0 * a(i,j) + &
        w1 * (a(i-1,j) + a(i,j-1) + a(i+1,j) + a(i,j+1) ) + &
        w2 * (a(i-1,j-1) + a(i-1,j+1) + a(i+1,j-1) + a(i+1,j+1) )
        change = max( change, abs( newa(i,j) - a(i,j) ) )
      enddo
    enddo
    a(2:m-1,2:n-1) = newa(2:m-1,2:n-1)
!$acc end region
 enddo
end subroutine



pgfortran -acc -Minfo=all  -c J1.f90 -Minfo=accel
NOTE: your trial license will expire in 12 days, 6.27 hours.
NOTE: your trial license will expire in 12 days, 6.27 hours.
jacobi:
     18, Generating copyout(newa(2:m-1,2:n-1))
         Generating copyin(a(:m,:n))
         Generating copyout(a(2:m-1,2:n-1))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     19, Loop is parallelizable
     20, Loop is parallelizable
         Accelerator kernel generated
         19, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
         20, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
             CC 1.0 : 16 registers; 112 shared, 36 constant, 0 local memory bytes
             CC 2.0 : 22 registers; 16 shared, 120 constant, 0 local memory bytes
         24, Max reduction generated for change
     27, Loop is parallelizable
         Accelerator kernel generated
         27, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
             !$acc do parallel, vector(16) ! blockidx%x threadidx%x
             CC 1.0 : 8 registers; 80 shared, 12 constant, 0 local memory bytes
             CC 2.0 : 10 registers; 16 shared, 96 constant, 0 local memory bytes
pgfortran -o J1.exe -acc -Minfo=all  Jmain.o J1.o

This version runs without problems.

PGI compiler version 12.3 compiles the “acc parallel” directive correctly.

Question: What’s wrong with the new version 12.5?

I forgot to post the main program for the Jacobi method:

! main routine to call any of the accelerator model Jacobi routines
!
program main
 interface
  subroutine jacobi( a, newa, n, m, w0, w1, w2, tolerance, change, iters )
   real :: w0, w1, w2, tolerance
   integer :: n, m
   real, dimension(:,:) :: a, newa
   real, intent(out) :: change
   integer, intent(out) :: iters
  end subroutine
 end interface

 integer nargs
 integer n, m
 character*10 arg
 real, allocatable :: a(:,:), newa(:,:)
 real :: delta
 integer :: iters

 integer :: dt1(8), dt2(8), t1, t2
 real :: rt

 n = 400
 nargs = iargc()
 if( nargs == 0 )then
   print *, 'jacobi size1 [size2, defaults to size1]'
   stop
 endif
 if( nargs >= 1 )then
  call getarg( 1, arg )
  read(arg,*) n
 endif

 m = n
 if( nargs >= 2 )then
  call getarg( 2, arg )
  read(arg,*) m
 endif

 allocate( a(m,n) )
 allocate( newa(m,n) )

 do j = 1,n
  do i = 1,m
   a(i,j) = 0
   newa(i,j) = 0
  enddo
 enddo

 do i = 1, m
  a(i,n) = i
 enddo
 do j = 1, n
  a(m,j) = j
 enddo
 a(m,n) = m+n
 
 call date_and_time( values=dt1 )
 call jacobi( a, newa, n, m, .2, .1, .1, .1, delta, iters )
 call date_and_time( values=dt2 )
 ! milliseconds since midnight: ms + 1000*(s + 60*(min + 60*h))
 t1 = dt1(8) + 1000*(dt1(7)+60*(dt1(6)+60*dt1(5)))
 t2 = dt2(8) + 1000*(dt2(7)+60*(dt2(6)+60*dt2(5)))
 write(*,10) delta, iters, n, m
10 format( 'reached delta=', f15.6, ' in ', i0, ' iterations for ', i4, ' x ', i4, ' array' )
 rt = (t2 - t1)
 rt = rt / 1000.
 write(*,20) rt
20 format( 'time=', f15.6, ' seconds' )

end program

The 12.5 release is still considered an early-access/beta release with regard to OpenACC functionality. The upcoming 12.6 release, which will be fully OpenACC 1.0 compliant, compiles and runs your example program without errors.
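
For anyone comparing the two -Minfo listings above: the “acc parallel” build reports a single accelerator kernel covering both the stencil loops and the array copy, while the “acc region” build reports two separate kernels. The sketch below is not a confirmed fix for the 12.5 hang. It only shows one way the same sweep might be written so that each loop nest gets its own compute construct and the max reduction is stated explicitly. The program name jacobi_sketch, the fixed 64 x 64 size, the 0.1 tolerance, the enclosing data region, and the collapse(2) clauses are all assumptions made for illustration. Compiled without OpenACC enabled, the directives are plain comments and the program runs serially.

```fortran
! Sketch only: Jacobi sweeps with one OpenACC compute construct per loop nest.
! Sizes, tolerance, and directive clauses are illustrative assumptions.
program jacobi_sketch
  implicit none
  integer, parameter :: n = 64, m = 64
  real, parameter :: w0 = 0.2, w1 = 0.1, w2 = 0.1, tolerance = 0.1
  real :: a(m,n), newa(m,n)
  real :: change
  integer :: i, j, iters

  ! Same boundary setup as the main program in the post
  a = 0.0
  newa = 0.0
  do i = 1, m
    a(i,n) = i
  enddo
  do j = 1, n
    a(m,j) = j
  enddo
  a(m,n) = m + n

  change = tolerance + 1.0   ! get into the while loop
  iters = 0

  ! Keep both arrays on the device for the whole iteration (assumed choice)
  !$acc data copy(a) create(newa)
  do while ( change > tolerance )
    iters = iters + 1
    change = 0.0

    ! First construct: stencil update with an explicit max reduction
    !$acc parallel loop collapse(2) reduction(max:change)
    do j = 2, n-1
      do i = 2, m-1
        newa(i,j) = w0 * a(i,j) + &
          w1 * (a(i-1,j) + a(i,j-1) + a(i+1,j) + a(i,j+1)) + &
          w2 * (a(i-1,j-1) + a(i-1,j+1) + a(i+1,j-1) + a(i+1,j+1))
        change = max( change, abs( newa(i,j) - a(i,j) ) )
      enddo
    enddo

    ! Second construct: copy the interior back only after the first finishes
    !$acc parallel loop collapse(2)
    do j = 2, n-1
      do i = 2, m-1
        a(i,j) = newa(i,j)
      enddo
    enddo
  enddo
  !$acc end data

  print '(a,i0,a,f8.5)', 'converged after ', iters, ' sweeps, change = ', change
end program jacobi_sketch
```

Splitting the work this way means the copy back into a cannot start while other gangs are still reading a for the stencil, which a single parallel construct does not guarantee on its own.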