Hi, I would like some help with a problem compiling an OpenACC program. A simplified version of the program is shown below:
module math
contains
SUBROUTINE pnm (A,N,IWK,WK)
!$acc routine seq
IMPLICIT NONE
INTEGER*4 N,IWK(6*N+150)
DOUBLE PRECISION WK(6*N+150)
COMPLEX*16 A(N)
INTEGER*4 K,ILL,JJ,K0,KB,JK,I,J,ITA,ITB
15 K = -IWK(ILL+K)
JJ = K0
K0 = JK * K + KB
I = 0
IF (K .NE. J) GO TO 15
55 A(K0+I+1) = DCMPLX(WK(ITA+I),WK(ITB+I))
I = I + 1
IF (I .LT. JK) GO TO 55
END
end module math
program main
use math
implicit none
integer(kind=4)::nmax,n_north,n_east,i,j,k,m,num_grid,prd
integer,allocatable:: iwk(:)
real(kind=8),allocatable::grav(:,:),wk(:)
real(kind=8):: error
character(len=40) filedgcombine
complex*16,allocatable:: pnmdata_cpx(:)
write(*,*) "please input n_east,n_north,num_grid and nmax"
read(*,*) n_east,n_north,num_grid,nmax
allocate(grav(num_grid,4))
allocate(pnmdata_cpx(n_east))
allocate(iwk(6*n_east+150))
allocate(wk(6*n_east+150))
write(*,*) "please input the gravity anomaly file name and error "
read(*,*) filedgcombine,error
open(unit=10,file=filedgcombine)
do i=1,num_grid
read(10,*) grav(i,1),grav(i,2),grav(i,3),grav(i,4)
end do
close(10)
!$acc kernels
!$acc loop independent private(pnmdata_cpx)
do i=1, n_north
pnmdata_cpx=dcmplx(0.D0,0.D0)
!$acc loop independent
do j=1, n_east
pnmdata_cpx(j)=dcmplx(grav((i-1)*n_east+j,4))
end do
call pnm(pnmdata_cpx,n_east,iwk,wk)
end do ! loop i
!$acc end kernels
end ! the main program
If I compile it with the following command:
mpif90 -acc -gpu=cc70 -gpu=cuda11.0 -Minfo=accel example.f90 -o example
The compiler output is shown below:
pnm:
0, Accelerator region ignored
16, Accelerator restriction: invalid loop
0 inform, 0 warnings, 1 severes, 0 fatal for pnm
main:
51, Generating implicit copy(wk(:),n_east) [if not already present]
Generating implicit copyin(grav(:,4)) [if not already present]
Generating implicit copy(iwk(:)) [if not already present]
53, Loop is parallelizable
Generating Tesla code
53, !$acc loop gang ! blockidx%x
54, !$acc loop vector(128) ! threadidx%x
56, !$acc loop vector(128) ! threadidx%x
54, Loop is parallelizable
56, Loop is parallelizable
The compiler reports that loop 15 in the subroutine is an invalid loop. The subroutine contains two loops; the other is loop 55. However, if I delete or comment out loop 55, which of course changes the program, it compiles successfully.
I cannot find any problem in loop 15. Moreover, the program compiles successfully as a CPU-only program with “gfortran”.
Could someone tell me why the loop cannot be compiled for the GPU?
Many thanks!