Hi, the following code is a minimal example of the problem I am facing. I want to get two reduction variables, nmax and ierr, out of a GPU-parallelized region. One of them, ierr, is an error flag for the kernel region: it is set to 1 if i.gt.ndim, which should never happen in this case, and stays 0 otherwise.
program test
  implicit none
  integer,parameter:: ndim = 10
  integer:: i,ierr,nmax
  real(8):: arr(ndim)
  nmax = 0
  ierr = 0
  !$acc data copy(nmax,ierr)
  !$acc kernels
  !$acc loop reduction(max:nmax,ierr)
  do i=1,ndim
    if( i.gt.ndim ) ierr = 1
    print *,'i,ierr,nmax=',i,ierr,nmax
    if( ierr.gt.0 ) cycle
    nmax = max(nmax,i)
  enddo
  !$acc end kernels
  !$acc end data
  print *,'nmax=',nmax
end program test
I expected nmax to be 10 at the end, but it comes back as 0. The ierr and nmax values printed inside the loop also look wrong:
i,ierr,nmax= 1 -2147483648 -2147483648
i,ierr,nmax= 2 -2147483648 -2147483648
i,ierr,nmax= 3 -2147483648 -2147483648
i,ierr,nmax= 4 -2147483648 -2147483648
i,ierr,nmax= 5 -2147483648 -2147483648
i,ierr,nmax= 6 -2147483648 -2147483648
i,ierr,nmax= 7 -2147483648 -2147483648
i,ierr,nmax= 8 -2147483648 -2147483648
i,ierr,nmax= 9 -2147483648 -2147483648
i,ierr,nmax= 10 -2147483648 -2147483648
nmax= 0
What is wrong with the above code?
The compilation message:
$ nvfortran -acc -Minfo=accel test.F90
test:
     10, Generating copy(ierr,nmax) [if not already present]
     13, Loop is parallelizable
         Generating Tesla code
         13, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
             Generating reduction(max:nmax,ierr)
My environment:
- CentOS 7
- Quadro RTX 5000
- nvfortran 20.7-0 LLVM 64-bit target on x86-64 Linux -tp skylake
- Driver Version: 450.57 CUDA Version: 11.0
Thanks in advance.