Hello,
I am unable to solve the following problem: allocating and using Fortran allocatable arrays only on the GPU (that is, with no prior allocation on the host). The example below mimics a more complicated application.
      program vector
!
      implicit none
      integer :: n, i
      real, allocatable, dimension (:) :: u,p
!
      read (5,*) n
      allocate (u(n) ) ! , p(n) )
      do i = 1, n
         u(i) = -1.0
      end do
!
!$ACC ENTER DATA COPYIN(u(1:n) ) CREATE(p(1:n) )
!
!$ACC KERNELS LOOP PRESENT (p)
      do i = 1, n
         p(i) = 10.0
      end do
!$ACC END KERNELS LOOP
!
!$ACC KERNELS LOOP PRESENT (u,p)
      do i = 1, n
         u(i) = u(i) + p(i)
      end do
!$ACC END KERNELS LOOP
!
!$ACC EXIT DATA COPYOUT(u) DELETE(p)
!
      write (6,*) (u(i),i=1,n)
!
      stop
      end program vector
I am using PGI Fortran version 19.1. The compile command and its output are:
pgf90 -acc -O2 -g -Minfo forum.f -o vector.out
vector:
      9, Memory set idiom, loop replaced by call to __c_mset4
     13, Generating enter data create(p(1:n))
         Generating enter data copyin(u(1:n))
     15, Generating present(p(:))
     16, Loop is parallelizable
         Generating Tesla code
         16, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     16, Memory set idiom, loop replaced by call to __c_mset4
     21, Generating present(u(:),p(:))
     22, Loop is parallelizable
         Generating Tesla code
         22, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     22, Generated vector simd code for the loop
     27, Generating exit data copyout(u(:))
         Generating exit data delete(p(:))
At runtime the program fails with:
echo 10 | ./vector.out
Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
Failing in Thread:1
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
If I also allocate array p on the host (line 8 of the source), everything works fine.
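For reference, here is a sketch of that working variant. The only intended difference from the listing above is that p is allocated on the host together with u. The added comments and the final deallocate are mine, for illustration, and everything else is assumed unchanged.

```fortran
      program vector
      implicit none
      integer :: n, i
      real, allocatable, dimension (:) :: u, p
!     Read the vector length from standard input
      read (5,*) n
!     Only change: p is now allocated on the host as well
      allocate (u(n), p(n))
      do i = 1, n
         u(i) = -1.0
      end do
!     Copy u to the device; create p there without copying it
!$ACC ENTER DATA COPYIN(u(1:n)) CREATE(p(1:n))
!$ACC KERNELS LOOP PRESENT(p)
      do i = 1, n
         p(i) = 10.0
      end do
!$ACC END KERNELS LOOP
!$ACC KERNELS LOOP PRESENT(u,p)
      do i = 1, n
         u(i) = u(i) + p(i)
      end do
!$ACC END KERNELS LOOP
!     Copy u back to the host and drop the device copy of p
!$ACC EXIT DATA COPYOUT(u) DELETE(p)
      write (6,*) (u(i),i=1,n)
      deallocate (u, p)
      end program vector
```

Built with the same pgf90 command and run with `echo 10 | ./vector.out`, every element of u should come out as 9.0 (that is, -1.0 + 10.0).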
Could someone explain what is wrong with this example?
Regards,
Guy.