I learned that the PGI compiler supports manual deep copy of derived types, so I wrote a simple test program to verify this. It compiles cleanly but fails at run time. Where am I making a mistake?
program Test
  implicit none
  type dt
    integer :: n
    real, dimension(:), allocatable :: xm
  end type dt
  type(dt) :: grid
  integer :: i
  grid%n = 10
  allocate(grid%xm(grid%n))
  !$acc enter data copyin(grid)
  !$acc enter data pcreate(grid%xm)
  !$acc kernels
  do i = 1, grid%n
    grid%xm(i) = i * i
  enddo
  !$acc end kernels
  print*,grid%xm
end program Test
The errors are:
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
The compiler is pgf90 16.10.
Update: Commenting out the line
!!$acc enter data pcreate(grid%xm)
gives the correct result. Does this mean that PGI already supports deep copy in Fortran, so I don’t have to create pointers and allocatables on the GPU directly?
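I need to look at this a little more. It fails for me even when I comment out the line mentioned in your update. It does work with -ta=tesla:managed, which places allocatable data in CUDA managed memory, so I think the kernel itself is generated correctly and the problem is in the data handling.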
It works for me with a structured data region. Note that you need an update directive at the end to get the data out.
!$acc data copyin(grid, grid%xm)
!$acc kernels
do i = 1, 10
  grid%xm(i) = i * i
enddo
!$acc end kernels
!$acc update host(grid%xm(1:10))
!$acc end data
The trick here is that the derived type holds a pointer to the allocated area. During copyin, that pointer in the device copy is filled with a device address as part of the “attach” process. You don’t want that device address copied back into the host derived type after the kernel is done, and I don’t believe we handle that correctly if you use “copy” rather than “copyin”. Hence you need an update host directive to pull out just the array data.
I’m still trying to understand why this doesn’t work with enter data. It looks like, in that case, we try to pass the derived type scalar to the kernel as an input argument, and the code generation ends up using a bad address.
One workaround is to force the kernel to see that grid is present:
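For reference, here is the structured data region variant as a complete program. This is only a sketch: the program name `test_structured` and the local variable `n` are my own additions for illustration, and the self-check at the end is just a convenience. Without OpenACC enabled, the directives are treated as comments and the loop runs on the host.

```fortran
program test_structured
  implicit none
  type dt
    integer :: n
    real, dimension(:), allocatable :: xm
  end type dt
  type(dt) :: grid
  integer :: i, n   ! n is a local copy of grid%n (an assumption, for simple loop bounds)

  grid%n = 10
  n = grid%n
  allocate(grid%xm(n))

  ! Parent first, then the member, so the member can be attached
  ! to the device copy of grid during copyin.
  !$acc data copyin(grid, grid%xm)
  !$acc kernels
  do i = 1, n
    grid%xm(i) = real(i * i)
  enddo
  !$acc end kernels
  ! Copy back only the array contents, not the derived type itself,
  ! so the host copy of grid never receives a device address.
  !$acc update host(grid%xm(1:n))
  !$acc end data

  ! Simple host-side check of the results.
  do i = 1, n
    if (grid%xm(i) /= real(i * i)) stop 1
  enddo
  print *, grid%xm
  deallocate(grid%xm)
end program test_structured
```

The key point is that copyin is used for both grid and grid%xm, and the results come back through the update directive rather than through a copy clause on the data region.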
!$acc enter data copyin(grid, grid%xm)
!$acc kernels present(grid)
do i = 1, 10
  grid%xm(i) = i * i
enddo
!$acc end kernels
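Putting the workaround into a complete program might look like the sketch below. The snippet above stops at the kernel, so the update host and exit data lines here are my assumption about how you would get the results back and release the device copies with unstructured data regions. The program name `test_unstructured` and the local variable `n` are also just illustrative.

```fortran
program test_unstructured
  implicit none
  type dt
    integer :: n
    real, dimension(:), allocatable :: xm
  end type dt
  type(dt) :: grid
  integer :: i, n   ! n is a local copy of grid%n (an assumption, for simple loop bounds)

  grid%n = 10
  n = grid%n
  allocate(grid%xm(n))

  ! Create the device copy of the parent, then the member.
  !$acc enter data copyin(grid, grid%xm)

  ! present(grid) tells the compiler to use the existing device copy
  ! instead of passing grid to the kernel as an input argument.
  !$acc kernels present(grid)
  do i = 1, n
    grid%xm(i) = real(i * i)
  enddo
  !$acc end kernels

  ! Assumed cleanup: pull back only the array data, then delete
  ! the member before the parent.
  !$acc update host(grid%xm(1:n))
  !$acc exit data delete(grid%xm, grid)

  ! Simple host-side check of the results.
  do i = 1, n
    if (grid%xm(i) /= real(i * i)) stop 1
  enddo
  print *, grid%xm
  deallocate(grid%xm)
end program test_unstructured
```

As with the structured version, the array is brought back with update host rather than copyout, so the pointer inside the host copy of grid is left alone.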