What is the difference between these two codes?

Hello,

Below are two versions of what I think should compile to the same program. The first version works fine and moves data with the enter/exit data directives. The second version segfaults and uses only a copy clause on the same line as the parallel directive.
I don’t understand why the two versions behave differently. Is it a bug in the compiler?

Version 1, works fine:

      program main
      type t
        integer, allocatable :: inside(:)
      end type t
  
      type(t) :: my_t
      integer :: i
      allocate(my_t%inside(100))
!$acc enter data copyin(my_t, my_t%inside)
!$acc parallel loop present(my_t, my_t%inside)
      do i=1,100
        my_t%inside(i) = i
      end do
!$acc exit data copyout(my_t%inside, my_t)

      write(*,*)my_t%inside
      end program main

Version 2, segfault:

      program main
      type t
        integer, allocatable :: inside(:)
      end type t
  
      type(t) :: my_t
      integer :: i
      allocate(my_t%inside(100))
!$acc parallel loop copy(my_t, my_t%inside)
      do i=1,100
        my_t%inside(i) = i
      end do

      write(*,*)my_t%inside
      end program main

Hi!

For this code, the real question is why the first one works, since it should segfault as well.

The problem is that you’re copying “my_t” back to the host. The device copy of “my_t” contains a device pointer to “inside”. When “my_t” is copied back to the host, the host address of “inside” is overwritten with the device address, which then causes the segfault when the write statement dereferences it.

I would suggest something like the following:

% cat test.f90
      program main
      type t
        integer, allocatable :: inside(:)
      end type t

      type(t) :: my_t
      integer :: i
      allocate(my_t%inside(100))
!$acc parallel loop copyin(my_t) copy(my_t%inside)
      do i=1,100
        my_t%inside(i) = i
      end do

      write(*,*)my_t%inside
      end program main
% cat test2.f90
      program main
      type t
        integer, allocatable :: inside(:)
      end type t

      type(t) :: my_t
      integer :: i
      allocate(my_t%inside(100))
!$acc enter data copyin(my_t, my_t%inside)
!$acc parallel loop present(my_t, my_t%inside)
      do i=1,100
        my_t%inside(i) = i
      end do
!$acc exit data copyout(my_t%inside) delete(my_t)

      write(*,*)my_t%inside
      end program main
% pgfortran -ta=tesla -Minfo=accel test.f90 ; a.out
main:
      9, Generating copy(my_t%inside(:)) [if not already present]
         Generating copyin(my_t) [if not already present]
         Generating Tesla code
         10, !$acc loop gang, vector(100) ! blockidx%x threadidx%x
            1            2            3            4            5            6
            7            8            9           10           11           12
           13           14           15           16           17           18
           19           20           21           22           23           24
           25           26           27           28           29           30
           31           32           33           34           35           36
           37           38           39           40           41           42
           43           44           45           46           47           48
           49           50           51           52           53           54
           55           56           57           58           59           60
           61           62           63           64           65           66
           67           68           69           70           71           72
           73           74           75           76           77           78
           79           80           81           82           83           84
           85           86           87           88           89           90
           91           92           93           94           95           96
           97           98           99          100
% pgfortran -ta=tesla -Minfo=accel test2.f90 ; a.out
main:
      9, Generating enter data copyin(my_t%inside(:),my_t)
     10, Generating present(my_t,my_t%inside(:))
         Generating Tesla code
         11, !$acc loop gang, vector(100) ! blockidx%x threadidx%x
     14, Generating exit data copyout(my_t%inside(:))
         Generating exit data delete(my_t)
            1            2            3            4            5            6
            7            8            9           10           11           12
           13           14           15           16           17           18
           19           20           21           22           23           24
           25           26           27           28           29           30
           31           32           33           34           35           36
           37           38           39           40           41           42
           43           44           45           46           47           48
           49           50           51           52           53           54
           55           56           57           58           59           60
           61           62           63           64           65           66
           67           68           69           70           71           72
           73           74           75           76           77           78
           79           80           81           82           83           84
           85           86           87           88           89           90
           91           92           93           94           95           96
           97           98           99          100
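
A third variant, offered only as a sketch that was not compiled or run for this thread, is to leave “my_t” alone on the device and refresh the host copy of the array with an update directive. It assumes the compiler accepts a derived-type member in “update self”; the program name “update_variant” is made up for illustration.

```fortran
program update_variant
  implicit none
  type t
    integer, allocatable :: inside(:)
  end type t

  type(t) :: my_t
  integer :: i

  allocate(my_t%inside(100))

  ! Parent first, then the member, so the device copy of my_t
  ! gets attached to the device copy of inside.
  !$acc enter data copyin(my_t, my_t%inside)

  !$acc parallel loop present(my_t, my_t%inside)
  do i = 1, 100
    my_t%inside(i) = i
  end do

  ! Assumption: update self on a member transfers the array data only.
  !$acc update self(my_t%inside)

  ! Release device memory without copying anything back.
  !$acc exit data delete(my_t%inside, my_t)

  write(*,*) my_t%inside
end program update_variant
```

Since only the array data should cross back to the host here, the host side of “my_t” is not touched by any transfer.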

Hope this helps,
Mat

Hi,

Thanks for your answer and the code snippets.

I wrote the first example by adapting the code found on slide 27 of https://on-demand.gputechconf.com/gtc/2018/presentation/s8750-porting-vasp-to-gpus-with-openacc.pdf
The difference between my example and the one on the slide is that their derived type contained more than just an allocatable array, which I guess is why they had to add the copyout of the type instance. Still, they seem to be saying that the pointer to the allocatable array is not modified by the copyout. I guess I’m not understanding that example correctly.

Ah, yes, my fault. That explains why the first example works. Data management, especially deep copy, can be confusing at times, even for me.

At the risk of confusing things further: VASP is one of the main motivating codes for “full” deep copy, as opposed to the “manual” deep copy that your example #1 is doing. With “full”, you only need to put the type variable in a copy clause and the entire structure is copied, which makes programming much simpler. The caveats are that for large, deeply nested structures it may add overhead that slows the code down, and that you may not be using the entire structure on the device. For more fine-grained control, you’d want to go back to “manual”.

For example, in your second example we can remove “my_t%inside” from the copy clause, provided we add the compiler flag “-ta=tesla:deepcopy”.

% cat test2.f90
      program main
      type t
        integer, allocatable :: inside(:)
      end type t

      type(t) :: my_t
      integer :: i
      allocate(my_t%inside(100))
!$acc parallel loop copy(my_t)
      do i=1,100
        my_t%inside(i) = i
      end do

      write(*,*)my_t%inside
      end program main
% pgfortran -ta=tesla:deepcopy -Minfo=accel test2.f90 ; a.out
main:
      9, Generating copy(my_t) [if not already present]
         Generating Tesla code
         10, !$acc loop gang, vector(100) ! blockidx%x threadidx%x
            1            2            3            4            5            6
            7            8            9           10           11           12
           13           14           15           16           17           18
           19           20           21           22           23           24
           25           26           27           28           29           30
           31           32           33           34           35           36
           37           38           39           40           41           42
           43           44           45           46           47           48
           49           50           51           52           53           54
           55           56           57           58           59           60
           61           62           63           64           65           66
           67           68           69           70           71           72
           73           74           75           76           77           78
           79           80           81           82           83           84
           85           86           87           88           89           90
           91           92           93           94           95           96
           97           98           99          100
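
For contrast, here is roughly what the “manual” style looks like once a type has more than one allocatable member. This is a sketch under assumptions, not code from VASP or from the slides: the type “t2”, its members “a” and “b”, and the program name are hypothetical, and it follows the same clause ordering as the examples earlier in the thread (parent before members on entry, members copied out and parent deleted on exit).

```fortran
program manual_deep_copy
  implicit none
  integer, parameter :: n = 100

  ! Hypothetical type with two allocatable members.
  type t2
    integer, allocatable :: a(:)
    real, allocatable :: b(:)
  end type t2

  type(t2) :: v
  integer :: i

  allocate(v%a(n), v%b(n))

  ! Manual deep copy: the parent, then every member the kernel touches.
  !$acc enter data copyin(v, v%a, v%b)

  !$acc parallel loop present(v, v%a, v%b)
  do i = 1, n
    v%a(i) = i
    v%b(i) = 0.5 * real(i)
  end do

  ! Copy the member data back; delete the parent without a copyout.
  !$acc exit data copyout(v%a, v%b) delete(v)

  write(*,*) sum(v%a), sum(v%b)
end program manual_deep_copy
```

Every member has to be listed by hand, which is the bookkeeping that “full” deep copy removes, but it also means a member the kernel never uses can simply be left off the clauses.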

-Mat