Deep copy of nested data types

Hi,
I’ve put together a small test program with an example of nested data types that I would like to be deep-copied on the device:

module mod_eblk
  implicit none
  type eblk_t
    real, allocatable :: rhs(:)
  end type eblk_t
end module mod_eblk

module mod_grid
  use mod_eblk
  implicit none
  type grid_t
    type(eblk_t), allocatable :: eblk(:)
  end type grid_t
end module mod_grid

program testdeep
  use mod_grid
  implicit none
  type(grid_t), allocatable :: grid(:)
  integer                   :: i

  allocate(grid(1))
  allocate(grid(1)%eblk(9))
  do i = 1, 9
    allocate(grid(1)%eblk(i)%rhs(100))
    grid(1)%eblk(i)%rhs = i
  enddo

!$acc data copy( grid(:) )
!$acc data copy( grid%eblk(:) )
!$acc data copy( grid%eblk%rhs(:) )

!$acc end data
!$acc end data
!$acc end data

end program testdeep

This toy code will not compile, either with or without the deepcopy option:

> pgf90 -ta=tesla:cc70 -Minfo=all testdeep.F90
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Expecting array datatype (testdeep.F90: 1)
PGF90/x86-64 Linux 19.10-0: compilation aborted
> pgf90 -ta=tesla:cc70,deepcopy -Minfo=all testdeep.F90
PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Expecting array datatype (testdeep.F90: 1)
PGF90/x86-64 Linux 19.10-0: compilation aborted

What does the error message mean? Is there a way to get this to compile? More generally, are the kinds of multi-level deep copies that I’m attempting allowed/supported?

The compiler I’m using is: pgf90 19.10-0 LLVM 64-bit target on x86-64 Linux -tp sandybridge

Thank you,

John

PS… I notice the caveat on the OpenACC Getting Started Guide:

Arrays of derived type, where the derived type contains allocatable members, have not
been tested and should not be considered supported for this release. That important
feature will be included in an upcoming release.

(OpenACC Getting Started Guide Version 19.10 for x86 and NVIDIA Processors)

This suggests that what I’m trying to do is not yet supported. Is there an estimate for when this feature will be supported?

Hi John,

Expecting array datatype

The problem here is that “grid” is an array, so you can’t use “!$acc data copy( grid%eblk(:) )” but instead need to use “!$acc data copy( grid(1)%eblk(:) )”.

You then also need to loop through each of the 9 eblks to create the rhs arrays, which doesn’t work well with structured data regions.

More generally, are the kinds of multi-level deep copies that I’m attempting allowed/supported?

Absolutely. There are several different options for you.

First, there’s a manual deep copy, which is basically what you’re doing now, though you’ll want to use unstructured data regions to build the device copy. I like to interleave these in the same spot where I’m building the structure on the host. Something like:

  allocate(grid(1))
!$acc enter data create(grid(:))
  allocate(grid(1)%eblk(9))
!$acc enter data create(grid(1)%eblk(:))
  do i = 1, 9
    allocate(grid(1)%eblk(i)%rhs(100))
    grid(1)%eblk(i)%rhs = i
!$acc enter data copyin(grid(1)%eblk(i)%rhs(:))
  enddo

!... use "update" directives to synchronize the data, but only with the contiguous blocks
  do i = 1, 9
!$acc update host(grid(1)%eblk(i)%rhs)
  end do

!... Then delete the data when you deallocate the structure
  do i = 1, 9
!$acc exit data delete(grid(1)%eblk(i)%rhs)
    deallocate(grid(1)%eblk(i)%rhs)
  end do
!$acc exit data delete(grid(1)%eblk)
  deallocate(grid(1)%eblk)
!$acc exit data delete(grid)
  deallocate(grid)

Another option is to have the compiler perform the deep copy for you by using the flag “-ta=tesla:deepcopy”. In this case, you just need to use either a structured data region like “!$acc data copy(grid)” or unstructured “!$acc enter data copyin(grid) … !$acc exit data copyout(grid)”. You can also just use “!$acc update host(grid)” to update the entire structure.
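
As a minimal sketch of this second option, here is the toy program from the question reduced to a single data region on “grid”. The doubling loop, the “present” clause, and the host-side check are additions for illustration only and are not from the original posts. The build line in the comment is an assumption based on the flags used earlier in this thread.

```fortran
! Sketch only: whole-structure copy, relying on the compiler's implicit deep copy.
! Assumed build line: pgf90 -ta=tesla:cc70,deepcopy -Minfo=accel testdeep.F90
module mod_eblk
  implicit none
  type eblk_t
    real, allocatable :: rhs(:)
  end type eblk_t
end module mod_eblk

module mod_grid
  use mod_eblk
  implicit none
  type grid_t
    type(eblk_t), allocatable :: eblk(:)
  end type grid_t
end module mod_grid

program testdeep
  use mod_grid
  implicit none
  type(grid_t), allocatable :: grid(:)
  integer                   :: i, j

  allocate(grid(1))
  allocate(grid(1)%eblk(9))
  do i = 1, 9
    allocate(grid(1)%eblk(i)%rhs(100))
    grid(1)%eblk(i)%rhs = i
  enddo

  ! With the deepcopy flag, naming the top-level array is intended to be
  ! enough; the compiler traverses eblk(:) and each rhs(:) on its own.
!$acc data copy(grid)
!$acc parallel loop gang present(grid)
  do i = 1, 9
!$acc loop vector
    do j = 1, 100
      grid(1)%eblk(i)%rhs(j) = 2.0 * grid(1)%eblk(i)%rhs(j)
    enddo
  enddo
!$acc end data

  ! Host-side check (illustrative): every rhs element should now hold 2*i.
  do i = 1, 9
    if (any(grid(1)%eblk(i)%rhs /= 2.0 * real(i))) error stop "unexpected rhs value"
  enddo
  print *, "rhs values doubled as expected"
end program testdeep
```

Because the directives are comments to a non-OpenACC build, the same file should also compile and pass the check on the host alone, which can help separate data-movement problems from logic errors.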

The final option is “true” deep copy. These are directives you add to your type definitions to describe the shape of the allocatable arrays, which allows the compiler to traverse the structure. This is more useful for C/C++, though, since they lack the array descriptors that Fortran has.

Full details can be found in the following articles:

Manual Deep Copy: Deep Copy Support in OpenACC | PGI
True Deep Copy: True OpenACC Deep Copy Beta | PGI

Hope this helps,
Mat

Hi John,

The Getting Started Guide hasn’t been updated in a while, and that line should be removed. We’ve supported derived types which contain allocatable members for some time.

-Mat

Thanks for the quick and always helpful replies, Mat. I redid the toy program as you suggested, using the -ta=tesla:deepcopy flag, and it works now.

I then went back to the original program from which the toy was created and tried the same approach. But I hit something unexpected:

PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unexpected Data Type in Deep Copy (m4_mod_diffusion_driver.F90: 1)

Disregarding the reported line number, it looks like my toy missed something important. I’ll try to recreate this error and post another small program. But please let me know if that error message rings any bells in the meantime.

Thanks again and best wishes for the new year.

John

Hi Mat,

I’ve been trying to come up with a small program that reproduces the “Unexpected Data Type in Deep Copy” error, but with no more luck than I’ve had getting rid of the error in the large program. I can’t get the small program to fail with that error.

I have some evidence that the error in the large program has to do with back-pointers to parent classes in the class hierarchy. I can send you the small program as a tar file to give you an idea of what we’re doing (with the caveat that the small program compiles correctly).

From what I’ve seen, it looks like I’m hitting some limit on depth or complexity of the class hierarchy, such that OpenACC is able to manage a small simple example but fails on a full-up more complicated class hierarchy. Is that plausible?

I can make it go away in the large program by selectively pruning away sub-classes, but that’s not a good work-around. On the other hand, if there were an acc directive I could use to tell the compiler not to look further into a sub-class when deep copying, that would certainly be a workable solution. Does such a directive exist?

I’ve searched for “Unexpected Data Type in Deep Copy” and the only hit is here: [nemo] Using kernels and managed memory with ORCA1_SI3_G08 · Issue #353 · stfc/PSyclone · GitHub ; but the author doesn’t seem to have determined the exact cause or solution. I guess I’m back to my earlier question: what does this error message mean?

Thanks for your help,

Hi John,

I took a look at the program you sent, but nothing stands out. Our support for submodules is relatively new (first added in the 18.7 release) but should work fine with the Deep Copy option. Of course, there could be bugs, and a few submodule bugs were posted on the PGI Forum just today, but I don’t know if these are related to your issue.

I guess I’m back to my earlier question: what does this error message mean?

It’s a general error meaning the compiler encountered a data type that it doesn’t know how to handle. I have no idea why it’s failing here, so if you can get a reproducer, that would be very helpful.

On the other hand, if there were an acc directive I could use to tell the compiler not to look further into a sub-class when deep copying, that would certainly be a workable solution. Does such a directive exist?

There’s no such compiler option. While it is easier to use, the one caveat of the “-ta=tesla:deepcopy” flag is that it will attempt to traverse the entire type.

You may instead fall back to using a manual deep copy or try out the experimental True Deep Copy directives (shape and policy) to have better control on what gets copied.
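
To illustrate the manual fallback, here is a rough sketch in which only the member needed on the device is copied and a back-pointer is left alone. The type “blk_t”, its “parent” pointer, and the sizes are hypothetical stand-ins, not taken from John’s code, and I can’t say whether this avoids the error in the large program. It assumes a build without the deepcopy flag, for example “pgf90 -ta=tesla:cc70”.

```fortran
! Sketch only: manual deep copy that skips a host-only back-pointer.
! blk_t and its members are hypothetical, for illustration.
module mod_blk
  implicit none
  type blk_t
    real, allocatable    :: rhs(:)            ! needed on the device
    type(blk_t), pointer :: parent => null()  ! host-only back-pointer, never copied
  end type blk_t
end module mod_blk

program manual_copy
  use mod_blk
  implicit none
  type(blk_t), target      :: root
  type(blk_t), allocatable :: blk(:)
  integer                  :: i, j

  allocate(blk(3))
  do i = 1, 3
    allocate(blk(i)%rhs(100))
    blk(i)%rhs = i
    blk(i)%parent => root
  enddo

  ! Shallow copy of the array of types, then only the rhs members.
  ! Nothing is done for "parent", so it must not be dereferenced on the device.
!$acc enter data copyin(blk(:))
  do i = 1, 3
!$acc enter data copyin(blk(i)%rhs(:))
  enddo

!$acc parallel loop gang present(blk)
  do i = 1, 3
!$acc loop vector
    do j = 1, 100
      blk(i)%rhs(j) = blk(i)%rhs(j) + 1.0
    enddo
  enddo

  ! Bring back only the contiguous rhs blocks, then release the device copies.
  do i = 1, 3
!$acc update host(blk(i)%rhs)
  enddo
  do i = 1, 3
!$acc exit data delete(blk(i)%rhs)
  enddo
!$acc exit data delete(blk)

  ! Host-side check (illustrative): each rhs element should hold i + 1.
  do i = 1, 3
    if (any(blk(i)%rhs /= real(i) + 1.0)) error stop "unexpected rhs value"
  enddo
  print *, "manual deep copy check passed"
end program manual_copy
```

The trade-off is more directives to maintain, in exchange for deciding member by member what the device sees.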

-Mat