Array of structures vs structure of arrays

Hi,

Sorry if this repeats a previous post. I recall a discussion along these lines but couldn’t find it.

The package I work on has recently undergone a significant rewrite in order to use derived types and I am merging those changes with the GPU implementation of the code.

Unfortunately, I have had to hack my way around a host-based “array of structures” by copying to a temporary array*. In other situations, where the derived type is used directly, the data is copied without issue. It is just this array of structures that is a problem.

I was hoping you could give me some brief feedback/links to further information regarding expected issues and best practises for derived types and OpenACC/CUDA Fortran.

Cheers,

Karl


*Just to clarify, only a subarray of the derived type is passed to the GPU, not the entire structure. However, this is then passed to the routine that does the host-to-device copy as follows:

subroutine routineA
   call routineB(derivedtype%array)
end subroutine routineA

subroutine routineb(array)
   istat = cudaMemcpyAsync(array, etc...)
end subroutine routineb

Debugging the code that uses derived types in this manner results in a memcpy error that suggests an issue with the host array, as the reported memory location is 0x0.

Hi Karl,

There really isn’t a best practices guide for derived types as of yet. Greg Ruetsch has a section on it in the second edition of his CUDA Fortran book, but that is still being written, so it is not publicly available.

For OpenACC, if your derived type contains dynamic data members (which I assume is the case here), then the standard isn’t quite ready. It’s one of the major items for the next OpenACC standard, but for now support is a bit piecemeal, depending upon the compiler you’re using.

For both cases, you might want to try using CUDA Unified Memory. It’s only available for dynamic memory and you’re limited in the amount of memory your program can use, but it works well in these types of cases and greatly simplifies your programming effort.

CUDA Fortran: https://www.pgroup.com/lit/articles/insider/v6n1a2.htm
OpenACC: https://www.pgroup.com/lit/articles/insider/v6n2a4.htm
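
As a rough illustration of the Unified Memory suggestion, here is a minimal CUDA Fortran sketch. The type name "field_t", the routine "scale_on_gpu" and the array size are made up for the example and are not from Karl's code. It assumes a compiler and GPU that support the "managed" attribute, and that a managed actual argument can be passed to a "device" dummy argument.

```fortran
module field_mod
  use cudafor
  implicit none

  ! Hypothetical derived type; the member is managed, so the same
  ! allocation is visible from both host and device code.
  type :: field_t
     real, allocatable, managed :: array(:)
  end type field_t

contains

  ! Hypothetical routine: doubles the values on the GPU with a CUF kernel.
  subroutine scale_on_gpu(a, n)
    integer, intent(in) :: n
    real, device :: a(n)
    integer :: i

    !$cuf kernel do <<<*,*>>>
    do i = 1, n
       a(i) = 2.0 * a(i)
    end do
  end subroutine scale_on_gpu

end module field_mod

program managed_demo
  use cudafor
  use field_mod
  implicit none
  type(field_t) :: f
  integer, parameter :: n = 1024
  integer :: istat

  allocate(f%array(n))        ! managed allocation, no explicit copies
  f%array = 1.0               ! initialised on the host

  call scale_on_gpu(f%array, n)
  istat = cudaDeviceSynchronize()   ! wait before touching the data on the host

  print *, 'first element after scaling: ', f%array(1)
  deallocate(f%array)
end program managed_demo
```

The point of the sketch is that there is no temporary array and no explicit memcpy: the member of the derived type is handed straight to the GPU routine.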

For your example CUDA Fortran code, make sure “array” has the “device” attribute and that “routineb” includes an interface with the “array” argument having “device” as well.
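
To make that concrete, here is one possible sketch, not a drop-in fix. It assumes the member of the derived type is meant to live on the device and that the host data comes from a separate pinned buffer. The names "field_t", "host_buf" and "stream" are hypothetical. Placing "routineb" in a module is one way to give the caller an explicit interface in which "array" carries the "device" attribute.

```fortran
module copy_mod
  use cudafor
  implicit none

  ! Hypothetical derived type with a device-resident member.
  type :: field_t
     real, allocatable, device :: array(:)
  end type field_t

contains

  ! Being a module procedure, routineb has an explicit interface,
  ! so the caller knows that "array" is a device array.
  subroutine routineb(array, host_buf, stream)
    real, device, intent(inout) :: array(:)
    real, intent(in) :: host_buf(:)
    integer(kind=cuda_stream_kind), intent(in) :: stream
    integer :: istat

    ! The count is given in elements, not bytes.
    istat = cudaMemcpyAsync(array, host_buf, size(host_buf), &
                            cudaMemcpyHostToDevice, stream)
    if (istat /= cudaSuccess) print *, 'copy failed: ', cudaGetErrorString(istat)
  end subroutine routineb

end module copy_mod

program routinea_demo
  use cudafor
  use copy_mod
  implicit none
  type(field_t) :: derivedtype
  real, allocatable, pinned :: host_buf(:)   ! pinned host memory for async copies
  integer(kind=cuda_stream_kind) :: stream
  integer, parameter :: n = 1024
  integer :: istat

  allocate(derivedtype%array(n))
  allocate(host_buf(n))
  host_buf = 1.0

  istat = cudaStreamCreate(stream)
  call routineb(derivedtype%array, host_buf, stream)
  istat = cudaStreamSynchronize(stream)      ! wait for the copy to finish
  istat = cudaStreamDestroy(stream)

  deallocate(derivedtype%array, host_buf)
end program routinea_demo
```

If "routineb" has to stay outside a module, an interface block in the caller that declares "array" with the "device" attribute should serve the same purpose.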

  • Mat