allocatable arrays inside device data structures

Ananth_Srid · July 21, 2017, 8:36am

Hello,
I’m trying to use allocatable arrays inside user-defined types, with the whole data structure residing on the GPU.

Here’s my module declaration:

!=============
! This module contains definitions for data structures and the data
! stored on the device
!=============

   module GPU_variables
   use cudafor

   type :: data_str_def

!=============
! single number quantities
!=============

      integer                       :: i, j 
      real(kind=8)                  :: a 

!=============
! Arrays
!=============

      real(kind=8),   allocatable   :: b(:)
      real(kind=8),   allocatable   :: c(:,:)
      real(kind=8),   allocatable   :: d(:,:,:)
      real(kind=8),   allocatable   :: e(:,:,:,:)

   end type data_str_def

!=============
! Actual data is here
!=============

   type(data_str_def), device, allocatable   :: data_str(:)

   contains

!=============
! subroutine to allocate memory
!=============

      subroutine allocate_mem(n1)
      implicit none 
      integer, intent(in)  :: n1 

      call deallocate_mem()

      write(*,*) 'works here'
      allocate(data_str(n1))

      write(*,*) 'what about allocating memory?'
      allocate(data_str(n1) % b(10))
      write(*,*) 'success!'

      return
      end subroutine allocate_mem

!=============
! subroutine to deallocate memory
!=============

      subroutine deallocate_mem()
      implicit none
      if(allocated(data_str)) deallocate(data_str)
      return 
      end subroutine deallocate_mem

   end module GPU_variables

Calling program is

!=============
! main program 
!=============

    program gpu_test
    use gpu_variables
    implicit none

!=============
! local variables
!=============

    integer             :: i, j, n

!=============
! allocate data
!=============

    n       = 2                 ! number of data structures

    call allocate_mem(n)

!=============
! deallocate device data structures and exit
!=============

    call deallocate_mem()
    end program

The module file is called gpu_modules.F90.
The main program file is called gpu_test.F90.

compilation command is

pgfortran -Mcuda=cc5x *.F90

Terminal output is

$ ./a.out
works here
what about allocating memory?
Segmentation fault (core dumped)

The idea was to use GPU memory in modules so that subroutines have access to the data, and data structures are a nice way to organize variable-sized arrays.

Am I doing something obviously wrong? Please help!

MatColgrove · July 25, 2017, 9:30pm

Hi Ananth_Srid,

Unfortunately, this won’t work as written. To allocate the type’s arrays, the host would need to access the elements of the device array that contain them, and that can’t be done from the host. You could try writing a kernel that allocates the arrays on the device, but the easiest thing to do is use CUDA Unified Memory (i.e. the “managed” attribute) so the same addresses can be accessed from either the host or the device.

For example:

 % cat gpu_modules.cuf
!=============
 ! This module contains definitions for data structures and the data
 ! stored on the device
 !=============

    module GPU_variables
    use cudafor

    type :: data_str_def

 !=============
 ! single number quantities
 !=============

       integer                       :: i, j
       real(kind=8)                  :: a

 !=============
 ! Arrays
 !=============

       real(kind=8),   allocatable, managed :: b(:)
       real(kind=8),   allocatable, managed :: c(:,:)
       real(kind=8),   allocatable, managed :: d(:,:,:)
       real(kind=8),   allocatable, managed :: e(:,:,:,:)

    end type data_str_def

 !=============
 ! Actual data is here
 !=============

    type(data_str_def), managed, allocatable   :: data_str(:)

    contains

 !=============
 ! subroutine to allocate memory
 !=============

       subroutine allocate_mem(n1)
       implicit none
       integer, intent(in)  :: n1

       call deallocate_mem()

       write(*,*) 'works here', n1
       allocate(data_str(n1))

       write(*,*) 'what about allocating memory?'
       allocate(data_str(n1) % b(10))
       write(*,*) 'success!'

       return
       end subroutine allocate_mem

 !=============
 ! subroutine to deallocate memory
 !=============

       subroutine deallocate_mem()
       implicit none
       if(allocated(data_str)) deallocate(data_str)
       return
       end subroutine deallocate_mem

    end module GPU_variables


% pgfortran gpu_modules.cuf gpu_test.cuf -Mcuda=cc60 ; a.out
gpu_modules.cuf:
gpu_test.cuf:
 works here            2
 what about allocating memory?
 success!

Hope this helps,
Mat

Ananth_Srid · July 30, 2017, 12:04am

hi Mat,
thanks for the suggestion! I’ll try it out and report back here

cheers
Ananth

Ananth_Srid · August 6, 2017, 9:39am

hi Mat,
I want to have fine control over transfers for performance optimization and benchmarks, so …

I tried to implement your first suggestion - write a kernel to allocate the memory. However, when I try to allocate memory from within the kernel, the compiler throws an error: “unsupported procedure”

PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unsupported procedure (gpu_modules.F90: 1)

!=====================================================================
! This module contains definitions for data structures and the data
! stored on the device
!=====================================================================

   module gpu_variables
   use cudafor

   type :: data_str_def

      real(kind=8),   allocatable   :: b(:)

   end type data_str_def

!=====================================================================
! Actual data is here
!=====================================================================

   type(data_str_def), device, allocatable   :: data_str(:)

!=====================================================================
! routines follow data
!=====================================================================

   contains

!=====================================================================
! kernel to allocate memory
!=====================================================================

      attributes(global) subroutine allocate_memory(r, bdim)
      implicit none 

      integer, value    :: r, bdim
      integer, device   :: i, j, k

      i     = threadIdx%x + (blockIdx%x - 1)*blockDim%x 
      j     = threadIdx%y + (blockIdx%y - 1)*blockDim%y 
      k     = threadIdx%z + (blockIdx%z - 1)*blockDim%z 
      
      if(i == 1 .and. j == 1 .and. k == 1) then
         allocate(data_str(r) % b(bdim))
      end if 

      end subroutine allocate_memory

!=====================================================================
! host subroutine to deallocate memory
!=====================================================================

      subroutine deallocate_memory()
      implicit none 

      if(allocated(data_str)) deallocate(data_str)

      end subroutine deallocate_memory

   end module GPU_variables

and then call it using

!=============
! main program 
!=============

   program gpu_test
   use gpu_variables
   implicit none

!=============
! local variables
!=============

   integer          :: i, j, n
   type(dim3)       :: grid, block   

!=============
! allocate the device array from the host first
!=============

    call deallocate_memory()
    n       = 2                 ! number of data structures
    allocate(data_str(n))

    grid    = dim3(1,1,1)
    block   = dim3(1,1,1)
   
    call allocate_memory<<<grid,block>>>(1, 10)

!=============
! deallocate data structures and exit
!=============

    call deallocate_memory()

    end program

Any help you could provide would be great.

MatColgrove · August 7, 2017, 10:10pm

Hi Ananth,

I was playing around with this and I don’t see any easy way to get this to work. You’ll need to use “managed” or make “b” a fixed size array (then you don’t need to allocate it on the device).
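
As a rough illustration of the fixed-size option, here is a minimal sketch. It is not code from this thread: the module name, the bound `bmax` and the counter `nb` are hypothetical, and it assumes a compile-time upper bound on the length of “b” is acceptable.

```fortran
module gpu_variables_fixed
   use cudafor
   implicit none

   ! Hypothetical compile-time upper bound on the length of b
   integer, parameter :: bmax = 10

   type :: data_str_def
      integer      :: nb        ! hypothetical: number of entries of b in use
      real(kind=8) :: b(bmax)   ! fixed size, stored inline in each element
   end type data_str_def

   ! The whole array of structures lives in device memory
   type(data_str_def), device, allocatable :: data_str(:)

contains

   subroutine allocate_mem(n1)
      integer, intent(in) :: n1

      call deallocate_mem()
      ! One device allocation issued from the host; no per-component
      ! allocate is needed because b is not allocatable
      allocate(data_str(n1))
   end subroutine allocate_mem

   subroutine deallocate_mem()
      if (allocated(data_str)) deallocate(data_str)
   end subroutine deallocate_mem

end module gpu_variables_fixed
```

The trade-off is that every element reserves `bmax` entries whether or not it uses them.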

Sorry,
Mat

Ananth_Srid · August 10, 2017, 5:14am

hi Mat,
thanks for trying it out. I’ll use a workaround, and keep an eye on new developments.

Do you know whether this is on PGI’s roadmap, either short or long term?
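
For readers looking for a workaround that keeps transfers explicit, one possibility is sketched below. It is an assumption on my part, not something confirmed in the replies above: the array of types stays on the host and only the allocatable component carries the `device` attribute. The module name and the `nb` argument are hypothetical.

```fortran
module gpu_variables_hostshell
   use cudafor
   implicit none

   type :: data_str_def
      integer      :: i, j
      real(kind=8) :: a
      ! The component is allocated from the host, but its data lives on the device
      real(kind=8), allocatable, device :: b(:)
   end type data_str_def

   ! Host-resident array of types (note: no device attribute here)
   type(data_str_def), allocatable :: data_str(:)

contains

   subroutine allocate_mem(n1, nb)
      integer, intent(in) :: n1, nb
      integer :: k

      call deallocate_mem()
      allocate(data_str(n1))
      do k = 1, n1
         allocate(data_str(k)%b(nb))   ! device allocation, issued from the host
      end do
   end subroutine allocate_mem

   subroutine deallocate_mem()
      integer :: k

      if (.not. allocated(data_str)) return
      do k = 1, size(data_str)
         if (allocated(data_str(k)%b)) deallocate(data_str(k)%b)
      end do
      deallocate(data_str)
   end subroutine deallocate_mem

end module gpu_variables_hostshell
```

With this layout, transfers are plain assignments such as `data_str(k)%b = host_b` for a host array `host_b` of matching shape, and a kernel would receive `data_str(k)%b` as an ordinary device array argument. The scalar members stay on the host, so they would have to be passed to kernels by value.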
