member of derived type in CUF kernels

Hi everyone,

I have a (probably straightforward) question:
I would like to use a device allocatable array, defined as a member of a derived type, in a CUF kernel.
For example:

        !$cuf kernel do(3) <<<*,*>>>
        do k=1,nz
          do j=1,ny
            do i=0,nx
               u%f_(i,j,k) = u%f_(i,j,k) + r
            end do
          end do
        end do

where u%f_ is an array defined in a module with the device attribute. I get the following error:

====================================
PGF90-S-0155-Host array used in CUF kernel - u%ux_%f_(0:nx,1:ny,1:nz)

A workaround is to pass f_ directly into the subroutine and then use it in place of u%f_.
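
For reference, here is a minimal sketch of that workaround. It has not been tested, and the names field_ops, add_scalar, and h are made up for illustration. The idea is that the kernel only ever sees a plain device dummy array, never the derived-type reference:

```fortran
module field_ops
  use cudafor
  implicit none

  type mytype
     real, allocatable, dimension(:,:,:), device :: f_
  end type mytype

contains

  ! Hypothetical helper: takes the member as an ordinary device array,
  ! so the CUF kernel contains no derived-type component reference.
  subroutine add_scalar(f, r, nx, ny, nz)
    integer, intent(in) :: nx, ny, nz
    real, device, intent(inout) :: f(0:nx,ny,nz)
    real, intent(in) :: r
    integer :: i, j, k

    !$cuf kernel do(3) <<<*,*>>>
    do k = 1, nz
      do j = 1, ny
        do i = 0, nx
          f(i,j,k) = f(i,j,k) + r
        end do
      end do
    end do
  end subroutine add_scalar

end module field_ops

program workaround
  use cudafor
  use field_ops
  implicit none

  type(mytype) :: u              ! host object; only the member is on the device
  real, allocatable :: h(:,:,:)  ! host copy, used only for printing
  integer :: nx, ny, nz

  nx = 32
  ny = 32
  nz = 32
  allocate(u%f_(0:nx,ny,nz))
  allocate(h(0:nx,ny,nz))
  u%f_ = 2.0                     ! host-to-device assignment

  call add_scalar(u%f_, 2.0, nx, ny, nz)

  h = u%f_                       ! device-to-host copy
  print *, h(0,1,1), h(nx,ny,nz)

  deallocate(u%f_)
  deallocate(h)
end program workaround
```

This works, but every kernel needs its own wrapper with the array bounds passed explicitly, which is what I would like to avoid.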

Is there a more elegant way?

Thank you!

Paolo

Hi Paolo,

Using derived types in CUDA Fortran can be a bit tricky since, in practice, only the members of a type should be given the “device” attribute. If the derived type object itself has the “device” attribute, then it can’t be accessed on the host and any allocations would have to be done on the device as well, which would be poor for performance. For example, if we did the following:

        type mytype
           real, allocatable, dimension(:,:,:),device :: f_
        end type mytype

        type(mytype),device :: u
        allocate(u%f_(0:nx,ny,nz))
        u%f_ = 2.0

The code would seg fault since “u” has a device address that is dereferenced on the host whenever “u%f_” is used.

So while you could remove “device” from “u” and then pass in “f_” to a subroutine, the easier method is to use the “managed” attribute. “managed” uses CUDA Unified Memory where the same address is accessible on the host and device.

For example:

        program test
        use cudafor
        implicit none

        type mytype
           real, allocatable, dimension(:,:,:),managed :: f_
        end type mytype

        type(mytype),managed :: u
        real :: r
        integer :: nx,ny,nz
        integer :: i,j,k

        r=2.0
        nx=32
        ny=32
        nz=32
        allocate(u%f_(0:nx,ny,nz))
        u%f_=2.0

        !$cuf kernel do(3) <<<*,*>>>
        do k=1,nz
          do j=1,ny
            do i=0,nx
               u%f_(i,j,k) = u%f_(i,j,k) + r
            end do
          end do
        end do

        do i=0,nx
           print *, u%f_(i,1,1),u%f_(i,ny,nz)
        enddo

        deallocate(u%f_)

        end program test

Note that module variables have global static storage, while CUDA Unified Memory is only available for dynamic data, so declaring “u” in a module with “managed” won’t work. Instead, you would need to declare “u” as an allocatable with the “managed” attribute; once it is allocated, it will use a unified address.
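
A rough sketch of the module variant follows. Treat it as an untested outline rather than a verified program: the module name “fields” is made up, and the cudaDeviceSynchronize call is a conservative addition to make sure the kernel has finished before the host reads the managed data.

```fortran
module fields
  use cudafor
  implicit none

  type mytype
     real, allocatable, dimension(:,:,:), managed :: f_
  end type mytype

  ! Allocatable, so the object gets dynamic (unified) storage
  ! instead of global static storage.
  type(mytype), allocatable, managed :: u
end module fields

program test_module
  use cudafor
  use fields
  implicit none

  real :: r
  integer :: nx, ny, nz
  integer :: i, j, k, istat

  r = 2.0
  nx = 32
  ny = 32
  nz = 32

  allocate(u)                    ! the object itself, in managed memory
  allocate(u%f_(0:nx,ny,nz))     ! then its managed member
  u%f_ = 2.0

  !$cuf kernel do(3) <<<*,*>>>
  do k = 1, nz
    do j = 1, ny
      do i = 0, nx
        u%f_(i,j,k) = u%f_(i,j,k) + r
      end do
    end do
  end do

  ! Assumption: synchronize before touching managed data on the host.
  istat = cudaDeviceSynchronize()

  print *, u%f_(0,1,1), u%f_(nx,ny,nz)

  deallocate(u%f_)
  deallocate(u)
end program test_module
```

The only differences from the program above are that “u” lives in the module as an allocatable and gets its own allocate and deallocate.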

Hope this helps,
Mat

Hi Mat,

Thanks a lot for the prompt reply. I will give it a try.

Best regards,
Paolo