In the older CPU version of my code, some sections use derived types that contain allocatable arrays. I have identified sections where I believe this data can be declared as device-only. What is the correct way to work with this kind of type? For example, a program like this
module test
  implicit none
  type :: array_real_1D_d
    real, dimension(:), device, allocatable :: data
  end type array_real_1D_d
  type :: array_real_1D_h
    real, dimension(:), allocatable :: data
  end type array_real_1D_h
contains
  subroutine sum_face()
    implicit none
    integer, parameter :: a = 2
    integer, parameter :: b = 2
    type(array_real_1D_d), device, allocatable :: weight(:, :)
    type(array_real_1D_d), device, allocatable :: face_d(:)
    type(array_real_1D_h), allocatable :: face_h(:)
    real :: weight_sum
    integer :: k, l, i, j, n
    allocate (weight(a, a))
    do i = 1, a
      do j = 1, a
        allocate (weight(i, j)%data(b))
        do l = 1, b
          weight(i, j)%data(l) = 1.0
        end do
      end do
    end do
    allocate (face_h(a))
    allocate (face_d(a))
    do i = 1, a
      allocate (face_h(i)%data(b))
      allocate (face_d(i)%data(b))
      do n = 1, b
        face_d(i)%data(n) = 0.0
      end do
    end do
    do concurrent(k=1:a, i=1:b)
      weight_sum = 0.0
      do concurrent(l=1:a) reduce(+:weight_sum)
        weight_sum = weight_sum + weight(k, l)%data(i)
      end do
      face_d(k)%data(i) = face_d(k)%data(i) + weight_sum
    end do
    print *, "Face sum completed."
    print *, "Face array:"
    do i = 1, a
      do j = 1, a
        face_h(i)%data(:) = face_d(i)%data(:)
        print *, face_h(i)%data
      end do
    end do
    deallocate (weight)
    deallocate (face_d)
    deallocate (face_h)
  end subroutine sum_face
end module test
program main
  use test
  implicit none
  call sum_face()
end program main
fails to compile. The command and the resulting errors are:
nvfortran -cuda -stdpar=gpu -O0 -g test_derived.f90 -o test_derived
NVFORTRAN-S-0519-More than one reference to a device-resident object in assignment (test_derived.f90: 32)
NVFORTRAN-S-0519-More than one reference to a device-resident object in assignment (test_derived.f90: 42)
NVFORTRAN-S-0519-More than one reference to a device-resident object in assignment (test_derived.f90: 58)
0 inform, 0 warnings, 3 severes, 0 fatal for sum_face
Is defining separate device and host derived types even a good approach for this kind of problem?
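One possible reading of the three S-0519 messages, offered as an assumption rather than a confirmed diagnosis: each host-side assignment in the program goes through a container that is itself declared device (weight or face_d) into a component that is also device, so the compiler counts two device-resident references in one statement. The sketch below keeps a single type with a device component but declares the container on the host, so each assignment touches only one device object. The names host_container_sketch, fill_and_copy and host_buf are hypothetical, and the sketch covers only host-side allocation and transfer, not use inside do concurrent.

```fortran
module host_container_sketch
  implicit none
  ! Same layout as array_real_1D_d in the question: a device allocatable component.
  type :: array_real_1D_d
    real, dimension(:), device, allocatable :: data
  end type array_real_1D_d
contains
  subroutine fill_and_copy()
    integer, parameter :: a = 2, b = 2
    ! Assumption: the container itself lives on the host (no device attribute).
    type(array_real_1D_d), allocatable :: face_d(:)
    real :: host_buf(b)   ! hypothetical host staging buffer
    integer :: i
    allocate (face_d(a))
    do i = 1, a
      allocate (face_d(i)%data(b))   ! component is allocated in device memory
      face_d(i)%data = 0.0           ! whole-array set, one device reference
      host_buf = face_d(i)%data      ! device-to-host copy, one device reference
      print *, host_buf
    end do
    do i = 1, a
      deallocate (face_d(i)%data)
    end do
    deallocate (face_d)
  end subroutine fill_and_copy
end module host_container_sketch

program main
  use host_container_sketch
  implicit none
  call fill_and_copy()
end program main
```

Like the program in the question, this relies on the device attribute and would need to be built with nvfortran -cuda.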
Furthermore, I would like to use cuf directives to avoid polluting the code with #ifdef, but replacing the device definitions with
...
type :: array_real_1D_d
real, dimension(:), allocatable :: data
!@cuf attribute(device) :: data
end type array_real_1D_d
...
type(array_real_1D_d), allocatable :: weight(:, :)
!@cuf attribute(device) :: weight
type(array_real_1D_d), allocatable :: face_d(:)
!@cuf attribute(device) :: face_d
...
leads to the following errors:
NVFORTRAN-S-0310-Illegal statement in the specification part of a MODULE (test_derived.f90: 6)
0 inform, 0 warnings, 1 severes, 0 fatal for test
NVFORTRAN-S-0034-Syntax error at or near :: (test_derived.f90: 22)
NVFORTRAN-S-0034-Syntax error at or near :: (test_derived.f90: 24)
0 inform, 0 warnings, 2 severes, 0 fatal for sum_face
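For comparison, here is a minimal sketch of the sentinel form on an ordinary (non-component) variable. It assumes the statement keyword is spelled attributes (plural), as it appears in the CUDA Fortran Programming Guide, whereas the snippet above uses attribute. Whether any statement form can be applied to a component inside a type definition is not addressed here and should be checked against the guide. The names cuf_sentinel_sketch, demo, buf and host are hypothetical.

```fortran
module cuf_sentinel_sketch
  implicit none
contains
  subroutine demo()
    real, allocatable :: buf(:)
    ! With -cuda the next line is compiled and marks buf as device memory;
    ! without it the line is an ordinary comment and buf stays on the host.
    !@cuf attributes(device) :: buf
    real :: host(4)
    allocate (buf(4))
    buf = 1.0     ! whole-array set (device memory when built with -cuda)
    host = buf    ! copy back to a host array
    print *, host
    deallocate (buf)
  end subroutine demo
end module cuf_sentinel_sketch

program main
  use cuf_sentinel_sketch
  implicit none
  call demo()
end program main
```

The intent is that the same source builds with or without -cuda, which is the property the #ifdef-free approach is after.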
In general, how would I go about working with this kind of data? I am not sure flattening the arrays would be an option due to the large scale of the problem.