Dear all,
During the development of a Fortran library for easy handling of memory offloading on GPU devices, we ran into an issue with the OpenACC deviceptr clause. Please consider the following minimal test:
program test_deviceptr
   use iso_c_binding
   use openacc
   implicit none
   integer :: sizes(3)=[1,2,3]
   real, pointer :: a(:,:,:)
   real, allocatable, target :: b(:,:,:)
   type(c_ptr) :: cptr
   integer(c_size_t) :: bytes
   integer :: i, j, k
   interface
      function acc_malloc_f(total_byte_dim) bind(c, name="acc_malloc")
         use iso_c_binding, only : c_ptr, c_size_t
         implicit none
         type(c_ptr) :: acc_malloc_f
         integer(c_size_t), value, intent(in) :: total_byte_dim
      endfunction acc_malloc_f
      subroutine acc_memcpy_from_device_f(host_ptr, dev_ptr, total_byte_dim) bind(c, name="acc_memcpy_from_device")
         use iso_c_binding, only : c_ptr, c_size_t
         implicit none
         type(c_ptr), value :: host_ptr
         type(c_ptr), value :: dev_ptr
         integer(c_size_t), value :: total_byte_dim
      endsubroutine acc_memcpy_from_device_f
   endinterface
   bytes = int(storage_size(a)/8, c_size_t) * int(product(sizes), c_size_t)
   cptr = acc_malloc_f(bytes)
   if (c_associated(cptr)) call c_f_pointer(cptr, a, shape=sizes)
   !$acc parallel loop collapse(3) deviceptr(a)
   do k=1, sizes(3)
      do j=1, sizes(2)
         do i=1, sizes(1)
            a(i,j,k) = (i + j + k) * 0.5
         enddo
      enddo
   enddo
   allocate(b(sizes(1),sizes(2),sizes(3)))
   call acc_memcpy_from_device_f(c_loc(b), c_loc(a), bytes)
   do k=1, sizes(3)
      do j=1, sizes(2)
         do i=1, sizes(1)
            print*, b(i,j,k)
         enddo
      enddo
   enddo
endprogram test_deviceptr
The test allocates an array on the device, fills it in a parallel loop, copies it back to the host, and prints the result. With nvfortran (24.1-0) the test compiles and runs correctly, as expected:
1.50000000
2.00000000
2.00000000
2.50000000
2.50000000
3.00000000
However, if I use GNU gfortran (13.1.0) I obtain:
compilers_proofs/oac/test_deviceptr.f90:34:42:
34 | !$acc parallel loop collapse(3) deviceptr(a)
| 1
Error: POINTER object ‘a’ in MAP clause at (1)
I therefore read the latest OpenACC specification more carefully and found the following statements:
deviceptr
The deviceptr clause may appear on structured data and compute constructs and declare directives.
The deviceptr clause is used to declare that the pointers in var-list are device pointers, so the
data need not be allocated or moved between the host and device for this pointer. In C and C++, the vars in var-list must be pointer variables.
In Fortran, the vars in var-list must be dummy arguments (arrays or scalars), and may not have the
Fortran pointer, allocatable, or value attributes.
For data in shared memory, host pointers are the same as device pointers, so this clause has no effect.
To my understanding, in Fortran the deviceptr clause should not accept pointer variables (which is what I pass in the test), so GNU gfortran seems to be right in raising the error. I also tried the present clause, which appears to accept pointer variables:
present
The present clause may appear on structured data and compute constructs and declare directives. The present clause specifies that vars in var-list are in shared memory or are already present in the current device memory due to data regions or data lifetimes that contain the construct on which the present clause appears.
For each var in var-list, if var is in shared memory, no action is taken; if var is not in shared memory,
the present clause behaves as follows:
• At entry to the region:
– An attach action is performed if var is a pointer reference, and a present increment
action with the structured reference counter is performed if var is not a null pointer.
• At exit from the region:
– If the structured reference counter for var is zero, no action is taken.
– Otherwise, a detach action is performed if var is a pointer reference, and a present decrement
action with the structured reference counter is performed if var is not a null pointer. If
both structured and dynamic reference counters are zero, a delete action is performed.
Replacing deviceptr with present in the test lets GNU gfortran compile and run it correctly. With nvfortran, however, the test compiles but fails at run time with the following error:
hostptr=0x79999d2fa000,stride=1,size=6,eltsize=4,name=a(:,:,:),flags=0x200=present,async=-1,threadid=1
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 7.5, threadid=1
Hint: specify 0x800 bit in NV_ACC_DEBUG for verbose info.
...empty...
allocated block device:0x79999d2fa000 size:512 thread:1
FATAL ERROR: data in PRESENT clause was not found on device 1: name=a(:,:,:) host:0x79999d2fa000
file:/home/stefano/fortran/FUNDAL/compilers_proofs/oac/test_present.f90 test_present line:34
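The empty present table in the dump above is consistent with the quoted wording: present looks for data that entered device memory through a data region or data lifetime, and memory obtained from acc_malloc is not registered that way. For comparison, here is a minimal sketch of a case where present should be satisfied. It assumes the host array is managed with enter data / exit data instead of acc_malloc, so it is not a drop-in replacement for the test; the program name test_present_sketch is hypothetical.

```fortran
program test_present_sketch
   ! Sketch only: the device copy of b is created by a data directive,
   ! so the present clause can find it in the present table.
   implicit none
   integer :: sizes(3)=[1,2,3]
   real, allocatable :: b(:,:,:)
   integer :: i, j, k
   allocate(b(sizes(1),sizes(2),sizes(3)))
   ! Start a dynamic data lifetime for b on the device (no host-to-device copy).
   !$acc enter data create(b)
   !$acc parallel loop collapse(3) present(b)
   do k=1, sizes(3)
      do j=1, sizes(2)
         do i=1, sizes(1)
            b(i,j,k) = (i + j + k) * 0.5
         enddo
      enddo
   enddo
   ! Copy the device values back into the host array, then end the lifetime.
   !$acc update self(b)
   !$acc exit data delete(b)
   print*, b
endprogram test_present_sketch
```

Without OpenACC enabled the directives are treated as comments and the loop simply runs on the host.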
We currently use the deviceptr clause in our library because it works fine with nvfortran, but we are worried that this is not the right approach, given the OpenACC specification and the behavior of GNU gfortran.
Can you tell us whether using deviceptr in the above way is outside the specification?
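For reference, a variant that follows the Fortran sentence of the deviceptr wording quoted above literally would move the loop into a procedure, so that the variable in the clause is a dummy argument without the pointer attribute. The following is an untested sketch: the names deviceptr_sketch and fill_on_device are hypothetical, the contiguous attribute on the actual argument is added on the assumption that it keeps the compiler from creating a host-side temporary, and whether both compilers accept and run it is also an assumption.

```fortran
module deviceptr_sketch
   use iso_c_binding, only : c_ptr, c_size_t
   implicit none
   interface
      ! Same C bindings as in the test above.
      function acc_malloc_f(total_byte_dim) bind(c, name="acc_malloc")
         import :: c_ptr, c_size_t
         type(c_ptr) :: acc_malloc_f
         integer(c_size_t), value, intent(in) :: total_byte_dim
      endfunction acc_malloc_f
      subroutine acc_memcpy_from_device_f(host_ptr, dev_ptr, total_byte_dim) bind(c, name="acc_memcpy_from_device")
         import :: c_ptr, c_size_t
         type(c_ptr), value :: host_ptr
         type(c_ptr), value :: dev_ptr
         integer(c_size_t), value :: total_byte_dim
      endsubroutine acc_memcpy_from_device_f
   endinterface
contains
   ! Hypothetical helper: a is an explicit-shape dummy argument with no
   ! pointer, allocatable, or value attribute, as the quoted text requires.
   subroutine fill_on_device(a, n1, n2, n3)
      integer, intent(in) :: n1, n2, n3
      real, intent(inout) :: a(n1,n2,n3)
      integer :: i, j, k
      !$acc parallel loop collapse(3) deviceptr(a)
      do k=1, n3
         do j=1, n2
            do i=1, n1
               a(i,j,k) = (i + j + k) * 0.5
            enddo
         enddo
      enddo
   endsubroutine fill_on_device
endmodule deviceptr_sketch

program test_deviceptr_dummy
   use iso_c_binding
   use deviceptr_sketch
   implicit none
   integer :: sizes(3)=[1,2,3]
   real, pointer, contiguous :: a(:,:,:)   ! contiguous: assumption, see above
   real, allocatable, target :: b(:,:,:)
   type(c_ptr) :: cptr
   integer(c_size_t) :: bytes
   bytes = int(storage_size(a)/8, c_size_t) * int(product(sizes), c_size_t)
   cptr = acc_malloc_f(bytes)
   if (.not. c_associated(cptr)) error stop 'acc_malloc failed'
   call c_f_pointer(cptr, a, shape=sizes)
   ! The Fortran pointer is only the actual argument; the clause sees the dummy.
   call fill_on_device(a, sizes(1), sizes(2), sizes(3))
   allocate(b(sizes(1),sizes(2),sizes(3)))
   ! Copy back using the raw device address returned by acc_malloc.
   call acc_memcpy_from_device_f(c_loc(b), cptr, bytes)
   print*, b
endprogram test_deviceptr_dummy
```

The host-side Fortran pointer is kept here only to carry the shape; the device address itself is never dereferenced on the host.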
Kind regards,
Stefano