Dear staff,
I would like to ask for clarification regarding the behavior of the present clause, the declare directive for data movement, and implicit copies in data regions spanning different routines/modules.
I am working on a Fortran application offloaded to GPUs with OpenACC, compiled with nvfortran -acc -Minfo=accel from the hpc-sdk/2022 suite and traced with nsys profile --trace=openacc.
I noticed that the following actions reported at compile time,
“X, Generating implicit copy* [if not already present]”
and
“Y, Generating present*”
map, in the Nsight Systems timeline view, to OpenACC events of the data-movement kind. These events are labelled as Enter Data/Exit Data but contain only a Wait event, with no Enqueue Upload/Download; moreover, there is no corresponding memory operation in the CUDA events panel.
My guess is that such events are triggered by runtime checks for the presence of the variable on the device.
If this is correct, there are two points that I do not understand:
(1) Is it possible to use the present clause to declare that the variable is on the device and avoid a presence check?
(2) Is it possible to avoid this presence check, revealed by an implicit copyin [if not already present] in a subroutine (B, child) of a subroutine (A, parent), when the variable is copied to the device in the parent subroutine A (with the enter data directive) and used on the device in the child subroutine B?
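To make question (1) concrete, the two forms I am comparing look roughly like the sketch below. The program name present_forms, the arrays x and y, and the size n are placeholders chosen for illustration only and do not come from my application. Whether the second form actually skips the runtime check is exactly what I am asking.

```fortran
! Illustrative sketch only: names and sizes are placeholders.
program present_forms
  implicit none
  integer, parameter :: n = 64
  real :: x(n), y(n)
  integer :: i

  x = 1.0
  y = 0.0

  ! Put both arrays on the device before any compute construct runs.
  !$acc enter data copyin(x, y)

  ! Form 1: no data clause, so the compiler reports an implicit
  ! copy/copyin [if not already present] for the arrays used here.
  !$acc parallel loop
  do i = 1, n
    y(i) = y(i) + x(i)
  end do

  ! Form 2: the arrays are explicitly stated to be present.
  !$acc parallel loop present(x, y)
  do i = 1, n
    y(i) = y(i) + x(i)
  end do

  ! Bring the result back and release the device copies.
  !$acc exit data copyout(y) delete(x)

  print *, 'y(1) =', y(1)
end program present_forms
```

This can be built with the same flags as the rest of this post (nvfortran -acc -Minfo=accel) to compare the compiler messages for the two loops.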
To clarify my question, I attach a minimal example reproducing the behaviour described above. Here, I place c on the device with declare create in the module, b with enter data in the parent subroutine, and a with declare copyin in the parent subroutine. The presence of b is asserted in the second loop with the present clause.
variables.f90 (165 Bytes)
inplacesum.f90 (605 Bytes)
main.f90 (194 Bytes)
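For readers who cannot open the attachments, the setup is roughly as in the condensed sketch below. It is not the attached code: everything is merged into one file, the module name inplacesum_mod, the array size, the initial values, and the loop bodies are assumptions made for illustration, and the line numbers therefore do not match the compiler output and events quoted in this post. The final update and print in main are added only to make the sketch observable.

```fortran
! Condensed sketch of the described setup; NOT the attached files.
module variables
  implicit none
  integer, parameter :: n = 64          ! assumed size
  real :: c(n, n)
  !$acc declare create(c)               ! c: declare create in the module
end module variables

module inplacesum_mod                   ! hypothetical module name
  use variables
  implicit none
contains

  subroutine inplacesum(a, b)           ! parent (A)
    real, intent(in) :: a(:, :), b(:, :)
    !$acc declare copyin(a)             ! a: declare copyin in the parent
    !$acc enter data copyin(b)          ! b: enter data in the parent
    call implicit_copies(a, b)
    !$acc exit data delete(b)
  end subroutine inplacesum

  subroutine implicit_copies(a, b)      ! child (B)
    real, intent(in) :: a(:, :), b(:, :)
    integer :: i, j

    ! First loop: no data clauses, so a and b get an implicit copyin.
    !$acc parallel loop collapse(2)
    do j = 1, size(c, 2)
      do i = 1, size(c, 1)
        c(i, j) = a(i, j) + b(i, j)
      end do
    end do

    ! Second loop: b is asserted present; a still gets an implicit copyin.
    !$acc data present(b)
    !$acc parallel loop collapse(2)
    do j = 1, size(c, 2)
      do i = 1, size(c, 1)
        c(i, j) = c(i, j) + a(i, j) + b(i, j)
      end do
    end do
    !$acc end data
  end subroutine implicit_copies

end module inplacesum_mod

program main
  use variables, only: n, c
  use inplacesum_mod, only: inplacesum
  implicit none
  real :: a(n, n), b(n, n)

  a = 1.0                               ! assumed initial values
  b = 2.0
  call inplacesum(a, b)

  ! Added for this sketch only: copy c back so the result can be inspected.
  !$acc update self(c)
  print *, 'max(c) =', maxval(c)
end program main
```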
Compiling with nvfortran -acc -Minfo=accel variables.f90 inplacesum.f90 main.f90, I get:
variables.f90:
inplacesum.f90:
inplacesum:
8, Generating copyin(a(:,:)) [if not already present]
9, Generating enter data copyin(b(:,:))
11, Generating exit data delete(b(:,:))
implicit_copies:
19, Generating NVIDIA GPU code
20, !$acc loop gang, vector(128) collapse(2) ! blockidx%x threadidx%x
21, ! blockidx%x threadidx%x collapsed
19, Generating implicit copyin(b(:,:),a(:,:)) [if not already present]
25, Generating present(b(:,:))
26, Generating NVIDIA GPU code
27, !$acc loop gang, vector(128) collapse(2) ! blockidx%x threadidx%x
28, ! blockidx%x threadidx%x collapsed
26, Generating implicit copyin(a(:,:)) [if not already present]
main.f90:
Selecting the OpenACC events in the Events view, I see the following:
Name
Device Init : inplacesum.f90:8
Enter Data : inplacesum.f90:8
Enter Data : inplacesum.f90:9
*Enter Data : inplacesum.f90:19
Wait : inplacesum.f90:19
Compute Construct : inplacesum.f90:19
*Exit Data : inplacesum.f90:19
*Enter Data : inplacesum.f90:25
Wait : inplacesum.f90:25
*Enter Data : inplacesum.f90:26
Wait : inplacesum.f90:26
Compute Construct : inplacesum.f90:26
*Exit Data : inplacesum.f90:26
*Exit Data : inplacesum.f90:25
Exit Data : inplacesum.f90:11
Exit Data : inplacesum.f90:8
In the list above, I marked with an asterisk the “fake data movement” events (presence checks?) triggered by implicit copyin and present.
I also noticed an implicit copyin [if not already present] on a at line 26, even though a is copied to the device with declare, while no implicit copies are reported for c. Is a declare directive within a subroutine equivalent to an enter data directive with an (implicit) exit data at the end of the subroutine?
Thank you for your help,
Laura