Hi,
I’m working on a small test case to reproduce an error in a very large Fortran code (> 1000 files), and I am having trouble understanding basic data management with OpenMP 5 directives. The code is attached.
array_m.F90 (18.3 KB)
Makefile (498 Bytes)
The first module, array_m, manages a user-defined type (r1_t) whose attribute val is a 1D double-precision array. The goal is to manage data at this level by creating a copy of the array on the device as soon as it is allocated (line 161).
The second module, velocity_m, manages a user-defined type vel2_t with two attributes of type r1_t, so the device storage of the arrays is handled by the r1_t type in module array_m.
The last module, calcule_m, has only one subroutine, which computes an average in an offloaded loop. The interesting lines are:
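For readers who do not want to open the attachment, the allocation pattern described above can be sketched roughly as follows. This is a simplified stand-in, not the code in array_m.F90: the module name array_sketch_m and the routines new_r1_sketch and free_r1_sketch are hypothetical, and I am assuming here that val is a pointer component and that the device copy is created with a target enter data directive.

```fortran
module array_sketch_m
  implicit none
  private
  public :: r1_t, new_r1_sketch, free_r1_sketch

  ! Simplified stand-in for r1_t (assumption: val is a pointer component).
  type :: r1_t
     double precision, pointer :: val(:) => null()
  end type r1_t

contains

  ! Hypothetical allocation routine: allocate on the host, then create
  ! the device copy immediately so later kernels find the data present.
  subroutine new_r1_sketch(this, n)
    type(r1_t), intent(inout) :: this
    integer, intent(in) :: n
    allocate(this%val(n))
    this%val = 0.0d0
    !$omp target enter data map(to: this%val)
  end subroutine new_r1_sketch

  ! Hypothetical cleanup routine: drop the device copy first,
  ! then free the host array.
  subroutine free_r1_sketch(this)
    type(r1_t), intent(inout) :: this
    if (associated(this%val)) then
       !$omp target exit data map(delete: this%val)
       deallocate(this%val)
    end if
  end subroutine free_r1_sketch

end module array_sketch_m
```

With this layout a caller only does `call new_r1_sketch(x, n)` and never writes a map clause itself. Built without OpenMP, the directives are plain comments and the module behaves as ordinary host code.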
521 A=>AX%vel_new%val
522 B=>BX%vel_new%val
523 C=>CX%vel_new%val
524 !$omp target update to(B,C)
525 !$omp target
526 !$omp parallel do private(i,j)
527 do i=1, fin
528 do j=1, 10
529 A(i)=0.5*(B(i)+C(i))
530 enddo
531 enddo
532 !$omp end target
533 !$omp target update from(A)
What I do not understand is why, at run time, there is an update of B and C (line 524) and then A, B and C are also pushed to the device at line 525. The profile shows:
Accelerator Kernel Timing data
/HA/sources/begou/SUB_ARRAY_OFFLOAD/TESTCASE_1/array_m.F90
add_new_r1 NVIDIA devicenum=0
time(us): 74
161: data region reached 6 times
161: data copyin transfers: 12
device time(us): total=74 max=12 min=5 avg=6
/HA/sources/begou/SUB_ARRAY_OFFLOAD/TESTCASE_1/array_m.F90
moy_r1 NVIDIA devicenum=0
time(us): 1,223
524: update directive reached 1 time
524: data copyin transfers: 2
device time(us): total=888 max=449 min=439 avg=444
525: data region reached 2 times
525: data copyin transfers: 3
device time(us): total=18 max=7 min=5 avg=6
533: update directive reached 1 time
533: data copyout transfers: 1
device time(us): total=317 max=317 min=317 avg=317
Moreover, at compile time nvfortran reports:
nvfortran -c -o array_m.o -O1 -g -DY2_GPU -mp=gpu -gpu=cc80 -target=gpu -Minfo=accel array_m.F90
moy_r1:
524, Generating update to(b(:),c(:))
525, Generating implicit map(tofrom:a(:),c(:),b(:))
533, Generating update from(a(:))
The “tofrom” reported by -Minfo=accel at line 525 does not bring the A(:) array back: at run time only a copyin for the three arrays A(:), B(:) and C(:) is shown at line 525, with no copyout, and the update directive at line 533 is required.
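To make the question concrete, here is a reduced sketch of the same sequence. It is not the attached code: the names moy_sketch_m and moy_sketch are hypothetical, and it assumes the three arrays were already placed on the device by an earlier target enter data. The map clause is written out explicitly to mirror what -Minfo=accel reports as implicit. My tentative reading of the OpenMP specification is that a map clause on data already present on the device only adjusts a reference count and moves no data unless the always modifier is used. That would be consistent with the update at line 533 being required, but I would appreciate confirmation.

```fortran
module moy_sketch_m
  implicit none
contains

  ! Average b and c into a. Assumption: a, b and c already have device
  ! copies, created earlier with "target enter data".
  subroutine moy_sketch(a, b, c)
    double precision, pointer, intent(in) :: a(:), b(:), c(:)
    integer :: i

    ! Refresh the existing device copies of the inputs.
    !$omp target update to(b, c)

    ! Explicit form of the implicit map reported by the compiler.
    ! If the arrays are already present, I expect no data movement here;
    ! map(always, tofrom: a) would be the form that forces a copy.
    !$omp target map(tofrom: a, b, c)
    !$omp parallel do private(i)
    do i = 1, size(a)
       a(i) = 0.5d0 * (b(i) + c(i))
    end do
    !$omp end parallel do
    !$omp end target

    ! Bring the result back to the host explicitly.
    !$omp target update from(a)
  end subroutine moy_sketch

end module moy_sketch_m
```

A caller would allocate three pointer arrays, issue `!$omp target enter data map(to: a, b, c)` once, and then call `moy_sketch(a, b, c)`. Built without OpenMP, the directives are plain comments and the routine runs as ordinary host code.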
This is of course a beginner question, but could someone explain this behavior or suggest some documentation? I already have the OpenMP Version 5.1 documentation from November 2020 and the OpenMP Application Programming Interface Examples from June 2020, but they did not help me understand this.