I’m working on a small test case to reproduce an error in a very large Fortran code (> 1000 files), and I am having trouble understanding basic data management with OpenMP 5 directives. The code is attached.
array_m.F90 (18.3 KB)
Makefile (498 Bytes)
The first module, array_m, manages a user-defined type (r1_t) whose attribute val is a 1D double precision array. The goal is to manage data at this level by creating a copy of the array on the device as soon as it is allocated (line 161).
The second module, velocity_m, manages a user-defined type vel2_t with 2 attributes of type r1_t, so the device storage of the arrays is managed by the r1_t type in module array_m.
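For readers without the attachment, the layout described above can be sketched roughly as follows. This is only a minimal sketch under assumptions, not the attached code: the names r1_t, vel2_t, val, vel_new and add_new_r1 come from this post and the profile output, while vel_old, the array size, the program name and the exact form of the directive assumed to sit at line 161 are hypothetical.

```fortran
! Hypothetical reduction of the attached modules. Only r1_t, vel2_t, val,
! vel_new and add_new_r1 are names taken from the post; the rest is assumed.
module array_m
  implicit none
  type :: r1_t
     double precision, allocatable :: val(:)
  end type r1_t
contains
  subroutine add_new_r1(x, n)
    type(r1_t), intent(inout) :: x
    integer, intent(in) :: n
    allocate(x%val(n))
    x%val = 0.0d0
    ! Assumed equivalent of line 161: create the device copy at allocation time.
    !$omp target enter data map(to: x%val)
  end subroutine add_new_r1
end module array_m

module velocity_m
  use array_m
  implicit none
  type :: vel2_t
     type(r1_t) :: vel_new
     type(r1_t) :: vel_old   ! hypothetical name for the second r1_t attribute
  end type vel2_t
end module velocity_m

program sketch
  use velocity_m
  implicit none
  type(vel2_t), target :: ax
  double precision, pointer :: a(:)
  call add_new_r1(ax%vel_new, 8)
  call add_new_r1(ax%vel_old, 8)
  ! Same kind of pointer association as in the offloaded subroutine.
  a => ax%vel_new%val
  print *, size(a)
end program sketch
```

Without an OpenMP flag the directive is treated as a plain comment, so the sketch should also build as ordinary Fortran.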
The last module, calcule_m, has only one subroutine, which calculates an average in an offloaded loop. The interesting lines are:
    521  A=>AX%vel_new%val
    522  B=>BX%vel_new%val
    523  C=>CX%vel_new%val
    524  !$omp target update to(B,C)
    525  !$omp target
    526  !$omp parallel do private(i,j)
    527  do i=1, fin
    528     do j=1, 10
    529        A(i)=0.5*(B(i)+C(i))
    530     enddo
    531  enddo
    532  !$omp end target
    533  !$omp target update from(A)
I do not understand why, at run time, B and C are updated at line 524 and then A, B and C are also pushed to the device at line 525:
    Accelerator Kernel Timing data
    /HA/sources/begou/SUB_ARRAY_OFFLOAD/TESTCASE_1/array_m.F90
      add_new_r1  NVIDIA  devicenum=0
        time(us): 74
        161: data region reached 6 times
            161: data copyin transfers: 12
                 device time(us): total=74 max=12 min=5 avg=6
    /HA/sources/begou/SUB_ARRAY_OFFLOAD/TESTCASE_1/array_m.F90
      moy_r1  NVIDIA  devicenum=0
        time(us): 1,223
        524: update directive reached 1 time
            524: data copyin transfers: 2
                 device time(us): total=888 max=449 min=439 avg=444
        525: data region reached 2 times
            525: data copyin transfers: 3
                 device time(us): total=18 max=7 min=5 avg=6
        533: update directive reached 1 time
            533: data copyout transfers: 1
                 device time(us): total=317 max=317 min=317 avg=317
Moreover, at compile time nvfortran says:
    nvfortran -c -o array_m.o -O1 -g -DY2_GPU -mp=gpu -gpu=cc80 -target=gpu -Minfo=accel array_m.F90
    moy_r1:
        524, Generating update to(b(:),c(:))
        525, Generating implicit map(tofrom:a(:),c(:),b(:))
        533, Generating update from(a(:))
The “tofrom” reported by -Minfo=accel at line 525 does not bring the A(:) array back to the host: at run time, only copyin transfers for the 3 arrays A(:), B(:) and C(:) are shown at that line, with no copyout, so the update directive at line 533 is still required.
This is of course a beginner question, but could someone explain this behavior to me or suggest some documentation? I already have the OpenMP Version 5.1 documentation from November 2020 and the OpenMP Application Programming Interface Examples from June 2020, but they did not help me understand this.