I have the following problem. I have a main subroutine, let us call it main_function (for 3D B-splines). It takes several tensors as input.
This function contains only IF-conditions. If a condition is satisfied, another function is called. Let us call these functions function_a, function_b, and function_c; all of them are parallelizable.
The structure is as follows:

    subroutine main_function(paras)
      if (1) then
        call function_a
      else if (2) then
        call function_b
      else if (3) then
        call function_c
      end if
    end subroutine main_function

    subroutine function_a(paras)
      !$acc parallel loop present(....)
      do
        heavy parallel calcs
      end do
      ! output: eta
    end subroutine function_a

    subroutine function_b(paras)
      !$acc parallel loop present(....)
      do
        heavy parallel calcs
      end do
      ! output: eta
    end subroutine function_b

    subroutine function_c(paras)
      !$acc parallel loop present(....)
      do
        heavy parallel calcs
      end do
      ! output: eta
    end subroutine function_c
function_a, function_b, and function_c each produce a B-spline tensor (eta) as output, calculated on the GPU. I don’t want to move this tensor to the host since it is not needed there. However, after eta has been calculated on the GPU by main_function, an interpolation subroutine, interpolate3D, is called to interpolate the function. The definition of interpolate3D is something like:

    subroutine interpolate3D(eta, x, y, z, fAtxyz)
      !$acc routine seq
      interpolate ...
    end subroutine interpolate3D

To summarize, the pseudo-code is something like:

    call main_function(paras)
    !$acc parallel loop present(x, y, eta, fAtxyz)
    do i = 1, N
      call interpolate3D(eta, x(i), y(i), z(i), fAtxyz(i))
    end do

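A compilable reduction of this pseudo-code might look like the sketch below. It assumes that an enclosing data region is what keeps eta on the device between the two calls, which is my understanding of the intended usage rather than something I have confirmed. The names fill_eta and nearest, the 1D arrays, and all sizes and values are hypothetical stand-ins for main_function, interpolate3D, and the real data.

```fortran
! Sketch only: fill_eta and nearest are hypothetical stand-ins for
! main_function and interpolate3D; sizes and values are made up.
module sketch_mod
  implicit none
contains
  subroutine fill_eta(eta, n, mode)
    integer, intent(in) :: n, mode
    real, intent(inout) :: eta(n)
    integer :: i
    ! Only IF-conditions here; each branch fills eta in a parallel loop.
    if (mode == 1) then
      !$acc parallel loop present(eta)
      do i = 1, n
        eta(i) = real(i)
      end do
    else
      !$acc parallel loop present(eta)
      do i = 1, n
        eta(i) = 2.0 * real(i)
      end do
    end if
  end subroutine fill_eta

  subroutine nearest(eta, n, x, f)
    !$acc routine seq
    integer, intent(in) :: n
    real, intent(in) :: eta(n), x
    real, intent(out) :: f
    integer :: k
    k = min(max(nint(x), 1), n)   ! clamp the nearest index into 1..n
    f = eta(k)
  end subroutine nearest
end module sketch_mod

program sketch
  use sketch_mod
  implicit none
  integer, parameter :: n = 8, m = 3
  real :: eta(n), x(m), f(m)
  integer :: i

  x = [1.2, 4.6, 7.9]
  ! Assumption: eta is created on the device and never copied to the host.
  !$acc data create(eta) copyin(x) copyout(f)
  call fill_eta(eta, n, 1)
  !$acc parallel loop present(eta, x, f)
  do i = 1, m
    call nearest(eta, n, x(i), f(i))
  end do
  !$acc end data

  ! Self-check against the values expected for mode 1.
  if (any(abs(f - [1.0, 5.0, 8.0]) > 1.0e-6)) error stop "unexpected result"
  print *, f
end program sketch
```

Compiled without OpenACC, the directives are plain comments and the same program runs on the host, which gives a reference result to compare the GPU build against.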
My problems and questions are:
1)- When I don’t use ‘!$acc update self (eta)’ before the loop, the results are completely wrong. Does this mean that the ‘present’ clause doesn’t correctly find eta, calculated by main_function, on the GPU, so that one has to update the host and then copy eta back to the GPU?
2)- How can I ensure that interpolate3D runs on the GPU? For example, if I don’t have the above loop, does adding only ‘!$acc routine seq’ ensure that it runs on the GPU and looks up the different quantities there?
3)- In fact, when there is no loop, adding ‘!$acc update self (eta)’ is required to get correct results. Does this mean that in this case the subroutine is executed on the CPU?
4)- To summarize: if I have two subroutines, where the first chooses between different subroutines based on if-conditions to calculate a vector or tensor and keeps it on the GPU (I don’t want to update the host), and the second uses this vector to perform some calculations on the GPU, how do I do this correctly?
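Regarding the no-loop case in questions 2) and 3), the variant I have in mind is sketched below. It assumes that wrapping the single call in a serial compute region is what makes the call run on the device and see the device copy of eta; whether that is the right approach is part of what I am asking. pick_nearest and all sizes and values are hypothetical stand-ins.

```fortran
! Sketch only: pick_nearest is a hypothetical stand-in for interpolate3D.
module serial_sketch_mod
  implicit none
contains
  subroutine pick_nearest(eta, n, x, f)
    !$acc routine seq
    integer, intent(in) :: n
    real, intent(in) :: eta(n), x
    real, intent(out) :: f
    f = eta(min(max(nint(x), 1), n))   ! clamp the nearest index into 1..n
  end subroutine pick_nearest
end module serial_sketch_mod

program serial_sketch
  use serial_sketch_mod
  implicit none
  integer, parameter :: n = 4
  real :: eta(n), f
  integer :: i

  !$acc data create(eta)
  ! With OpenACC enabled, only the device copy of eta is filled here.
  !$acc parallel loop present(eta)
  do i = 1, n
    eta(i) = 10.0 * real(i)
  end do

  ! A single call with no loop, placed inside a compute region.
  ! f is a scalar, so it is copied out explicitly.
  !$acc serial present(eta) copyout(f)
  call pick_nearest(eta, n, 2.3, f)
  !$acc end serial
  !$acc end data

  ! Self-check: the nearest index to 2.3 is 2, and eta(2) = 20.
  if (abs(f - 20.0) > 1.0e-6) error stop "unexpected result"
  print *, f
end program serial_sketch
```

The serial construct needs a compiler that supports OpenACC 2.6 or later; on an older compiler, a parallel region restricted to a single gang would be the variant to try instead.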
I have attached a very simple example concerning the questions above. One subroutine calculates eta on the GPU, while the subroutine calcFunAtx interpolates at a given position using eta (nearest-neighbor interpolation). I would like to know, if possible, how to let calcFunAtx work directly with the GPU data. Comments, corrections, and advice concerning the implementation are also very welcome.
program.f90 (1.9 KB)
Sorry for the long post, and thank you very much for your help.