Hello everyone,
I have the following problem. I have a main subroutine, call it main_function (for 3D B-splines), which takes several tensors as input.
This subroutine contains only IF conditions. When a condition is satisfied, another subroutine is called. Call these subroutines function_a, function_b, and function_c; each of them is parallelizable.
The structure is as follows:
    subroutine main_function(paras)
      if (1) then
        call function_a
      else if (2) then
        call function_b
      else if (3) then
        call function_c
      end if
    end subroutine main_function
with

    subroutine function_a(paras)
      !$acc parallel loop present(....)
      do
        ! heavy parallel calcs
      end do
      ! output: eta
    end subroutine function_a

    subroutine function_b(paras)
      !$acc parallel loop present(....)
      do
        ! heavy parallel calcs
      end do
      ! output: eta
    end subroutine function_b

    subroutine function_c(paras)
      !$acc parallel loop present(....)
      do
        ! heavy parallel calcs
      end do
      ! output: eta
    end subroutine function_c
The subroutines function_a, function_b, and function_c produce a B-spline tensor (eta) as output, computed on the GPU. I don't want to move this tensor to the host since it is not needed there. However, after eta has been computed on the GPU by main_function, an interpolation subroutine interpolate3D is called to interpolate the function. The definition of interpolate3D is something like
    subroutine interpolate3D(eta, x, y, z, fAtxyz)
      !$acc routine seq
      ! interpolate ...
    end subroutine interpolate3D
To summarize, the pseudo-code is something like
    call main_function(paras)
    !$acc parallel loop present(x, y, eta, fAtxyz)
    do i = 1, N
      call interpolate3D(eta, x(i), y(i), z(i), fAtxyz(i))
    end do
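To make the pattern concrete, here is a minimal, self-contained sketch of what I am trying to achieve. It is not my real code: the names fill_eta and interpolate1d, the array sizes, the 1D nearest-neighbor lookup, and the enclosing "!$acc data" region that creates eta on the device are all assumptions made for illustration.

```fortran
! Sketch with hypothetical names and sizes: eta is computed on the device by
! one of several branches and then consumed on the device by a routine seq.
module sketch_mod
  implicit none
contains

  ! Stand-in for function_a/b/c: fills eta on the device.
  subroutine fill_eta(eta, n, scale)
    integer, intent(in) :: n
    real(8), intent(in) :: scale
    real(8), intent(inout) :: eta(n)
    integer :: i
    !$acc parallel loop present(eta)
    do i = 1, n
      eta(i) = scale * real(i, 8)
    end do
  end subroutine fill_eta

  ! Stand-in for main_function: only branches, each calling a device kernel.
  subroutine main_function(eta, n, mode)
    integer, intent(in) :: n, mode
    real(8), intent(inout) :: eta(n)
    if (mode == 1) then
      call fill_eta(eta, n, 1.0d0)
    else if (mode == 2) then
      call fill_eta(eta, n, 2.0d0)
    else
      call fill_eta(eta, n, 3.0d0)
    end if
  end subroutine main_function

  ! Stand-in for interpolate3D: nearest-neighbor lookup, callable from device code.
  subroutine interpolate1d(eta, n, x, f)
    !$acc routine seq
    integer, intent(in) :: n
    real(8), intent(in) :: eta(n), x
    real(8), intent(out) :: f
    integer :: k
    k = min(max(nint(x), 1), n)   ! clamp the nearest index into 1..n
    f = eta(k)
  end subroutine interpolate1d

end module sketch_mod

program sketch
  use sketch_mod
  implicit none
  integer, parameter :: n = 8, np = 4
  real(8) :: eta(n), x(np), f(np)
  integer :: i

  x = [1.2d0, 3.7d0, 5.0d0, 7.9d0]

  ! Assumption: eta exists only on the device inside this region and is never
  ! copied to the host; only the interpolated values f are copied back.
  !$acc data create(eta) copyin(x) copyout(f)
  call main_function(eta, n, 2)

  !$acc parallel loop present(eta, x, f)
  do i = 1, np
    call interpolate1d(eta, n, x(i), f(i))
  end do
  !$acc end data

  print '(4f6.1)', f
end program sketch
```

With mode = 2 the sketch sets eta(k) = 2k, so I would expect it to print 2.0, 8.0, 10.0, and 16.0 without any "!$acc update self (eta)". In my real code the equivalent structure does not give correct results, which is what the questions below are about.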
My problems and questions are:
1)- When I don't use "!$acc update self (eta)" before the loop, the results are completely wrong. Does this mean that the present clause doesn't correctly find eta, calculated by main_function, on the GPU? Does one therefore need to update the host and then copy it back to the GPU?
2)- How can I ensure that interpolate3D runs on the GPU? For example, if I don't have the above loop, does adding "!$acc routine seq" alone ensure that it runs on the GPU and looks for the different quantities there?
3)- In fact, when there is no loop, adding "!$acc update self (eta)" is required to get correct results. Does this mean that in this case the subroutine is executed on the CPU?
4)- To summarize: if I have two subroutines, where the first chooses between different subroutines based on IF conditions to calculate a vector or tensor and keeps it on the GPU (I don't want to update the host), and the second uses this vector to perform some calculations on the GPU, how do I do this correctly with OpenACC?
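Regarding questions 2 and 3, my current understanding (which may be incomplete) is that "!$acc routine seq" only makes a device version of the subroutine available, and that a plain call from host code outside any compute region still runs the host version on host data. The sketch below shows what I think the no-loop case would need: the single call wrapped in a "!$acc serial" region. The name lookup, the sizes, and the data region are hypothetical and only for illustration.

```fortran
! Sketch with hypothetical names: calling a routine seq once, on the device.
module single_call_mod
  implicit none
contains
  ! Stand-in for the interpolation routine: clamped index lookup.
  subroutine lookup(eta, n, k, f)
    !$acc routine seq
    integer, intent(in) :: n, k
    real(8), intent(in) :: eta(n)
    real(8), intent(out) :: f
    f = eta(min(max(k, 1), n))
  end subroutine lookup
end module single_call_mod

program single_call
  use single_call_mod
  implicit none
  integer, parameter :: n = 5
  real(8) :: eta(n), f
  integer :: i

  eta = 0.0d0     ! host copy; on a discrete GPU it is never updated below
  f = -1.0d0

  !$acc data create(eta) copy(f)
  ! eta is filled on the device only.
  !$acc parallel loop present(eta)
  do i = 1, n
    eta(i) = 10.0d0 * real(i, 8)
  end do

  ! A bare "call lookup(eta, n, 3, f)" here would execute on the host and
  ! read the host copy of eta. The serial region runs the call on the device.
  !$acc serial present(eta, f)
  call lookup(eta, n, 3, f)
  !$acc end serial
  !$acc end data

  print '(f6.1)', f
end program single_call
```

If this understanding is right, the sketch should print 30.0, taken from the device copy of eta, with no "!$acc update self (eta)" involved.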
I attached a very simple example concerning the questions above. The subroutine calc_etaVec calculates eta on the GPU, while the subroutine calcFunAtx interpolates at the position xp using eta (nearest-neighbor interpolation). I would like to know, if possible, how to allow calcFunAtx to work directly with GPU data. Moreover, comments, corrections, and/or advice concerning the implementation are very welcome.
program.f90 (1.9 KB)
Sorry for the long post, and thank you very much for your help.