I am trying to offload a large loop to the GPU.
It looks something like this:
do tmp_index = 1, Ninterior_faces
   ! do loop
   ! another do loop
   call acc_evaluate_interaction_flux(DIM, P1, Nvar, &
                                      mesh%normals(glb_face_index, :))
end do
Here mesh is a derived type. At first I only parallelized the do loops. The subroutine call takes many more arguments, but I narrowed the error down to the argument where I pass a section of the derived type's component.
Running this gives me the following error:
[gn012:11106] *** Process received signal ***
[gn012:11106] Signal: Segmentation fault (11)
[gn012:11106] Signal code: (128)
[gn012:11106] Failing at address: (nil)
[gn012:11106] [ 0] /lib64/libpthread.so.0[0x316ee0f710]
[gn012:11106] [ 1] /usr/local/pgi-2016/linux86-64/16.5/lib/libpgc.so(__c_mcopy8+0x10e)[0x7fffe7bf7d8e]
[gn012:11106] *** End of error message ***
cuda-gdb tells me:
Program received signal SIGSEGV, Segmentation fault.
0x00007fffe7bf7d8e in __c_mcopy8 () from /usr/local/pgi-2016/linux86-64/16.5/lib/libpgc.so
Now, if I instead copy the section into a local array first,
normals = mesh%normals(glb_face_index, :)
call acc_evaluate_interaction_flux(DIM, P1, Nvar, &
                                   normals)
the error goes away. Is this a known shortcoming, or am I doing something wrong?
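To make the workaround concrete, here is a minimal host-only sketch of the same pattern. Everything in it except the names mesh%normals, normals, DIM and glb_face_index is a hypothetical stand-in: mesh_t, nfaces and use_normals are made up for illustration, and the real subroutine takes many more arguments. Because Fortran arrays are column-major, mesh%normals(glb_face_index, :) is a strided section, so the compiler may have to build a temporary copy when it is passed to an explicit-shape dummy argument. That would be consistent with __c_mcopy8 showing up in the backtrace, though this is an assumption and not something the error output confirms.

```fortran
program row_section_sketch
   use iso_c_binding, only: c_double
   implicit none
   integer, parameter :: DIM = 3, nfaces = 4      ! hypothetical sizes
   type mesh_t                                     ! hypothetical stand-in for the real mesh type
      real(c_double), allocatable :: normals(:, :)
   end type mesh_t
   type(mesh_t) :: mesh
   real(c_double) :: normals(DIM)                  ! contiguous local copy of one row
   real(c_double) :: total
   integer :: glb_face_index

   allocate(mesh%normals(nfaces, DIM))
   mesh%normals = 1.0_c_double
   total = 0.0_c_double

   do glb_face_index = 1, nfaces
      ! Copy the strided row into a contiguous local array first,
      ! then pass the local array instead of the array section.
      normals = mesh%normals(glb_face_index, :)
      call use_normals(DIM, normals, total)
   end do

   print *, 'sum of all normal components:', total

contains

   ! Hypothetical stand-in for acc_evaluate_interaction_flux.
   subroutine use_normals(n, nvec, acc)
      integer, intent(in) :: n
      real(c_double), intent(in) :: nvec(n)        ! explicit-shape dummy argument
      real(c_double), intent(inout) :: acc
      acc = acc + sum(nvec)
   end subroutine use_normals

end program row_section_sketch
```

In the real code the local array would presumably also need to be private to each iteration of the parallel loop; that detail is left out here.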
There is also a second problem, if I may ask about it here.
The subroutine looks something like this:
subroutine acc_evaluate_interaction_flux(DIM, p1, nvar, &
nminus, pnminus, &
uminus, uplus, &
Fdminus, Fdplus, Fvminus, Fvplus, &
Gdminus, Gdplus, Gvminus, Gvplus, &
Hdminus, Hdplus, Hvminus, Hvplus, &
Fi, Gi, Hi, &
Fv, Gv, Hv, &
gamma, viscous_prefactor, face_type, &
ilambda, ibeta_viscous, itau, &
glb_face_index)
!$acc routine
use input_module, only: ldg_tau, ldg_beta
!declarations
lambda = HALF; if (present(ilambda)) lambda = ilambda
beta_viscous = ldg_beta; if (present(ibeta_viscous)) beta_viscous = ibeta_viscous
tau_penalty = ldg_tau; if (present(itau)) tau_penalty = itau
!more loops
Now, ldg_tau and ldg_beta are defined in another module, so the compiler gave me errors about acc create. I therefore went to that module and added a declare create directive:
real(c_double), save, public :: vdiff = ZERO
real(c_double), save, public :: ldg_beta = HALF
real(c_double), save, public :: ldg_tau = TENTH
!$acc declare create(ldg_tau, ldg_beta, vdiff)
Unfortunately, as soon as I do this, the program exits immediately with:
call to cudaGetSymbolAddress returned error 13: Other
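For reference, the module-level pattern described above can be written out as the following self-contained sketch. The module name input_module_sketch, the helper set_ldg and the literal initial values are assumptions made for illustration (the real module uses HALF and TENTH). The update device directive is the standard OpenACC way to refresh the device copies after the host values change. This sketch only illustrates the pattern and is not claimed to explain or resolve the cudaGetSymbolAddress error.

```fortran
module input_module_sketch
   use iso_c_binding, only: c_double
   implicit none
   private
   real(c_double), save, public :: ldg_beta = 0.5_c_double
   real(c_double), save, public :: ldg_tau  = 0.1_c_double
   ! Ask the compiler to create device copies of the module variables.
   !$acc declare create(ldg_tau, ldg_beta)
   public :: set_ldg
contains
   ! Hypothetical helper: change the host values, then refresh the device copies.
   subroutine set_ldg(tau, beta)
      real(c_double), intent(in) :: tau, beta
      ldg_tau  = tau
      ldg_beta = beta
      !$acc update device(ldg_tau, ldg_beta)
   end subroutine set_ldg
end module input_module_sketch

program declare_create_sketch
   use iso_c_binding, only: c_double
   use input_module_sketch, only: ldg_tau, ldg_beta, set_ldg
   implicit none
   call set_ldg(0.2_c_double, 0.25_c_double)
   print *, 'ldg_tau =', ldg_tau, ' ldg_beta =', ldg_beta
end program declare_create_sketch
```

Without OpenACC enabled the directives are treated as comments, so the sketch also builds as ordinary Fortran.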