Hi,
I would like to use a GPU for parallel computing.
I am using the NVIDIA HPC SDK 22.1 compiler, and I also have gcc 7.3.0 with the offload feature.
However, I have not been able to use the teams and distribute parallel for directives.
The compiler output is as follows:
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccemNOmU1qKkHv.gpu (5284, 33): parse use of undefined value '@nvkernel_loglik_q_F1L711_1_F1L713_2'
ptxas /tmp/pgaccemNOmD-xKx_A.ptx, line 1; fatal : Missing .version directive at start of file '/tmp/pgaccemNOmD-xKx_A.ptx'
ptxas fatal : Ptx assembly aborted due to errors
NVC++-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (inverseGaussian.c: 1)
NVC++/x86-64 Linux 22.1-0: compilation completed with warnings
An executable is still produced, but it crashes with a core dump when run.
The compile command is:
nvc -c inverseGaussian.c LowDiscrepancy.o -mp=gpu -lgomp -lm -Minfo=all -Mcuda -lgf90 -gpu=cuda11.5 -loffload -O3 -acc=gpu -target=gpu -g
The relevant part of the code is:
--- skip ---
#pragma omp target data map(from:Y1[0:n_all],t_mat[0:n_all], method[0:3], \
    mi_vec[0:n], n1[0:1], n2[0:1], seed_set[0:n], pars1[0:n_p],q[0:1],n[0:1]) \
    map(to:fval[0:1],ans_mat[0:(3*n)])
#pragma omp target teams num_teams(9)
{
    #pragma omp distribute parallel for simd
    for(i=0;i<n;i++){
        printf("[DEBUG] %d omp_get_num_teams()=%d\n",i,omp_get_num_teams());
        int mi=mi_vec[i];
        int iseed;
        double y[m];
        double dt_y[m];
        double ans[3];
        iseed=(int)(seed_set[i] * (double) MAX_MOD);
        ytassign( y, dt_y, m, n, mi, i, Y1, t_mat);
        qmc_int(i,iseed, n1, n2, mi, n_p, q, y, dt_y, pars1, ans, method);
        fval+=ans[0];
    } // for
}
--- skip ---
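For comparison, here is a minimal, self-contained sketch of the same loop shape written as one combined construct. It is not the original code: sum_last_terms, M_MAX and the arithmetic in the loop body are hypothetical placeholders for ytassign and qmc_int. It differs from the excerpt above in three ways: the accumulator is listed in a reduction clause (without one, concurrent fval += updates from many threads race), the input array is mapped "to" the device, and the per-iteration scratch arrays have a fixed size instead of being variable-length. Whether any of these differences is related to the NVVM error is not established.

```c
#include <assert.h>

/* Hypothetical fixed upper bound that replaces the variable-length arrays. */
#define M_MAX 8

/* Hypothetical example: for each i, fill a private scratch array and add
   one result to a shared accumulator through an OpenMP reduction. */
double sum_last_terms(const double *seed_set, int n, int m)
{
    double fval = 0.0;
    assert(n >= 0 && m >= 1 && m <= M_MAX);

    /* Input data goes "to" the device; fval is combined across all
       teams and threads by the reduction clause. */
    #pragma omp target teams distribute parallel for \
        map(to: seed_set[0:n]) map(tofrom: fval) reduction(+: fval)
    for (int i = 0; i < n; i++) {
        double y[M_MAX];   /* private to each iteration, fixed size */
        double ans[3];

        for (int j = 0; j < m; j++)
            y[j] = seed_set[i] * (double)(j + 1);

        ans[0] = y[m - 1]; /* placeholder for the real computation */
        ans[1] = 0.0;
        ans[2] = 0.0;

        fval += ans[0];
    }
    return fval;
}
```

With seed_set = {0.5, 1.0, 1.5, 2.0} and m = 2, each iteration contributes 2 * seed_set[i]. A file like this can be built with, for example, nvc -mp=gpu -Minfo=mp, and it also compiles as plain serial C when OpenMP is disabled.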
Note that qmc_int calls several other functions, and I have marked all of them with declare target.
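To make the declare target arrangement concrete, the following is a small sketch of a two-level call chain compiled for the device. All names (kernel_term, integrate_row, integrate_all) and the arithmetic are hypothetical and only stand in for qmc_int and the functions it calls. It assumes every function in the chain is defined between the declare target and end declare target directives in a translation unit the device compiler can see.

```c
#include <assert.h>

#pragma omp declare target
/* Hypothetical helper at the bottom of the call chain. */
static double kernel_term(double y, double dt)
{
    return y * dt;
}

/* Hypothetical stand-in for qmc_int: it calls another device function. */
static double integrate_row(const double *y, const double *dt_y, int m)
{
    double acc = 0.0;
    for (int j = 0; j < m; j++)
        acc += kernel_term(y[j], dt_y[j]);
    return acc;
}
#pragma omp end declare target

/* Host entry point: rows are distributed over teams and threads, and the
   row results are combined with a reduction. */
double integrate_all(const double *y, const double *dt_y, int n, int m)
{
    double total = 0.0;
    assert(n >= 0 && m >= 0);

    #pragma omp target teams distribute parallel for \
        map(to: y[0:n*m], dt_y[0:n*m]) map(tofrom: total) reduction(+: total)
    for (int i = 0; i < n; i++)
        total += integrate_row(&y[i * m], &dt_y[i * m], m);

    return total;
}
```

If a helper lives in a separate object file, as LowDiscrepancy.o does in the compile command, its definition presumably needs the same declare target treatment when that file is compiled. That is an assumption about the setup here, not something the error log confirms.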
The program works if the teams and distribute pragmas are commented out.
In that case it runs much faster than on a single CPU core, but about 2 times slower than on multicore.
I think the GPU and the HPC SDK compiler are really helpful.
I hope the distribute feature can improve the GPU performance further.
I have tried every possible combination of target, teams, distribute, and so on.
How can I solve this problem?
Thank you very much.
Hsueh Fang