Hi all,
How can I make this portion of my C code run on the GPU using OpenACC?
Here is the code:
#pragma acc kernels loop
for (i=1; i<= n1; i++)
{
    for (j=n[i][1]+1; j<= n[i][1]+n[i][2]-1; j++)
    {
        if (k1[j] != 0)
        {
            i1=k[j];
            i0=0;
            for (l=1; l<= 6; l++)
            {
                for (l1=1; l1<= 6; l1++)
                {
                    i0++;
                    r[i][l] += -a[j][i0]*z[i1][l1];
                }
            }
        }
    }
    for (j=n[i][1]+1; j<= n[i][1]+n[i][2]-1; j++)
    {
        if (k1[j] != 0)
        {
            i1=k[j];
            for (l=1; l<= 6; l++)
            {
                i0=l;
                for (l1=1; l1<= 6; l1++)
                {
                    r[i1][l] += -a[j][i0]*z[i][l1];
                    i0 +=6;
                }
            }
        }
    }
    i0=0;
    for (j=1; j<= 6; j++)
    {
        s[j]=0;
        for (l=1; l<= 6; l++)
        {
            i0++;
            s[j] += v[i][i0]*r[i][l];
        }
        a1 = s[j];
        e1 += fabs(a1-z[i][j]);
        e2 += fabs(a1);
        z[i][j] = a1;
    }
}
When I try to compile it, I get the following messages from the PGI compiler:
PGC-S-0155-Compiler failed to translate accelerator region (see -Minfo messages)
: Could not find allocated-variable index for symbol (d:\pgi\dfgpu.c: 2886)
di20:
2887, Loop carried dependence due to exposed use of r[1:n1][1:6] prevents parallelization
Loop carried dependence of a->->,r->->,s,v->->,z->-> prevents parallelization
Complex loop carried dependence of a->->,r->->,s,v->->,z->-> prevents parallelization
Loop carried backward dependence of a->->,r->->,s,v->->,z->-> prevents vectorization
Accelerator restriction: scalar variable live-out from loop: a1,e1,e2,i,i0,i1,j,l,l1,r->->,s,z->->
Scalar last value needed after loop for e1 at line 2962
Scalar last value needed after loop for e2 at line 2962
2896, Loop carried dependence due to exposed use of r[i1+1][1:i1+6] prevents parallelization
Complex loop carried dependence of a->->,r->->,z->-> prevents parallelization
Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: scalar variable live-out from loop: i0,i1,j,l,l1,r->->
Accelerator restriction: size of the GPU copy of k,k1 is unknown
2901, Accelerator restriction: induction variable live-out from loop: j
2906, Accelerator restriction: induction variable live-out from loop: j
2908, Complex loop carried dependence of a->->,r->->,z->-> prevents parallelization
Accelerator restriction: scalar variable live-out from loop: i0,l,l1,r->->
2910, Loop carried dependence due to exposed use of r[i1+1][i3+1] prevents parallelization
Complex loop carried dependence of a->->,r->->,z->-> prevents parallelization
Accelerator restriction: scalar variable live-out from loop: i0,l1,r->->
Accelerator restriction: size of the GPU copy of a,z is unknown
2912, Accelerator restriction: induction variable live-out from loop: i0
2917, Accelerator restriction: induction variable live-out from loop: i,i0,j,l,l1
2920, Accelerator restriction: induction variable live-out from loop: l1
2921, Accelerator restriction: induction variable live-out from loop: l
2923, Accelerator restriction: induction variable live-out from loop: i,j
2926, Loop carried dependence due to exposed use of r[i1+1][1:i1+6] prevents parallelization
Complex loop carried dependence of a->->,r->->,z->-> prevents parallelization
Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: scalar variable live-out from loop: i0,i1,j,l,l1,r->->
Accelerator restriction: size of the GPU copy of k,k1 is unknown
2928, Accelerator restriction: induction variable live-out from loop: j
2930, Accelerator restriction: induction variable live-out from loop: j
2931, Complex loop carried dependence of a->->,r->->,z->-> prevents parallelization
Accelerator restriction: scalar variable live-out from loop: i0,l,l1,r->->
2933, Accelerator restriction: induction variable live-out from loop: l
2934, Complex loop carried dependence of a->->,r->->,z->-> prevents parallelization
Parallelization requires privatization of r->-> as well as last value
Accelerator restriction: scalar variable live-out from loop: i0,l1,r->->
Accelerator restriction: size of the GPU copy of a,r is unknown
2936, Accelerator restriction: induction variable live-out from loop: i,i0,j,l,l1
2937, Accelerator restriction: induction variable live-out from loop: i0
2938, Accelerator restriction: induction variable live-out from loop: l1
2939, Accelerator restriction: induction variable live-out from loop: l
2941, Accelerator restriction: induction variable live-out from loop: i,j
2945, Complex loop carried dependence of r->->,s,v->->,z->-> prevents parallelization
Accelerator restriction: scalar variable live-out from loop: a1,e1,e2,i0,j,l,s,z->->
Scalar last value needed after loop for e1 at line 2962
Scalar last value needed after loop for e2 at line 2962
2947, Accelerator restriction: induction variable live-out from loop: j
2948, Complex loop carried dependence of r->->,s,v->-> prevents parallelization
Parallelization requires privatization of s as well as last value
Accelerator restriction: scalar variable live-out from loop: i0,l,s
Accelerator restriction: size of the GPU copy of v is unknown
2950, Accelerator restriction: induction variable live-out from loop: i0
2951, Accelerator restriction: induction variable live-out from loop: i,i0,j,l
2952, Accelerator restriction: induction variable live-out from loop: l
2953, Accelerator restriction: induction variable live-out from loop: j
2954, Accelerator restriction: induction variable live-out from loop: i,j
2956, Accelerator restriction: induction variable live-out from loop: i,j
2957, Accelerator restriction: induction variable live-out from loop: j
2958, Accelerator restriction: induction variable live-out from loop: i
2973, Accelerator restriction: induction variable live-out from loop: i,l
PGC/x86-64 Windows 16.5-0: compilation completed with severe errors
Actually, I want to parallelize only the “i” loop. Do you have any advice?
Thanks a lot.
Bin
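
For illustration, here is a minimal sketch of how the last stage of the "i" loop (the s / e1 / e2 / z update) might be written so that only "i" is parallelized. It is not a drop-in replacement for the code above. The function name update_z, the macro M, the 0-based indexing and the array shapes are all assumptions made for the example. It addresses three of the messages in the log: temporaries are declared inside the loop body so they are private to each iteration, e1 and e2 are declared as reductions, and the data clauses give explicit array shapes so the size of the GPU copy is known. The two earlier j loops are deliberately left out: the second one updates r[i1], a row that belongs to a different "i" iteration, and the first one reads z[i1] while other iterations overwrite z, so they would likely need atomic updates or restructuring before the "i" loop can safely run in parallel.

```c
#include <math.h>

#define M 6  /* block size used by the inner loops (hypothetical macro) */

/* Sketch of the final stage only: for each row i, compute s = V_i * r_i,
 * accumulate the two error sums, and store the result in z[i].
 * Arrays are 0-based here, unlike the 1-based loops in the post. */
void update_z(int n1, double v[][M * M], double r[][M], double z[][M],
              double *e1_out, double *e2_out)
{
    double e1 = 0.0, e2 = 0.0;

    /* Explicit shapes tell the compiler how much data to move.
     * Without OpenACC enabled the pragmas are ignored and the loop runs serially. */
    #pragma acc parallel loop gang \
        copyin(v[0:n1][0:M*M], r[0:n1][0:M]) copy(z[0:n1][0:M]) \
        reduction(+:e1, e2)
    for (int i = 0; i < n1; i++) {
        /* Only the i loop is parallel; the 6x6 work stays sequential. */
        #pragma acc loop seq
        for (int j = 0; j < M; j++) {
            double sum = 0.0;  /* declared in the loop body, so private */
            #pragma acc loop seq
            for (int l = 0; l < M; l++)
                sum += v[i][j * M + l] * r[i][l];
            e1 += fabs(sum - z[i][j]);
            e2 += fabs(sum);
            z[i][j] = sum;  /* each iteration writes only its own row of z */
        }
    }

    *e1_out = e1;
    *e2_out = e2;
}
```

Because the sums are accumulated through a reduction, the order of the floating-point additions can differ from the serial loop, so e1 and e2 may differ in the last digits when run on the GPU.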