I am trying to use OpenACC to parallelise a very simple image processing code that I use as an example on a number of parallel programming courses.
Using “pgcc -acc -Minfo=accel -c dosharpen.c” I get a whole bunch of errors like:
/tmp/pgaccj63cB3vdLOXa.gpu(104): error: expression must have arithmetic or enum type
…
ending in:
19 errors detected in the compilation of “/tmp/pgnvd173cFGXA-4Gj.nv0”.
The loop isn’t parallelised although the compiler analysis is exactly what I would expect:
97, Generating copyin(scale)
Generating copyin(norm)
Generating copyin(sigmad4)
Generating copyin(filter0)
Generating copyin(d4)
Generating copyin(d)
Generating copyin(sharp[0:][0:])
Generating copy(fuzzy[0:][0:])
105, Loop is parallelizable
107, Loop is parallelizable
109, Loop carried dependence of ‘sharp’ prevents parallelization
Loop carried backward dependence of ‘sharp’ prevents vectorization
111, Complex loop carried dependence of ‘sharp’ prevents parallelization
Loop carried dependence of ‘sharp’ prevents parallelization
Loop carried backward dependence of ‘sharp’ prevents vectorization
Inner sequential loop scheduled on accelerator
Accelerator kernel generated
105, #pragma acc loop gang, vector(16) /* blockIdx.y threadIdx.y */
107, #pragma acc loop gang, vector(16) /* blockIdx.x threadIdx.x */
109, #pragma acc loop seq
111, #pragma acc loop seq
Any ideas what’s going on? I get the same errors if I use “parallel” rather than “kernels”.
The loop is appended. Note that the code is quite verbose and inefficient but this is because it is meant to be a training example for beginners.
Thanks,
David
#pragma acc data copy(fuzzy) copyin(sharp, d, d4, filter0, sigmad4, norm, scale)
{
#pragma acc kernels loop private(i, j, k, l, rd4sq, rsq, sigmad4sq, sigmasq, \
                                 rsq, delta, filter, filter0)
  {
    for (i=0; i < nx; i++)
      {
        for (j=0; j < ny; j++)
          {
            for (k=-d; k <= d; k++)
              {
                for (l= -d; l <= d; l++)
                  {
                    rd4sq = d4*d4;
                    rsq = d*d;
                    sigmad4sq = sigmad4*sigmad4;
                    sigmasq = sigmad4sq * (rsq/rd4sq);
                    rsq = 1.0*i*i + 1.0*j*j;
                    delta = rsq/(2.0*sigmasq);
                    filter = filter0 * (1.0-delta) * exp(-delta);
                    sharp[i][j] = sharp[i][j]
                                  - filter*fuzzy[i+d+k][j+d+l];
                  }
              }
          }
      }
  }
}
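Written out as formulas, the innermost loop body computes the following. The symbols simply mirror the C variables: $d_4$ is d4, $\sigma_{d_4}$ is sigmad4, $f_0$ is filter0, and $f$ is filter. The numerator of $\delta$ is rsq after its second assignment.

```latex
\sigma^2 = \sigma_{d_4}^2 \, \frac{d^2}{d_4^2}, \qquad
\delta = \frac{i^2 + j^2}{2\sigma^2}, \qquad
f = f_0 \, (1 - \delta) \, e^{-\delta},

\mathrm{sharp}_{i,j} \leftarrow \mathrm{sharp}_{i,j}
  - \sum_{k=-d}^{d} \sum_{l=-d}^{d} f \, \mathrm{fuzzy}_{i+d+k,\; j+d+l}
```

As the loop is written, $f$ depends on the outer indices $i$ and $j$ but not on $k$ and $l$, so for each pixel it is the same value on every pass through the two inner loops.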