Accelerator restriction: invalid loop

Dear all,

I’m a total newbie with OpenACC. I have this code snippet:

void compute_component_vcell_TL ( real* restrict vptr,
                                  const real* restrict szptr,
                                  const real* restrict sxptr,
                                  const real* restrict syptr,
                                  const real* restrict rho,
                                  const real dt,
                                  const real dzi,
                                  const real dxi,
                                  const real dyi,
                                  const integer nz0,
                                  const integer nzf,
                                  const integer nx0,
                                  const integer nxf,
                                  const integer ny0,
                                  const integer nyf,
                                  const offset_t _SZ,
                                  const offset_t _SX,
                                  const offset_t _SY,
                                  const integer dimmz,
                                  const integer dimmx,
                                  const phase_t phase)
{

    #pragma acc kernels
    {
        #pragma acc loop independent
        for(integer y=ny0; y < nyf; y++)
        {
            #pragma acc loop independent
            for(integer x=nx0; x < nxf; x++)
            {
                #pragma acc loop independent
                for(integer z=nz0; z < nzf; z++)
                {
                    const real lrho = rho_TL(rho, z, x, y, dimmz, dimmx);

                    const real stx = stencil_X( _SX, sxptr, dxi, z, x, y, dimmz, dimmx);
                    const real sty = stencil_Y( _SY, syptr, dyi, z, x, y, dimmz, dimmx);
                    const real stz = stencil_Z( _SZ, szptr, dzi, z, x, y, dimmz, dimmx);

                    vptr[IDX(z,x,y,dimmz,dimmx)] += (stx + sty + stz) * dt * lrho;
                }
            }
        }
    }
}

The command :
cmake -DCMAKE_C_COMPILER=/opt/pgi/linux86-64/17.4/bin/pgcc -DCMAKE_BUILD_TYPE=Debug -DUSE_OPENMP=OFF -DUSE_OPENACC=ON …
doesn’t return any error. However, when I run make, it gives the error:

PGC-S-0155-Unsupported nested compute construct in compute construct or acc routine (/home/fwirtm/FWI-gtc2017/src/fwi_propagator.c: 609)
PGC-S-0155-Accelerator region ignored; see -Minfo messages (/home/fwirtm/FWI-gtc2017/src/fwi_propagator.c)
compute_component_scell_TR:
0, Accelerator region ignored
609, Accelerator restriction: invalid loop

At the end it gives :
PGC/x86-64 Linux 17.4-0: compilation completed with severe errors
src/CMakeFiles/fwi-core.dir/build.make:134: recipe for target ‘src/CMakeFiles/fwi-core.dir/fwi_propagator.c.o’ failed
make[3]: *** [src/CMakeFiles/fwi-core.dir/fwi_propagator.c.o] Error 2
CMakeFiles/Makefile2:87: recipe for target ‘src/CMakeFiles/fwi-core.dir/all’ failed
make[2]: *** [src/CMakeFiles/fwi-core.dir/all] Error 2
CMakeFiles/Makefile2:245: recipe for target ‘main/CMakeFiles/irun.dir/rule’ failed
make[1]: *** [main/CMakeFiles/irun.dir/rule] Error 2
Makefile:157: recipe for target ‘irun’ failed
make: *** [irun] Error 2

I would be grateful for any guidance and assistance. Thank you.

Hi sidiqmk,

PGC-S-0155-Unsupported nested compute construct in compute construct or acc routine

This error indicates that you are trying to use a compute construct (in this case “kernels”) within an OpenACC device “routine”. Since PGI doesn’t support nested parallelism, this is not allowed.

While not shown here, “compute_component_vcell_TL” must have a “#pragma acc routine …” used someplace such as above the definition or as part of its declaration. Do you want this routine callable from a compute region? If so, then you’ll want to remove “kernels”. Assuming you want this to be a “vector” routine, you can then use “acc loop vector independent collapse(3)” to parallelize the y, x, and z loops. Then it can be called from another compute region’s gang loop.

If you didn’t mean for this to be a device routine, remove the “acc routine” pragma.
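As a rough illustration of that second option, here is a minimal sketch of a host function that keeps “kernels” and has no “acc routine” on it. The names update_host_launched and scale_cell are made up for this example and are not from the FWI source, and the data clauses assume the arrays hold exactly n elements.

```c
/* Hypothetical helper standing in for the stencil functions.
   It is called inside the compute region, so it needs "routine seq". */
#pragma acc routine seq
float scale_cell(float s, float dt)
{
    return s * dt;
}

/* No "acc routine" on this function: it runs on the host and
   launches the compute region itself. */
void update_host_launched(float *restrict v, const float *restrict s,
                          float dt, int n)
{
    /* Assumed array extents: both v and s hold n elements. */
    #pragma acc kernels copy(v[0:n]) copyin(s[0:n])
    {
        #pragma acc loop independent
        for (int i = 0; i < n; i++)
            v[i] += scale_cell(s[i], dt);
    }
}
```

Without OpenACC enabled the pragmas are ignored and the same code runs serially on the CPU, which is a handy way to check results.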

Note that the stencil routines will need to be declared with “acc routine seq” in order to be called.
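To make the device-routine option more concrete, here is a small sketch under my own assumptions: a “vector” routine with the three loops collapsed, a “seq” stencil helper, and a caller whose gang loop invokes the routine. The names stencil_z, update_cells, update_all_fields and the nfields loop are hypothetical, and the IDX macro below is only my guess at a z-fastest layout, not the one from your source.

```c
/* Hypothetical index helper, z fastest (an assumption, not the FWI macro). */
#define IDX(z, x, y, nz, nx) (((y) * (nx) + (x)) * (nz) + (z))

/* Stand-in for stencil_X/Y/Z: must be a device routine to be callable. */
#pragma acc routine seq
float stencil_z(const float *s, float dzi, int z, int x, int y, int nz, int nx)
{
    return (s[IDX(z + 1, x, y, nz, nx)] - s[IDX(z, x, y, nz, nx)]) * dzi;
}

/* Vector routine: no "kernels" inside, only a loop directive. */
#pragma acc routine vector
void update_cells(float *v, const float *s, float dt, float dzi,
                  int nz, int nx, int ny)
{
    #pragma acc loop vector independent collapse(3)
    for (int y = 0; y < ny; y++)
        for (int x = 0; x < nx; x++)
            for (int z = 0; z < nz - 1; z++)
                v[IDX(z, x, y, nz, nx)] +=
                    stencil_z(s, dzi, z, x, y, nz, nx) * dt;
}

/* Host function: the gang loop calls the vector routine once per field. */
void update_all_fields(float *v, const float *s, float dt, float dzi,
                       int nfields, int nz, int nx, int ny)
{
    const int n = nz * nx * ny;
    #pragma acc parallel loop gang copy(v[0:nfields*n]) copyin(s[0:nfields*n])
    for (int f = 0; f < nfields; f++)
        update_cells(v + f * n, s + f * n, dt, dzi, nz, nx, ny);
}
```

The compute construct lives only in the caller; the routine itself just describes how its loops map onto the vector lanes.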

Hopefully this is clear, but if not, let me know. Though, I’ll probably need the full source to give more specific guidance.

-Mat

Dear Mat,

Thank you for your reply. I followed your suggestion and removed #pragma acc kernels and added #pragma acc routine to the top of all functions. Fortunately, the make succeeded and I can proceed with the tutorial that I am following which is obtained from https://github.com/Hopobcn/FWI. The steps that were used can be seen in https://github.com/Hopobcn/FWI/blob/gtc2017/GTC2017.md. I will continue with the tutorial and post any queries or problems I face.

Thanks.

Dear Mat,

I have included #pragma acc loop vector independent collapse(3) in place of #pragma acc kernels. The make does not throw any errors but I receive the message :
PGC-W-0155-acc loop vector clause ignored in acc routine seq procedure.

If you could kindly refer to the GitHub link https://github.com/Hopobcn/FWI/blob/gtc2017/GTC2017.md: I am trying to reproduce the output obtained from nvprof --cpu-profiling on bin/fwi …/data/fwi_params.txt …/data/fwi_frequencies.profile.txt, but for GPUs, which can be obtained via nvprof bin/fwi …/data/fwi_params.txt …/data/fwi_frequencies.profile.txt as shown in Step 2 in the GitHub link.

I have attached my edited source code to this post. Thanks again for your help.

https://drive.google.com/open?id=0BzQmwGBVpCM2REJmVUZ0bi02VWc

Hi sidiqmk,

Vector loops can’t be put in sequential routines. The solution here is either to change the routine declaration from “acc routine seq” to “acc routine vector”, or to remove the “loop vector” directive and run the loops sequentially. Note that “routine” defaults to “seq”, so “acc routine” is the same as “acc routine seq”.

Note that vector routines can only be called from “gang” or “worker” loops so you should look at where this routine will be called to determine what the best option is.
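Here is a small sketch of the two options side by side, with one possible caller for each. All names (axpy_vector, axpy_seq, rows_with_vector_routine, rows_with_seq_routine) are hypothetical and the row-based layout is just an assumption for illustration, so adapt it to wherever your routine is actually called.

```c
/* Option 1: declare the routine as "vector" and keep "loop vector". */
#pragma acc routine vector
void axpy_vector(float *y, const float *x, float a, int n)
{
    #pragma acc loop vector
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Option 2: keep the routine "seq" and drop the loop directive,
   so each calling thread runs the whole loop by itself. */
#pragma acc routine seq
void axpy_seq(float *y, const float *x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* A vector routine is called from a gang (or worker) loop. */
void rows_with_vector_routine(float *y, const float *x, float a,
                              int rows, int n)
{
    #pragma acc parallel loop gang copy(y[0:rows*n]) copyin(x[0:rows*n])
    for (int r = 0; r < rows; r++)
        axpy_vector(y + r * n, x + r * n, a, n);
}

/* A seq routine can be called from a loop that already uses the
   vector lanes, one row per thread. */
void rows_with_seq_routine(float *y, const float *x, float a,
                           int rows, int n)
{
    #pragma acc parallel loop gang vector copy(y[0:rows*n]) copyin(x[0:rows*n])
    for (int r = 0; r < rows; r++)
        axpy_seq(y + r * n, x + r * n, a, n);
}
```

Which one is better depends on the trip counts: the vector version suits a long inner loop with few outer iterations, while the seq version suits many outer iterations with a short inner loop.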

-Mat