Accelerator restriction: invalid loop

sidiqmk · September 21, 2017, 7:15am

Dear all,

I’m a total newbie with OpenACC… I have this code snippet:

void compute_component_vcell_TL ( real* restrict vptr,
                                  const real* restrict szptr,
                                  const real* restrict sxptr,
                                  const real* restrict syptr,
                                  const real* restrict rho,
                                  const real dt,
                                  const real dzi,
                                  const real dxi,
                                  const real dyi,
                                  const integer nz0,
                                  const integer nzf,
                                  const integer nx0,
                                  const integer nxf,
                                  const integer ny0,
                                  const integer nyf,
                                  const offset_t _SZ,
                                  const offset_t _SX,
                                  const offset_t _SY,
                                  const integer dimmz,
                                  const integer dimmx,
                                  const phase_t phase)
{
    #pragma acc kernels
    {
        #pragma acc loop independent
        for(integer y=ny0; y < nyf; y++)
        {
            #pragma acc loop independent
            for(integer x=nx0; x < nxf; x++)
            {
                #pragma acc loop independent
                for(integer z=nz0; z < nzf; z++)
                {
                    const real lrho = rho_TL(rho, z, x, y, dimmz, dimmx);

                    const real stx = stencil_X( _SX, sxptr, dxi, z, x, y, dimmz, dimmx);
                    const real sty = stencil_Y( _SY, syptr, dyi, z, x, y, dimmz, dimmx);
                    const real stz = stencil_Z( _SZ, szptr, dzi, z, x, y, dimmz, dimmx);

                    vptr[IDX(z,x,y,dimmz,dimmx)] += (stx + sty + stz) * dt * lrho;
                }
            }
        }
    }
}

The command:
cmake -DCMAKE_C_COMPILER=/opt/pgi/linux86-64/17.4/bin/pgcc -DCMAKE_BUILD_TYPE=Debug -DUSE_OPENMP=OFF -DUSE_OPENACC=ON …
doesn’t return any error. However, when I run make, it gives this error:

PGC-S-0155-Unsupported nested compute construct in compute construct or acc routine (/home/fwirtm/FWI-gtc2017/src/fwi_propagator.c: 609)
PGC-S-0155-Accelerator region ignored; see -Minfo messages (/home/fwirtm/FWI-gtc2017/src/fwi_propagator.c)
compute_component_scell_TR:
0, Accelerator region ignored
609, Accelerator restriction: invalid loop

At the end it gives:
PGC/x86-64 Linux 17.4-0: compilation completed with severe errors
src/CMakeFiles/fwi-core.dir/build.make:134: recipe for target ‘src/CMakeFiles/fwi-core.dir/fwi_propagator.c.o’ failed
make[3]: *** [src/CMakeFiles/fwi-core.dir/fwi_propagator.c.o] Error 2
CMakeFiles/Makefile2:87: recipe for target ‘src/CMakeFiles/fwi-core.dir/all’ failed
make[2]: *** [src/CMakeFiles/fwi-core.dir/all] Error 2
CMakeFiles/Makefile2:245: recipe for target ‘main/CMakeFiles/irun.dir/rule’ failed
make[1]: *** [main/CMakeFiles/irun.dir/rule] Error 2
Makefile:157: recipe for target ‘irun’ failed
make: *** [irun] Error 2

I would be grateful for any guidance and assistance. Thank you.

MatColgrove · September 21, 2017, 3:43pm

Hi sidiqmk,

PGC-S-0155-Unsupported nested compute construct in compute construct or acc routine

This error indicates that you are trying to use a compute construct (in this case “kernels”) within an OpenACC device “routine”. Since PGI doesn’t support nested parallelism, this is not allowed.

While not shown here, “compute_component_vcell_TL” must have a “#pragma acc routine …” used someplace such as above the definition or as part of its declaration. Do you want this routine callable from a compute region? If so, then you’ll want to remove “kernels”. Assuming you want this to be a “vector” routine, you can then use “acc loop vector independent collapse(3)” to parallelize the y, x, and z loops. Then it can be called from another compute region’s gang loop.

If you didn’t mean to have this to be a device routine, remove the “acc routine” pragma.
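As a rough sketch of this host-only option (not the FWI code itself): the function below carries no “acc routine” directive, so a “kernels” region inside it is legal. The names forward_diff and update_host, the 1-D layout, and the data clauses are assumptions made up for illustration; forward_diff only stands in for the stencil helpers.

```c
#include <assert.h>

/* Hypothetical stand-in for stencil_X/Y/Z. "routine seq" makes it
 * callable from inside the compute region below. */
#pragma acc routine seq
float forward_diff(const float *s, int i, float di)
{
    return (s[i + 1] - s[i]) * di;
}

/* Host function: there is NO "acc routine" directive on it, so the
 * "kernels" construct is not nested inside a device routine. */
void update_host(float *restrict v, const float *restrict s,
                 float dt, float di, int n)
{
    assert(n >= 2);

    #pragma acc kernels copy(v[0:n]) copyin(s[0:n])
    {
        #pragma acc loop independent
        for (int i = 0; i < n - 1; i++)
            v[i] += forward_diff(s, i, di) * dt;
    }
}
```

Built without OpenACC enabled, the pragmas are ignored and the loop simply runs serially on the host, which is handy for checking results.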

Note that the stencil routines will need to be declared with “acc routine seq” in order to be called.
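For the device-routine option, a minimal sketch might look like the following. It assumes a simplified single-stencil update; IDX3, diff_z, update_field and update_all_fields are hypothetical names, and the memory layout (z fastest) is an assumption, not necessarily what the FWI code’s IDX macro does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index macro: z varies fastest, then x, then y. */
#define IDX3(z, x, y, nz, nx) \
    ((((size_t)(y) * (size_t)(nx)) + (size_t)(x)) * (size_t)(nz) + (size_t)(z))

/* Stand-in for a stencil helper; it must be a device routine to be
 * called from the vector routine below. */
#pragma acc routine seq
float diff_z(const float *s, int z, int x, int y, int nz, int nx, float dzi)
{
    return (s[IDX3(z + 1, x, y, nz, nx)] - s[IDX3(z, x, y, nz, nx)]) * dzi;
}

/* Vector routine: no "kernels" or "parallel" inside, only a loop directive. */
#pragma acc routine vector
void update_field(float *restrict v, const float *restrict s,
                  float dt, float dzi, int nz, int nx, int ny)
{
    #pragma acc loop vector independent collapse(3)
    for (int y = 0; y < ny; y++)
        for (int x = 0; x < nx; x++)
            for (int z = 0; z < nz - 1; z++)
                v[IDX3(z, x, y, nz, nx)] += diff_z(s, z, x, y, nz, nx, dzi) * dt;
}

/* Caller: the compute construct lives here, and the vector routine is
 * invoked from its gang loop (one field per gang iteration). */
void update_all_fields(float *v, const float *s, float dt, float dzi,
                       int nz, int nx, int ny, int nfields)
{
    assert(nz >= 2 && nx >= 1 && ny >= 1 && nfields >= 1);
    const int n = nz * nx * ny;

    #pragma acc parallel loop gang copy(v[0:n*nfields]) copyin(s[0:n*nfields])
    for (int f = 0; f < nfields; f++)
        update_field(v + (size_t)f * n, s + (size_t)f * n, dt, dzi, nz, nx, ny);
}
```

The split is the point: the compute construct sits in the caller, while the routine only describes how its own loops map onto vector lanes.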

Hopefully this is clear, but if not, let me know. Though, I’ll probably need the full source to give more specific guidance.

-Mat

sidiqmk · September 25, 2017, 12:59am

Dear Mat,

Thank you for your reply. I followed your suggestion: I removed #pragma acc kernels and added #pragma acc routine at the top of all functions. The make now succeeds, so I can proceed with the tutorial I am following, which comes from the Hopobcn/FWI repository on GitHub. The steps I used are in FWI/GTC2017.md on the gtc2017 branch of that repository. I will continue with the tutorial and post any queries or problems I face.

Thanks.

sidiqmk · September 26, 2017, 10:14am

Dear Mat,

I have included #pragma acc loop vector independent collapse(3) in place of #pragma acc kernels. The make does not throw any errors, but I receive this message:
PGC-W-0155-acc loop vector clause ignored in acc routine seq procedure.

If you could kindly refer to FWI/GTC2017.md in the Hopobcn/FWI repository on GitHub: I am trying to reproduce the output obtained from nvprof --cpu-profiling on bin/fwi …/data/fwi_params.txt …/data/fwi_frequencies.profile.txt, but for GPUs. As shown in Step 2 of that document, the GPU profile can be obtained via nvprof bin/fwi …/data/fwi_params.txt …/data/fwi_frequencies.profile.txt.

I have included the source code which I have edited with this post. Thanks again for your help.

https://drive.google.com/open?id=0BzQmwGBVpCM2REJmVUZ0bi02VWc

MatColgrove · September 26, 2017, 4:14pm

Hi sidiqmk,

Vector loops can’t be put in sequential routines. The solution here is to change the routine declaration from “acc routine seq” to “acc routine vector”, or to remove the “loop vector” directive and run the loops sequentially. Note that “routine” defaults to “seq”, so “acc routine” is the same as “acc routine seq”.

Note that vector routines can only be called from “gang” or “worker” loops so you should look at where this routine will be called to determine what the best option is.
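To illustrate the sequential alternative, here is a small sketch in which the routine stays “seq” and the parallelism comes from the calling loop instead. The names update_column and update_columns and the column-wise layout are assumptions for illustration only, not taken from the FWI source.

```c
#include <assert.h>

/* A plain "#pragma acc routine" would be treated the same as
 * "routine seq" here. */
#pragma acc routine seq
void update_column(float *restrict v, const float *restrict s,
                   float dt, float dzi, int nz)
{
    /* No "loop vector" directive: each caller thread runs this loop
     * sequentially over its own column. */
    for (int z = 0; z < nz - 1; z++)
        v[z] += (s[z + 1] - s[z]) * dzi * dt;
}

/* The caller supplies the parallelism: one column per gang/vector thread. */
void update_columns(float *v, const float *s, float dt, float dzi,
                    int nz, int ncols)
{
    assert(nz >= 2 && ncols >= 1);

    #pragma acc parallel loop gang vector copy(v[0:nz*ncols]) copyin(s[0:nz*ncols])
    for (int c = 0; c < ncols; c++)
        update_column(v + c * nz, s + c * nz, dt, dzi, nz);
}
```

Which variant performs better depends on the call site: a seq routine suits a caller that already spreads work across gangs and vector lanes, while a vector routine suits a caller that only has a gang (or worker) loop.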

-Mat
