Earlier block does not dominate later block

Hi all,
I am using PGI 14.1 to compile a plain C program using OpenACC directives, to be run on an NVIDIA GPU.

The main program calls several subroutines, and most of them have been parallelized correctly, apart from one that makes the compiler complain with the error reported in the subject of this post.

The full output for this subroutine is the following:

PGC-S-0053-Illegal use of void type (main.c: 141)
PGC-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Earlier block does not dominate later block (main.c: 181)
bc:
              181, Generating present(prv[0:])
                   Generating present(nxt[0:])
                   Generating present(param[0:1])
              183, Loop is parallelizable
              185, Loop is parallelizable
                   Accelerator kernel generated
                  183, #pragma acc loop gang /* blockIdx.y */
                  185, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
PGC/x86-64 Linux 14.1-0: compilation completed with severe errors
pgcc-Fatal-ccompile completed with exit code 1

I could not find any use of “void type” that could be considered “illegal”, so I guess the first error may be related to the second one: “Earlier block does not dominate later block”, which in turn is quite cryptic to me.

By commenting out parts of the code, I discovered that the “offending” line is a read from a “global” array whose index is computed from the loop iteration index.

As additional information, the same subroutine ran successfully under OpenCL, using as the kernel the body of the loop I am trying to parallelize with OpenACC; thus it has to be parallelizable.

Does anyone know what could be generating this kind of error?
Or what it actually means?

Thanks in advance,

Enrico

Hi Enrico,

“Earlier block does not dominate later block” is most likely a compiler error. Can you either post or send to PGI Customer Service (trs@pgroup.com) a reproducing example?

Thanks,
Mat

Yes, here it is…

It may seem to make no sense, since it is only a simplified portion of the code that was identified as the cause of the error, but it was part of a working program.
We have already managed to modify it so that the compiler can “understand” and compile it, but in case it helps, I am posting it anyway…

#define NX 1024
#define NY 1024

inline void compute ( double * restrict M) {

  int x, y, site_i, idx3, curY;

  #pragma acc kernels present(M)
  #pragma acc loop independent
  for (x=0; x < NX; x++) {
    #pragma acc loop vector independent
    for (y=16; y < NY; y++) {

      curY = y - 16;

      if ( (curY < 3) || (curY >= (NY-3)) ){

        site_i = x * NY + y;

        if ( curY < 3 ) {

          idx3 = site_i - curY + 3;

          M[ idx3 ] += 1.0;

        }
      }
    }
  }
}
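
For anyone tracing the index arithmetic in the snippet above, here is a minimal serial sketch in plain C with no OpenACC directives. It is not the workaround mentioned above and not the original program: the names compute_ref, count_mismatches and self_check are hypothetical, and declaring the scalars in the innermost scope is only an illustrative choice. It shows that, for these loop bounds, the curY >= (NY-3) test can never be true, and that the three iterations y = 16, 17, 18 all compute idx3 = x*NY + 19. In this simplified form they therefore update the same element, which may not hold in the full code.

```c
#include <stdlib.h>

#define NX 1024
#define NY 1024

/* Hypothetical serial reference: same arithmetic as the reproducer,
   without any OpenACC directives. */
void compute_ref(double *restrict M) {
  for (int x = 0; x < NX; x++) {
    for (int y = 16; y < NY; y++) {
      int curY = y - 16;
      /* curY >= NY-3 would need y >= NY+13, which the loop bound
         y < NY excludes, so only the curY < 3 branch can fire. */
      if (curY < 3) {
        int site_i = x * NY + y;
        int idx3 = site_i - curY + 3;  /* x*NY + 19 for y = 16, 17, 18 */
        M[idx3] += 1.0;
      }
    }
  }
}

/* Hypothetical helper: with M zeroed beforehand, the expected pattern is
   3.0 in column 19 of every row and 0.0 everywhere else. */
int count_mismatches(const double *M) {
  int bad = 0;
  for (int x = 0; x < NX; x++) {
    for (int y = 0; y < NY; y++) {
      double expect = (y == 19) ? 3.0 : 0.0;
      if (M[x * NY + y] != expect) bad++;
    }
  }
  return bad;
}

/* Allocates a zeroed array, runs the reference, and returns the number
   of mismatching elements (or -1 if the allocation fails). */
int self_check(void) {
  double *M = calloc((size_t)NX * NY, sizeof *M);
  if (M == NULL) return -1;
  compute_ref(M);
  int bad = count_mismatches(M);
  free(M);
  return bad;
}
```

A result of 0 from self_check() means the host result matches the pattern described; the same comparison could be applied to an array copied back from the device.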

Thanks and Best Regards,

Enrico