Different kernel generation when compiling with -g

LSCH · November 11, 2015, 5:20pm

Hi,

I have a simple function with a switch statement. In each case statement I have a kernels directive followed by a loop independent collapse(2) directive to parallelize a two-level loop nest.
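For illustration only, the pattern described above might look like the following minimal sketch. The original function was not posted, so the name `apply_boundary`, its parameters, and the loop bodies are all hypothetical. The sketch uses `copyin`/`copyout` clauses so that it stands alone, whereas the compiler output in this post shows `present` clauses for `src` and `dst`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the function described in the post:
   one kernels region per case, each wrapping a two-level loop nest.
   All names and loop bodies are assumptions, not the original code. */
void apply_boundary(int boundaryID, int nx, int ny,
                    const double *src, double *dst)
{
    switch (boundaryID) {
    case 0:
        #pragma acc kernels copyin(src[0:nx*ny]) copyout(dst[0:nx*ny])
        {
            /* 'independent' asserts there are no loop-carried dependences */
            #pragma acc loop independent collapse(2)
            for (int j = 0; j < ny; ++j)
                for (int i = 0; i < nx; ++i)
                    dst[j * nx + i] = src[j * nx + i];
        }
        break;
    case 1:
        #pragma acc kernels copyin(src[0:nx*ny]) copyout(dst[0:nx*ny])
        {
            #pragma acc loop independent collapse(2)
            for (int j = 0; j < ny; ++j)
                for (int i = 0; i < nx; ++i)
                    dst[j * nx + i] = 2.0 * src[j * nx + i];
        }
        break;
    default:
        break;
    }
}
```

Without OpenACC enabled the pragmas are ignored and the loops run serially on the host, so the sketch can be checked with any C compiler. With the PGI compilers, building with `-acc -Minfo=accel` prints kernel generation messages of the kind quoted below.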

In an optimized build (no -g flag) the loop nests in each case statement are parallelized as expected according to the compiler output.
If I compile with -g (the code does not contain any #ifdef DEBUG statements, so we are talking about identical source code), the compiler parallelizes only the loop nest in the first case statement, whereas for the following ones it states:

   7977, Generating copyin(xmin,xmax,ymin,ymax,zmin,zmax,boundaryID)
         Generating present(src[:][:],dst[:])
         Generating copyin(this[:])
   7984, Loop is parallelizable
   7985, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
       7984, #pragma acc loop gang, vector(128) collapse(2) /* blockIdx.x threadIdx.x */
       7985,   /* blockIdx.x threadIdx.x collapsed */
   7995, Generating copyin(xmin,xmax,ymin,ymax,zmin,zmax,boundaryID)
         Generating present(src[:][:],dst[:])
         Generating copyin(this[:])
   8002, Conditional loop will be executed in scalar mode
         Accelerator scalar kernel generated
   8013, Generating copyin(xmin,xmax,ymin,ymax,zmin,zmax,boundaryID)
         Generating present(src[:][:],dst[:])
         Generating copyin(this[:])
   8020, Conditional loop will be executed in scalar mode
         Accelerator scalar kernel generated

and fails to parallelize them, reporting that it generated a scalar kernel instead.
Is this a bug or a feature, i.e., is certain code deliberately left unparallelized to simplify debugging? I am confused about this behavior and wonder whether I could be missing something else.

I see the same behavior for PGI 15.7, 15.9, and 15.10.

Thanks,
LS

MatColgrove · November 11, 2015, 6:45pm

Hi LS,

“-g” does inhibit some optimization, though I don’t know why it’s occurring with this particular case. Try “-gopt” instead. “-gopt” adds debug information but doesn’t inhibit optimization.

Otherwise, I’d need a reproducer to understand what’s going on.

Mat

LSCH · November 11, 2015, 10:39pm

Hi Mat,

“-gopt” shows the same kernel generation behavior as a build without “-g” or “-gopt”, so it seems “-g” alone is responsible for suppressing parallelization of the loop nests in the other case statements. Why this is the case is probably a different question, and I can certainly live with using “-gopt”.

Thanks again for your quick response,
LS