Hi,
I have a simple function containing a switch statement. Under each case label I have a kernels directive followed by a loop independent collapse(2) directive to parallelize a two-level loop nest.
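For reference, the structure is roughly the following. This is only a minimal sketch, not the actual code: the names Box and apply_boundary, the flattened 1D arrays, and the loop bodies are hypothetical, and the data clauses are simplified (the real code is a member function and uses present for src and dst).

```cpp
#include <cassert>

// Hypothetical stand-in for the real data; all names here are illustrative.
struct Box {
    int xmin, xmax, ymin, ymax;  // inclusive index bounds of the loop nest
    int nx;                      // row length of the flattened 2D arrays
};

// One kernels region per case, each wrapping a two-level loop nest.
// Without an OpenACC compiler the pragmas are ignored and the loops
// simply run serially on the host.
void apply_boundary(const Box& b, int boundaryID,
                    const double* src, double* dst, int n)
{
    const int xmin = b.xmin, xmax = b.xmax;
    const int ymin = b.ymin, ymax = b.ymax;
    const int nx = b.nx;

    // Guard against indexing outside the n-element arrays.
    assert(xmin >= 0 && ymin >= 0 && xmax < nx && ymax * nx + xmax < n);

    switch (boundaryID) {
    case 0:
        #pragma acc kernels copyin(src[0:n]) copy(dst[0:n])
        {
            #pragma acc loop independent collapse(2)
            for (int j = ymin; j <= ymax; ++j) {
                for (int i = xmin; i <= xmax; ++i) {
                    dst[j * nx + i] = src[j * nx + i];
                }
            }
        }
        break;
    case 1:
        #pragma acc kernels copyin(src[0:n]) copy(dst[0:n])
        {
            #pragma acc loop independent collapse(2)
            for (int j = ymin; j <= ymax; ++j) {
                for (int i = xmin; i <= xmax; ++i) {
                    dst[j * nx + i] = 2.0 * src[j * nx + i];
                }
            }
        }
        break;
    default:
        break;
    }
}
```

Each case has the same shape, so I would expect the compiler to treat every loop nest the same way regardless of which case it sits in.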
In an optimized build (no -g flag), the loop nests under every case label are parallelized as expected, according to the compiler output.
If I compile with -g (the source is identical; the code contains no #ifdef DEBUG statements), the compiler parallelizes only the loop nest in the first case statement, whereas for the following ones it reports
7977, Generating copyin(xmin,xmax,ymin,ymax,zmin,zmax,boundaryID)
      Generating present(src[:][:],dst[:])
      Generating copyin(this[:])
7984, Loop is parallelizable
7985, Loop is parallelizable
      Accelerator kernel generated
      Generating Tesla code
      7984, #pragma acc loop gang, vector(128) collapse(2) /* blockIdx.x threadIdx.x */
      7985,   /* blockIdx.x threadIdx.x collapsed */
7995, Generating copyin(xmin,xmax,ymin,ymax,zmin,zmax,boundaryID)
      Generating present(src[:][:],dst[:])
      Generating copyin(this[:])
8002, Conditional loop will be executed in scalar mode
      Accelerator scalar kernel generated
8013, Generating copyin(xmin,xmax,ymin,ymax,zmin,zmax,boundaryID)
      Generating present(src[:][:],dst[:])
      Generating copyin(this[:])
8020, Conditional loop will be executed in scalar mode
      Accelerator scalar kernel generated
and fails to parallelize them, saying it is generating a scalar kernel.
Is this a bug or intended behavior, i.e. is certain code deliberately not parallelized in debug builds to simplify debugging? I am confused by this behavior and wonder whether I am missing something.
I see the same behavior for PGI 15.7, 15.9, and 15.10.
Thanks,
LS