compiler output

167, Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
168, Accelerator restriction: size of the GPU copy of ‘PWRSUM’ is unknown
Accelerator restriction: size of the GPU copy of ‘PHSSUM’ is unknown
Complex loop carried dependence of ‘*(PWRSUM)’ prevents parallelization
Loop carried dependence of ‘*(PHSSUM)’ prevents parallelization
Loop carried backward dependence of ‘*(PHSSUM)’ prevents vectorization
Conditional loop will be executed in scalar mode
Accelerator kernel generated
168, CC 1.3 : 6 registers; 40 shared, 0 constant, 0 local memory bytes
CC 2.0 : 6 registers; 0 shared, 56 constant, 0 local memory bytes
Generating copyout(PWRSUM[0:])
Generating copyout(PHSSUM[0:])
178, Accelerator region ignored
179, Accelerator restriction: invalid loop
203, Accelerator restriction: size of the GPU copy of ‘Env’ is unknown
Accelerator restriction: invalid loop
204, Accelerator restriction: datatype not supported: SCST
238, Accelerator restriction: size of the GPU copy of ‘URAYu’ is unknown
Accelerator restriction: size of the GPU copy of ‘FRAYu’ is unknown
288, Accelerator restriction: size of the GPU copy of ‘PWRSUM’ is unknown
291, Accelerator restriction: size of the GPU copy of ‘PHSSUM’ is unknown
311, Accelerator restriction: size of the GPU copy of ‘PHSSUM’ is unknown

I am not sure what the compiler output above is telling me. For instance, at line 178 it says “Accelerator region ignored”, and at line 179 “Accelerator restriction: invalid loop”. Why? The code compiles, so how can the loop be invalid? And why was the accelerator region at line 178 ignored?

The earlier messages for line 168 talk about dependences preventing parallelization, yet it still generates an accelerator kernel?

Thanks in advance.

THX 1138

Okay, I think I see the answer to the first question. The compiler generates a sequential kernel in this case, not a parallel kernel. I get it. Maybe it should state the type of kernel instead of just saying that a kernel was generated.

I still need help on the other question, however.

Thanks in advance.

Newport_j

Hi Newport_j,

Accelerator restriction: size of the GPU copy of ‘PHSSUM’ is unknown

This one is hopefully self-explanatory. The compiler can’t tell how big the array is, so you need to use one of the copy clauses to explicitly indicate how much to copy over to the GPU.
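As a rough sketch of what an explicit size looks like, assuming OpenACC-style array sections of the form [start:count]: the routine name, its signature, and the loop body below are made up for illustration, and only the array names come from the compiler output.

```c
/* Hypothetical routine: the name, signature, and loop body are
   illustrative only; PWRSUM and PHSSUM are borrowed from the output. */
void zero_sums(float *PWRSUM, float *PHSSUM, int n)
{
    /* [0:n] states the start index and the element count, so the
       compiler knows how much to copy back from the GPU. */
    #pragma acc kernels loop independent copyout(PWRSUM[0:n], PHSSUM[0:n])
    for (int i = 0; i < n; ++i) {
        PWRSUM[i] = 0.0f;
        PHSSUM[i] = 0.0f;
    }
}
```

Without an accelerator-enabled compiler the pragma is ignored and the loop simply runs on the host.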

Complex loop carried dependence of ‘*(PWRSUM)’ prevents parallelization
Loop carried dependence of ‘*(PHSSUM)’ prevents parallelization

If you have two pointers in C, it’s possible that they point at the same memory or overlap in memory. Overlapping memory prevents parallelization. However, the compiler can’t tell at compile time whether the memory overlaps, so it must assume it does. The C99 “restrict” keyword should be used to assert that the pointers don’t overlap, or you can use the flag “-Msafeptr” to declare that all pointers are independent.
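A minimal sketch of the “restrict” approach, assuming a simple accumulation loop; the function name and the “power” argument are hypothetical and not taken from your code.

```c
/* Hypothetical accumulation loop. The restrict qualifiers promise the
   compiler that PWRSUM and power never refer to overlapping memory. */
void accumulate(float *restrict PWRSUM, const float *restrict power, int n)
{
    #pragma acc kernels loop copy(PWRSUM[0:n]) copyin(power[0:n])
    for (int i = 0; i < n; ++i) {
        PWRSUM[i] += power[i];   /* each iteration touches only element i */
    }
}
```

The promise is yours to keep: if the caller does pass overlapping arrays, the behavior is undefined.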

Another possible reason for this message is that you’re using a calculated index, e.g. “idx = getIndex(i,j); PWRSUM[idx] = … ;”. In these cases, the compiler must assume that different iterations could compute the same index, and hence that the code is not parallel. If you know the indices are distinct, you need to add the “#pragma acc loop independent” directive to assert to the compiler that the loop is parallel. Note that this is only required for the “kernels” model. The “parallel” model assumes loops are independent.
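Here is a small sketch of the computed-index case, using a lookup table in place of an index function. The function name, “src”, and “map” are hypothetical, and the example assumes “map” is a permutation of 0..n-1 so that no two iterations write the same element.

```c
/* Hypothetical scatter loop. Assumes map[] holds each value 0..n-1
   exactly once, so every iteration writes a different element. */
void scatter(float *restrict PWRSUM, const float *restrict src,
             const int *restrict map, int n)
{
    /* "independent" is the programmer's assertion that iterations do
       not conflict; the compiler cannot prove it from map[i] alone. */
    #pragma acc kernels loop independent copyin(src[0:n], map[0:n]) copyout(PWRSUM[0:n])
    for (int i = 0; i < n; ++i) {
        int idx = map[i];        /* computed index */
        PWRSUM[idx] = src[i];
    }
}
```

If the indices can actually repeat, the directive hides a real race and the results will be wrong.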

Another reason could be that there really is a loop-carried dependency, in which case you’ll need to change your code. Though I’d need a code snippet to tell.

For instance in line 178 it says “accelerator region ignored”, and line 179 " loop is invalid". Why?

I’d need a code snippet to tell. But things like “while” loops, or an implicit shallow copy of a struct, would get this message.
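Since I can’t see your code, this is only a guess at the shape of a workaround: a counted “for” loop instead of a “while” loop, with the struct members pulled into locals before the region. The “Signal” type and the “scale” function are invented for the example.

```c
/* Hypothetical struct with a pointer member. */
typedef struct {
    int n;
    float *data;
} Signal;

void scale(Signal *s, float factor)
{
    /* Copy the members into locals so the region only sees a plain
       pointer and a count, not the struct itself. */
    float *restrict data = s->data;
    int n = s->n;

    /* A for loop with a known trip count, rather than a while loop. */
    #pragma acc kernels loop copy(data[0:n])
    for (int i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```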

It still generates an accelerator kernel?

If you’re using the “parallel” model, a kernel will be generated.

  • Mat