Here are the compiler outputs for both versions. My code builds both the MAXITER (compile-time constant) version and the maxIter (integer variable) version in one file, hence the dual output. The MAXITER (constant) version is first.
Normal (i.e., pgcc -ta=nvidia,cc20 …):
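For anyone reading without the source: the messages below describe a loop nest roughly like the following. This is a hypothetical reconstruction, not my actual code; the function name count_inside and the exact loop body are assumptions inferred from the messages. The outer loop over points is the one scheduled as vector(256) with a sum reduction on numInside; the inner iteration loop runs sequentially on the accelerator, and zre/zim are reported as live-out because they are tested again after that loop exits. The MAXITER version would be identical except that the inner trip count is the compile-time constant instead of the maxIter argument.

```c
/* Hypothetical sketch: counts points c = csetre[i] + i*csetim[i] whose
 * orbit under z = z*z + c has |z|^2 <= 4 after the inner loop ends
 * (either because maxIter iterations ran or because the orbit escaped).
 * Data clauses use OpenACC's [start:length] subarray form. */
int count_inside(const double *csetre, const double *csetim,
                 int nPoints, int maxIter)
{
    int numInside = 0;

    /* Outer loop: independent per point; the compiler can turn the
     * numInside++ below into a sum reduction. */
    #pragma acc kernels copyin(csetre[0:nPoints], csetim[0:nPoints])
    for (int i = 0; i < nPoints; i++) {
        double zre = csetre[i];
        double zim = csetim[i];

        /* Inner loop: each iteration depends on the previous zre/zim,
         * so it stays sequential within a thread. */
        for (int j = 0; j < maxIter; j++) {
            double ztemp = zre * zre - zim * zim + csetre[i];
            zim = 2.0 * zre * zim + csetim[i];
            zre = ztemp;
            if (zre * zre + zim * zim > 4.0)
                break;  /* orbit escaped */
        }

        /* zre/zim are read after the inner loop: the "live-out" scalars. */
        if (zre * zre + zim * zim <= 4.0)
            numInside++;
    }
    return numInside;
}
```

Without an OpenACC compiler the pragma is ignored and this runs serially on the host, which is handy for checking GPU results against a CPU reference.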
162, Generating copyin(csetre[0:nPoints-1])
Generating copyin(csetim[0:nPoints-1])
Generating compute capability 2.0 binary
163, Loop carried scalar dependence for 'numInside' at line 183
Accelerator kernel generated
163, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
CC 2.0 : 15 registers; 1032 shared, 76 constant, 0 local memory bytes; 100% occupancy
183, Sum reduction generated for numInside
171, Loop carried scalar dependence for 'zim' at line 173
Loop carried scalar dependence for 'zim' at line 174
Scalar last value needed after loop for 'zim' at line 182
Loop carried scalar dependence for 'zre' at line 173
Loop carried scalar dependence for 'zre' at line 174
Scalar last value needed after loop for 'zre' at line 182
Accelerator restriction: scalar variable live-out from loop: zre
Accelerator restriction: scalar variable live-out from loop: zim
Inner sequential loop scheduled on accelerator
196, Generating copyin(csetre[0:nPoints-1])
Generating copyin(csetim[0:nPoints-1])
Generating compute capability 2.0 binary
197, Loop carried scalar dependence for 'numInside' at line 217
Accelerator kernel generated
197, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
CC 2.0 : 15 registers; 1032 shared, 76 constant, 0 local memory bytes; 100% occupancy
217, Sum reduction generated for numInside
205, Loop carried scalar dependence for 'zim' at line 207
Loop carried scalar dependence for 'zim' at line 208
Scalar last value needed after loop for 'zim' at line 216
Loop carried scalar dependence for 'zre' at line 207
Loop carried scalar dependence for 'zre' at line 208
Scalar last value needed after loop for 'zre' at line 216
Accelerator restriction: scalar variable live-out from loop: zre
Accelerator restriction: scalar variable live-out from loop: zim
Inner sequential loop scheduled on accelerator
And using nofma (pgcc -ta=nvidia,cc20,nofma):
162, Generating copyin(csetre[0:nPoints-1])
Generating copyin(csetim[0:nPoints-1])
Generating compute capability 2.0 binary
163, Loop carried scalar dependence for 'numInside' at line 183
Accelerator kernel generated
163, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
CC 2.0 : 19 registers; 1032 shared, 76 constant, 0 local memory bytes; 100% occupancy
183, Sum reduction generated for numInside
171, Loop carried scalar dependence for 'zim' at line 173
Loop carried scalar dependence for 'zim' at line 174
Scalar last value needed after loop for 'zim' at line 182
Loop carried scalar dependence for 'zre' at line 173
Loop carried scalar dependence for 'zre' at line 174
Scalar last value needed after loop for 'zre' at line 182
Accelerator restriction: scalar variable live-out from loop: zre
Accelerator restriction: scalar variable live-out from loop: zim
Inner sequential loop scheduled on accelerator
196, Generating copyin(csetre[0:nPoints-1])
Generating copyin(csetim[0:nPoints-1])
Generating compute capability 2.0 binary
197, Loop carried scalar dependence for 'numInside' at line 217
Accelerator kernel generated
197, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
CC 2.0 : 19 registers; 1032 shared, 76 constant, 0 local memory bytes; 100% occupancy
217, Sum reduction generated for numInside
205, Loop carried scalar dependence for 'zim' at line 207
Loop carried scalar dependence for 'zim' at line 208
Scalar last value needed after loop for 'zim' at line 216
Loop carried scalar dependence for 'zre' at line 207
Loop carried scalar dependence for 'zre' at line 208
Scalar last value needed after loop for 'zre' at line 216
Accelerator restriction: scalar variable live-out from loop: zre
Accelerator restriction: scalar variable live-out from loop: zim
Inner sequential loop scheduled on accelerator
-Nick.