I am new to parallel programming. I have a legacy FORTRAN-77
code (about 15,000 lines) that models eclipsing binary stars.
The stars are divided up into rectangular grids in “longitude”
and “latitude”, and much of the computing time is spent
looping over the grid(s) and computing various quantities. Since the
GPUs don’t support subroutine and function calls, I have started
inlining the subroutine calls manually.
This section of the code computes various physical quantities for
each pixel on the star, and saves the results into several arrays.
Each pixel is independent of every other pixel, so I would think
loops like these should be able to run in parallel. At
each pixel, there is a Newton-Raphson iteration to find the radius
vector.
!$acc region
      do 1104 ialf=1,Nalph
        r=0.0000001d0
        theta=-0.5d0*dtheta+dtheta*dble(ialf)
        dphi=twopie/dble(4*Nbet)    !dble(ibetlim(ialf))
        snth=dsin(theta)
        snth3=dsin(theta)/3.0d0
        cnth=dcos(theta)
        DO 1105 ibet=1, 4*Nbet
          iidx=mmdx(ialf,ibet)
          phi=-0.5d0*dphi+dphi*dble(ibet)
          phi=phi+phistart(ialf)
          cox=dcos(phi)*snth
          coy=dsin(phi)*snth
          coz=cnth
c   begin in-line subroutine rad
          do irad=1,190             !Newton-Raphson iteration
            x=r*cox
            y=r*coy
            z=r*coz
            if(itide.lt.2)then      !in-line subroutine spherepot
              t1=(bdist*bdist-2.0d0*cox*r*bdist+r*r)
              t2=0.5d0*omega*omega*(1.0d0+overQ)*(1.0d0-coz*coz)
              psi=1.0d0/r+overQ*(1.0d0/dsqrt(t1)-cox*r/(bdist*bdist))+r*r*t2
              dpsidr=-1.0d0/(r*r)+overQ*((dsqrt(t1)**3)*(cox*bdist-r)
     %          -cox/(bdist*bdist))+t2*2.0d0*r
            endif                   !end spherepot
            rnew=r-(psi-psi0)/dpsidr
            dr=dabs(rnew-r)
            if(dabs(dr).lt.1.0d-19)go to 4115
            r=rnew
          enddo
 4115     continue
          if(itide.lt.2)then        !in-line subroutine poten
            RST = DSQRT(X**2 + Y**2 + Z**2)
            RX = DSQRT((X-bdist)**2 + Y**2 + Z**2)
            A = ((1.0d0+overQ)/2.0d0) * OMEGA**2
            RST3 = RST*RST*RST
            RX3 = RX*RX*RX
            PSI = 1.0d0/RST + overQ/RX - overQ*X/bdist/bdist
     &          + A*(X**2 + Y**2)
            PSIY = -Y/RST3 - overQ*Y/RX3 + 2.0d0*A*Y
            PSIZ = -Z/RST3 - overQ*Z/RX3
            PSIX = -X/RST3 - overQ*(X-bdist)/RX3 -overQ/bdist/bdist
     &          + 2.0d0*A*X
            RST5 = RST3*RST*RST
            RX5 = RX3*RX*RX
            PSIXX = -1.0d0/RST3 + 3.0d0*X**2/RST5
     $          -overQ/RX3 + (3.0d0*Q*(X-bdist)**2)/RX5 +2.0d0*A
          endif                     !end poten !end rad
          radarray(iidx) = R
          garray(iidx) = DSQRT(PSIX**2+PSIY**2+PSIZ**2)
          oneoverg=1.0d0/garray(iidx)
          GRADX(iidx) = -PSIX*oneoverg
          GRADY(iidx) = -PSIY*oneoverg
          GRADZ(iidx) = -PSIZ*oneoverg
          surf(iidx) = COX*GRADX(iidx)+COY*GRADY(iidx)
     $          + COZ*GRADZ(iidx)
          if(surf(iidx).lt.0.7d0)surf(iidx)=0.7d0
          surf(iidx) = R**2 / surf(iidx)
          surf(iidx)=surf(iidx)*dphi*dtheta*snth
          xarray(iidx)=x
          yarray(iidx)=y
          zarray(iidx)=z
          sarea=sarea+surf(iidx)
          VOL = VOL + 1.0d0*R*R*R*dphi*dtheta*snth3
 1105   CONTINUE                    ! end of ibet loop
 1104 CONTINUE                      ! end of ialf loop
!$acc end region
!$acc end data region
I compile with this command:
pgfortran -Mextend -O2 -o ELC ELC.for -ta=nvidia -Minfo
I get a lengthy list of messages:
5035, No parallel kernels found, accelerator region ignored
5046, Accelerator restriction: induction variable live-out from loop: ialf
5051, Loop carried scalar dependence for ‘rad1_psi’ at line 5123
Loop carried scalar dependence for ‘rad1_dpsidr’ at line 5123
Loop carried scalar dependence for ‘rad1_r’ at line 5066
Loop carried scalar dependence for ‘rad1_r’ at line 5068
Loop carried scalar dependence for ‘rad1_r’ at line 5070
Loop carried scalar dependence for ‘rad1_r’ at line 5123
Loop carried scalar dependence for ‘rad1_r’ at line 5124
5061, Loop carried scalar dependence for ‘rad1_psi’ at line 5123
Loop carried scalar dependence for ‘rad1_dpsidr’ at line 5123
Loop carried scalar dependence for ‘rad1_r’ at line 5066
Loop carried scalar dependence for ‘rad1_r’ at line 5068
Loop carried scalar dependence for ‘rad1_r’ at line 5070
Loop carried scalar dependence for ‘rad1_r’ at line 5123
Loop carried scalar dependence for ‘rad1_r’ at line 5124
5131, Accelerator restriction: induction variable live-out from loop: ialf
5187, No parallel kernels found, accelerator region ignored
5188, Loop carried scalar dependence for ‘psi’ at line 5261
Loop carried scalar dependence for ‘psiy’ at line 5342
Loop carried scalar dependence for ‘psiy’ at line 5345
Loop carried scalar dependence for ‘psiz’ at line 5342
Loop carried scalar dependence for ‘psiz’ at line 5346
Loop carried scalar dependence for ‘psix’ at line 5342
Loop carried scalar dependence for ‘psix’ at line 5344
Complex loop carried dependence of ‘radarray’ prevents parallelization
Complex loop carried dependence of ‘garray’ prevents parallelization
Complex loop carried dependence of ‘gradx’ prevents parallelization
Complex loop carried dependence of ‘grady’ prevents parallelization
Complex loop carried dependence of ‘gradz’ prevents parallelization
Complex loop carried dependence of ‘surf’ prevents parallelization
Complex loop carried dependence of ‘xarray’ prevents parallelization
Complex loop carried dependence of ‘yarray’ prevents parallelization
Complex loop carried dependence of ‘zarray’ prevents parallelization
Scalar last value needed after loop for ‘z’ at line 5475
Loop carried scalar dependence for ‘dpsidr’ at line 5261
Accelerator restriction: scalar variable live-out from loop: r
Accelerator restriction: scalar variable live-out from loop: psi
Accelerator restriction: scalar variable live-out from loop: z
Accelerator restriction: scalar variable live-out from loop: y
Accelerator restriction: scalar variable live-out from loop: x
Accelerator restriction: scalar variable live-out from loop: psixx
Accelerator restriction: scalar variable live-out from loop: psix
Accelerator restriction: scalar variable live-out from loop: psiz
Accelerator restriction: scalar variable live-out from loop: psiy
Accelerator restriction: scalar variable live-out from loop: coz
Accelerator restriction: scalar variable live-out from loop: coy
Accelerator restriction: scalar variable live-out from loop: cox
5190, Accelerator restriction: induction variable live-out from loop: ialf
5195, Loop carried scalar dependence for ‘psi’ at line 5261
Loop carried scalar dependence for ‘psiy’ at line 5342
Loop carried scalar dependence for ‘psiy’ at line 5345
Loop carried scalar dependence for ‘psiz’ at line 5342
Loop carried scalar dependence for ‘psiz’ at line 5346
Loop carried scalar dependence for ‘psix’ at line 5342
Loop carried scalar dependence for ‘psix’ at line 5344
Complex loop carried dependence of ‘radarray’ prevents parallelization
Complex loop carried dependence of ‘garray’ prevents parallelization
Loop carried dependence due to exposed use of ‘garray(:)’ prevents parallelization
Complex loop carried dependence of ‘gradx’ prevents parallelization
Loop carried dependence due to exposed use of ‘gradx(:)’ prevents parallelization
Complex loop carried dependence of ‘grady’ prevents parallelization
Loop carried dependence due to exposed use of ‘grady(:)’ prevents parallelization
Complex loop carried dependence of ‘gradz’ prevents parallelization
Loop carried dependence due to exposed use of ‘gradz(:)’ prevents parallelization
Complex loop carried dependence of ‘surf’ prevents parallelization
Loop carried dependence due to exposed use of ‘surf(:)’ prevents parallelization
Complex loop carried dependence of ‘xarray’ prevents parallelization
Complex loop carried dependence of ‘yarray’ prevents parallelization
Complex loop carried dependence of ‘zarray’ prevents parallelization
Scalar last value needed after loop for ‘z’ at line 5475
Loop carried scalar dependence for ‘dpsidr’ at line 5261
Loop carried scalar dependence for ‘r’ at line 5210
Loop carried scalar dependence for ‘r’ at line 5211
Loop carried scalar dependence for ‘r’ at line 5212
Loop carried scalar dependence for ‘r’ at line 5214
Loop carried scalar dependence for ‘r’ at line 5216
Loop carried scalar dependence for ‘r’ at line 5217
Loop carried scalar dependence for ‘r’ at line 5261
Loop carried scalar dependence for ‘r’ at line 5262
Loop carried scalar dependence for ‘r’ at line 5341
Loop carried scalar dependence for ‘r’ at line 5357
Loop carried scalar dependence for ‘r’ at line 5375
Accelerator restriction: scalar variable live-out from loop: r
Accelerator restriction: scalar variable live-out from loop: psi
Accelerator restriction: scalar variable live-out from loop: z
Accelerator restriction: scalar variable live-out from loop: y
Accelerator restriction: scalar variable live-out from loop: x
Accelerator restriction: scalar variable live-out from loop: psixx
Accelerator restriction: scalar variable live-out from loop: psix
Accelerator restriction: scalar variable live-out from loop: psiz
Accelerator restriction: scalar variable live-out from loop: psiy
Accelerator restriction: scalar variable live-out from loop: coz
Accelerator restriction: scalar variable live-out from loop: coy
Accelerator restriction: scalar variable live-out from loop: cox
Parallelization would require privatization of array ‘zarray(:)’
Parallelization would require privatization of array ‘yarray(:)’
Parallelization would require privatization of array ‘xarray(:)’
Parallelization would require privatization of array ‘radarray(:)’
Invariant if transformation
5196, Accelerator restriction: induction variable live-out from loop: ialf
5209, Scalar last value needed after loop for ‘x’ at line 5268
Scalar last value needed after loop for ‘x’ at line 5269
Scalar last value needed after loop for ‘x’ at line 5273
Scalar last value needed after loop for ‘x’ at line 5277
Scalar last value needed after loop for ‘x’ at line 5281
Scalar last value needed after loop for ‘x’ at line 5349
Scalar last value needed after loop for ‘y’ at line 5268
Scalar last value needed after loop for ‘y’ at line 5269
Scalar last value needed after loop for ‘y’ at line 5273
Scalar last value needed after loop for ‘y’ at line 5275
Scalar last value needed after loop for ‘y’ at line 5350
Scalar last value needed after loop for ‘z’ at line 5268
Scalar last value needed after loop for ‘z’ at line 5269
Scalar last value needed after loop for ‘z’ at line 5276
Scalar last value needed after loop for ‘z’ at line 5351
Scalar last value needed after loop for ‘z’ at line 5475
Loop carried scalar dependence for ‘psi’ at line 5261
Loop carried scalar dependence for ‘dpsidr’ at line 5261
Loop carried scalar dependence for ‘r’ at line 5210
Loop carried scalar dependence for ‘r’ at line 5211
Loop carried scalar dependence for ‘r’ at line 5212
Loop carried scalar dependence for ‘r’ at line 5214
Loop carried scalar dependence for ‘r’ at line 5216
Loop carried scalar dependence for ‘r’ at line 5217
Loop carried scalar dependence for ‘r’ at line 5261
Loop carried scalar dependence for ‘r’ at line 5262
Scalar last value needed after loop for ‘r’ at line 5341
Scalar last value needed after loop for ‘r’ at line 5357
Scalar last value needed after loop for ‘r’ at line 5375
Accelerator restriction: scalar variable live-out from loop: r
Accelerator restriction: scalar variable live-out from loop: psi
Accelerator restriction: scalar variable live-out from loop: z
Accelerator restriction: scalar variable live-out from loop: y
Accelerator restriction: scalar variable live-out from loop: x
I suspect that arrays like gradx and grady need to be saved. However, I
don’t understand the complex dependencies, or why this message occurs:
Accelerator restriction: induction variable live-out from loop: ialf
The latitude of the pixel (theta) is set from the loop index ialf.
The longitude (phi) is also set from a loop index (ibet), but the compiler does
not complain about that one.
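A stripped-down sketch of the pattern may make the question more concrete. It is a toy, not the ELC code: the potential is a hypothetical axisymmetric one (1/r plus a rotation term), the names psi0, w2, tol and the grid sizes are made up, the index is a plain row-major formula standing in for mmdx, and it is written in free-form Fortran for readability. Every scalar is assigned inside the inner loop body before it is used, including the starting guess for r, so no value is carried from one pixel to the next.

```fortran
program pixel_newton
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nalph = 8, nbet = 4            ! toy grid size (assumed)
  real(dp), parameter :: pi = 3.141592653589793_dp
  real(dp), parameter :: psi0 = 2.0_dp, w2 = 0.01_dp   ! toy potential (assumed)
  real(dp), parameter :: tol = 1.0e-14_dp              ! tolerance (assumed)
  real(dp) :: radarray(nalph*4*nbet)
  real(dp) :: dtheta, theta, snth, t2, r, rnew, dr, psi, dpsidr, resid
  integer :: ialf, ibet, irad, iidx

  dtheta = pi/real(nalph, dp)

  ! the "!$acc region" / "!$acc end region" pair from the post would
  ! go around this loop nest
  do ialf = 1, nalph
    do ibet = 1, 4*nbet
      ! unique slot per pixel; stands in for mmdx(ialf,ibet)
      iidx = (ialf-1)*4*nbet + ibet
      theta = -0.5_dp*dtheta + dtheta*real(ialf, dp)
      snth = sin(theta)
      t2 = 0.5_dp*w2*snth*snth
      r = 1.0e-7_dp                 ! starting guess reset for every pixel
      do irad = 1, 190              ! Newton-Raphson on psi(r) = psi0
        psi = 1.0_dp/r + r*r*t2
        dpsidr = -1.0_dp/(r*r) + 2.0_dp*r*t2
        rnew = r - (psi - psi0)/dpsidr
        dr = abs(rnew - r)
        r = rnew
        if (dr < tol) exit
      end do
      radarray(iidx) = r
    end do
  end do

  ! host-side check: largest residual |psi(r) - psi0| over all pixels
  resid = 0.0_dp
  do ialf = 1, nalph
    theta = -0.5_dp*dtheta + dtheta*real(ialf, dp)
    t2 = 0.5_dp*w2*sin(theta)**2
    do ibet = 1, 4*nbet
      r = radarray((ialf-1)*4*nbet + ibet)
      resid = max(resid, abs(1.0_dp/r + r*r*t2 - psi0))
    end do
  end do
  print *, 'max residual =', resid
end program pixel_newton
```

The sketch only shows the loop structure on the host; whether the accelerator compiler accepts the early exit from the iteration loop is not something it demonstrates.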
In this code, I need to compute values for various pixels and save them in arrays
in a few other places, so I would appreciate any advice on how this can be
done in parallel.
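For the running sums (sarea, VOL), one option, sketched here under the assumption that per-pixel contributions can be stored and added up afterwards, is to keep scalar accumulation out of the pixel loop entirely. The radius is a placeholder (a unit sphere), dvol is a hypothetical extra array, the grid sizes are made up, and the index formula again stands in for mmdx.

```fortran
program pixel_sums
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nalph = 6, nbet = 3          ! toy grid size (assumed)
  real(dp), parameter :: pi = 3.141592653589793_dp
  real(dp) :: surf(nalph*4*nbet), dvol(nalph*4*nbet) ! dvol is hypothetical
  real(dp) :: dtheta, dphi, theta, snth, r, sarea, vol
  integer :: ialf, ibet, iidx

  dtheta = pi/real(nalph, dp)
  dphi = 2.0_dp*pi/real(4*nbet, dp)

  ! pixel loop: each iteration writes only to its own array slots
  do ialf = 1, nalph
    do ibet = 1, 4*nbet
      iidx = (ialf-1)*4*nbet + ibet
      theta = -0.5_dp*dtheta + dtheta*real(ialf, dp)
      snth = sin(theta)
      r = 1.0_dp                       ! placeholder radius (unit sphere)
      surf(iidx) = r*r*dphi*dtheta*snth
      dvol(iidx) = r*r*r*dphi*dtheta*snth/3.0_dp
    end do
  end do

  ! sums formed after the loop, outside the pixel loop
  sarea = sum(surf)
  vol = sum(dvol)
  print *, 'area   =', sarea, ' (sphere: ', 4.0_dp*pi, ')'
  print *, 'volume =', vol, ' (sphere: ', 4.0_dp*pi/3.0_dp, ')'
end program pixel_sums
```

With this coarse grid the midpoint sums only approximate the exact sphere values printed next to them.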
I am using pgfortran 10.3. We have a Linux box with a quad-core i7 and two
Nvidia C1060 cards.
Thanks,
Jerry
P.S. upon previewing, the spacing in the code snippet is messed up.
Hopefully it is still understandable.