I am new to parallel programming. I have a legacy FORTRAN-77 code (about 15,000 lines) that models eclipsing binary stars. The stars are divided up into rectangular grids in “longitude” and “latitude”, and much of the computing time is spent looping over the grid(s) and computing various things. Since the GPUs don’t support subroutine and function calls, I have started to manually inline subroutine calls.

This section of the code computes various physical quantities for each pixel on the star, and saves the results into several arrays. Each pixel is independent of every other pixel, so I would think loops like these should be able to run in parallel. At each pixel, there is a Newton-Raphson iteration to find the radius vector.

!$acc region
      do 1104 ialf=1,Nalph
        r=0.0000001d0
        theta=-0.5d0*dtheta+dtheta*dble(ialf)
        dphi=twopie/dble(4*Nbet)    !dble(ibetlim(ialf))
        snth=dsin(theta)
        snth3=dsin(theta)/3.0d0
        cnth=dcos(theta)
        DO 1105 ibet=1, 4*Nbet
          iidx=mmdx(ialf,ibet)
          phi=-0.5d0*dphi+dphi*dble(ibet)
          phi=phi+phistart(ialf)
          cox=dcos(phi)*snth
          coy=dsin(phi)*snth
          coz=cnth
c   begin in-line subroutine rad
          do irad=1,190    !Newton-Raphson iteration
            x=r*cox
            y=r*coy
            z=r*coz
            if(itide.lt.2)then    !in-line subroutine spherepot
              t1=(bdist*bdist-2.0d0*cox*r*bdist+r*r)
              t2=0.5d0*omega*omega*(1.0d0+overQ)*(1.0d0-coz*coz)
              psi=1.0d0/r+overQ*(1.0d0/dsqrt(t1)-cox*r/(bdist*bdist))+r*r*t2
              dpsidr=-1.0d0/(r*r)+overQ*((dsqrt(t1)**3)*(cox*bdist-r)
     %          -cox/(bdist*bdist))+t2*2.0d0*r
            endif    !end spherepot
            rnew=r-(psi-psi0)/dpsidr
            dr=dabs(rnew-r)
            if(dabs(dr).lt.1.0d-19)go to 4115
            r=rnew
          enddo
 4115     continue
          if(itide.lt.2)then    !in-line subroutine poten
            RST = DSQRT(X**2 + Y**2 + Z**2)
            RX = DSQRT((X-bdist)**2 + Y**2 + Z**2)
            A = ((1.0d0+overQ)/2.0d0) * OMEGA**2
            RST3 = RST*RST*RST
            RX3 = RX*RX*RX
            PSI = 1.0d0/RST + overQ/RX - overQ*X/bdist/bdist
     &          + A*(X**2 + Y**2)
            PSIY = -Y/RST3 - overQ*Y/RX3 + 2.0d0*A*Y
            PSIZ = -Z/RST3 - overQ*Z/RX3
            PSIX = -X/RST3 - overQ*(X-bdist)/RX3 -overQ/bdist/bdist
     &          + 2.0d0*A*X
            RST5 = RST3*RST*RST
            RX5 = RX3*RX*RX
            PSIXX = -1.0d0/RST3 + 3.0d0*X**2/RST5
     $          -overQ/RX3 + (3.0d0*overQ*(X-bdist)**2)/RX5 +2.0d0*A
          endif    !end poten   !end rad
          radarray(iidx) = R
          garray(iidx) = DSQRT(PSIX**2+PSIY**2+PSIZ**2)
          oneoverg=1.0d0/garray(iidx)
          GRADX(iidx) = -PSIX*oneoverg
          GRADY(iidx) = -PSIY*oneoverg
          GRADZ(iidx) = -PSIZ*oneoverg
          surf(iidx) = COX*GRADX(iidx)+COY*GRADY(iidx)
     $          + COZ*GRADZ(iidx)
          if(surf(iidx).lt.0.7d0)surf(iidx)=0.7d0
          surf(iidx) = R**2 / surf(iidx)
          surf(iidx)=surf(iidx)*dphi*dtheta*snth
          xarray(iidx)=x
          yarray(iidx)=y
          zarray(iidx)=z
          sarea=sarea+surf(iidx)
          VOL = VOL + 1.0d0*R*R*R*dphi*dtheta*snth3
 1105   CONTINUE    ! continue ialf loop
 1104 CONTINUE    ! continue over ibet
!$acc end region
!$acc end data region
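To show the structure without the surrounding bookkeeping, the per-pixel Newton-Raphson step for the itide.lt.2 branch boils down to something like the sketch below. It is free-form Fortran rather than the fixed-form original, and it is written as a function only for readability (in the real code it is inlined). The module name `radius_sketch`, the function name `solve_radius`, the tolerance `tol`, and the way the derivative is written out are my own choices for illustration, not a copy of the production routine.

```fortran
module radius_sketch
  implicit none
  private
  public :: solve_radius
contains
  ! Solve psi(r) = psi0 along the direction (cox, coz) by Newton-Raphson.
  ! Hypothetical helper: name, argument list and tolerance are assumptions.
  pure function solve_radius(cox, coz, psi0, overq, bdist, omega) result(r)
    double precision, intent(in) :: cox, coz, psi0, overq, bdist, omega
    double precision :: r
    double precision :: t1, t2, psi, dpsidr, rnew
    double precision, parameter :: tol = 1.0d-14   ! assumed, not the 1.0d-19 of the listing
    integer :: irad

    r = 1.0d-7                                     ! starting guess, set for every pixel
    t2 = 0.5d0*omega*omega*(1.0d0 + overq)*(1.0d0 - coz*coz)
    do irad = 1, 190
      t1 = bdist*bdist - 2.0d0*cox*r*bdist + r*r
      psi = 1.0d0/r + overq*(1.0d0/sqrt(t1) - cox*r/(bdist*bdist)) + r*r*t2
      ! d(psi)/dr, differentiated term by term from the line above
      dpsidr = -1.0d0/(r*r) &
             + overq*((cox*bdist - r)/(t1*sqrt(t1)) - cox/(bdist*bdist)) &
             + 2.0d0*r*t2
      rnew = r - (psi - psi0)/dpsidr
      if (abs(rnew - r) < tol) then
        r = rnew
        exit
      end if
      r = rnew
    end do
  end function solve_radius
end module radius_sketch
```

The function reads only its arguments and writes only its result, so calls for different pixels share nothing. That is the sense in which I expected the pixels to be independent.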

I compile with this command:

pgfortran -Mextend -O2 -o ELC ELC.for -ta=nvidia -Minfo

I get a lengthy list of messages:

5035, No parallel kernels found, accelerator region ignored

5046, Accelerator restriction: induction variable live-out from loop: ialf

5051, Loop carried scalar dependence for ‘rad1_psi’ at line 5123

Loop carried scalar dependence for ‘rad1_dpsidr’ at line 5123

Loop carried scalar dependence for ‘rad1_r’ at line 5066

Loop carried scalar dependence for ‘rad1_r’ at line 5068

Loop carried scalar dependence for ‘rad1_r’ at line 5070

Loop carried scalar dependence for ‘rad1_r’ at line 5123

Loop carried scalar dependence for ‘rad1_r’ at line 5124

5061, Loop carried scalar dependence for ‘rad1_psi’ at line 5123

Loop carried scalar dependence for ‘rad1_dpsidr’ at line 5123

Loop carried scalar dependence for ‘rad1_r’ at line 5066

Loop carried scalar dependence for ‘rad1_r’ at line 5068

Loop carried scalar dependence for ‘rad1_r’ at line 5070

Loop carried scalar dependence for ‘rad1_r’ at line 5123

Loop carried scalar dependence for ‘rad1_r’ at line 5124

5131, Accelerator restriction: induction variable live-out from loop: ialf

5187, No parallel kernels found, accelerator region ignored

5188, Loop carried scalar dependence for ‘psi’ at line 5261

Loop carried scalar dependence for ‘psiy’ at line 5342

Loop carried scalar dependence for ‘psiy’ at line 5345

Loop carried scalar dependence for ‘psiz’ at line 5342

Loop carried scalar dependence for ‘psiz’ at line 5346

Loop carried scalar dependence for ‘psix’ at line 5342

Loop carried scalar dependence for ‘psix’ at line 5344

Complex loop carried dependence of ‘radarray’ prevents parallelization

Complex loop carried dependence of ‘garray’ prevents parallelization

Complex loop carried dependence of ‘gradx’ prevents parallelization

Complex loop carried dependence of ‘grady’ prevents parallelization

Complex loop carried dependence of ‘gradz’ prevents parallelization

Complex loop carried dependence of ‘surf’ prevents parallelization

Complex loop carried dependence of ‘xarray’ prevents parallelization

Complex loop carried dependence of ‘yarray’ prevents parallelization

Complex loop carried dependence of ‘zarray’ prevents parallelization

Scalar last value needed after loop for ‘z’ at line 5475

Loop carried scalar dependence for ‘dpsidr’ at line 5261

Accelerator restriction: scalar variable live-out from loop: r

Accelerator restriction: scalar variable live-out from loop: psi

Accelerator restriction: scalar variable live-out from loop: z

Accelerator restriction: scalar variable live-out from loop: y

Accelerator restriction: scalar variable live-out from loop: x

Accelerator restriction: scalar variable live-out from loop: psixx

Accelerator restriction: scalar variable live-out from loop: psix

Accelerator restriction: scalar variable live-out from loop: psiz

Accelerator restriction: scalar variable live-out from loop: psiy

Accelerator restriction: scalar variable live-out from loop: coz

Accelerator restriction: scalar variable live-out from loop: coy

Accelerator restriction: scalar variable live-out from loop: cox

5190, Accelerator restriction: induction variable live-out from loop: ialf

5195, Loop carried scalar dependence for ‘psi’ at line 5261

Loop carried scalar dependence for ‘psiy’ at line 5342

Loop carried scalar dependence for ‘psiy’ at line 5345

Loop carried scalar dependence for ‘psiz’ at line 5342

Loop carried scalar dependence for ‘psiz’ at line 5346

Loop carried scalar dependence for ‘psix’ at line 5342

Loop carried scalar dependence for ‘psix’ at line 5344

Complex loop carried dependence of ‘radarray’ prevents parallelization

Complex loop carried dependence of ‘garray’ prevents parallelization

Loop carried dependence due to exposed use of ‘garray(:)’ prevents parallelization

Complex loop carried dependence of ‘gradx’ prevents parallelization

Loop carried dependence due to exposed use of ‘gradx(:)’ prevents parallelization

Complex loop carried dependence of ‘grady’ prevents parallelization

Loop carried dependence due to exposed use of ‘grady(:)’ prevents parallelization

Complex loop carried dependence of ‘gradz’ prevents parallelization

Loop carried dependence due to exposed use of ‘gradz(:)’ prevents parallelization

Complex loop carried dependence of ‘surf’ prevents parallelization

Loop carried dependence due to exposed use of ‘surf(:)’ prevents parallelization

Complex loop carried dependence of ‘xarray’ prevents parallelization

Complex loop carried dependence of ‘yarray’ prevents parallelization

Complex loop carried dependence of ‘zarray’ prevents parallelization

Scalar last value needed after loop for ‘z’ at line 5475

Loop carried scalar dependence for ‘dpsidr’ at line 5261

Loop carried scalar dependence for ‘r’ at line 5210

Loop carried scalar dependence for ‘r’ at line 5211

Loop carried scalar dependence for ‘r’ at line 5212

Loop carried scalar dependence for ‘r’ at line 5214

Loop carried scalar dependence for ‘r’ at line 5216

Loop carried scalar dependence for ‘r’ at line 5217

Loop carried scalar dependence for ‘r’ at line 5261

Loop carried scalar dependence for ‘r’ at line 5262

Loop carried scalar dependence for ‘r’ at line 5341

Loop carried scalar dependence for ‘r’ at line 5357

Loop carried scalar dependence for ‘r’ at line 5375

Accelerator restriction: scalar variable live-out from loop: r

Accelerator restriction: scalar variable live-out from loop: psi

Accelerator restriction: scalar variable live-out from loop: z

Accelerator restriction: scalar variable live-out from loop: y

Accelerator restriction: scalar variable live-out from loop: x

Accelerator restriction: scalar variable live-out from loop: psixx

Accelerator restriction: scalar variable live-out from loop: psix

Accelerator restriction: scalar variable live-out from loop: psiz

Accelerator restriction: scalar variable live-out from loop: psiy

Accelerator restriction: scalar variable live-out from loop: coz

Accelerator restriction: scalar variable live-out from loop: coy

Accelerator restriction: scalar variable live-out from loop: cox

Parallelization would require privatization of array ‘zarray(:)’

Parallelization would require privatization of array ‘yarray(:)’

Parallelization would require privatization of array ‘xarray(:)’

Parallelization would require privatization of array ‘radarray(:)’

Invariant if transformation

5196, Accelerator restriction: induction variable live-out from loop: ialf

5209, Scalar last value needed after loop for ‘x’ at line 5268

Scalar last value needed after loop for ‘x’ at line 5269

Scalar last value needed after loop for ‘x’ at line 5273

Scalar last value needed after loop for ‘x’ at line 5277

Scalar last value needed after loop for ‘x’ at line 5281

Scalar last value needed after loop for ‘x’ at line 5349

Scalar last value needed after loop for ‘y’ at line 5268

Scalar last value needed after loop for ‘y’ at line 5269

Scalar last value needed after loop for ‘y’ at line 5273

Scalar last value needed after loop for ‘y’ at line 5275

Scalar last value needed after loop for ‘y’ at line 5350

Scalar last value needed after loop for ‘z’ at line 5268

Scalar last value needed after loop for ‘z’ at line 5269

Scalar last value needed after loop for ‘z’ at line 5276

Scalar last value needed after loop for ‘z’ at line 5351

Scalar last value needed after loop for ‘z’ at line 5475

Loop carried scalar dependence for ‘psi’ at line 5261

Loop carried scalar dependence for ‘dpsidr’ at line 5261

Loop carried scalar dependence for ‘r’ at line 5210

Loop carried scalar dependence for ‘r’ at line 5211

Loop carried scalar dependence for ‘r’ at line 5212

Loop carried scalar dependence for ‘r’ at line 5214

Loop carried scalar dependence for ‘r’ at line 5216

Loop carried scalar dependence for ‘r’ at line 5217

Loop carried scalar dependence for ‘r’ at line 5261

Loop carried scalar dependence for ‘r’ at line 5262

Scalar last value needed after loop for ‘r’ at line 5341

Scalar last value needed after loop for ‘r’ at line 5357

Scalar last value needed after loop for ‘r’ at line 5375

Accelerator restriction: scalar variable live-out from loop: r

Accelerator restriction: scalar variable live-out from loop: psi

Accelerator restriction: scalar variable live-out from loop: z

Accelerator restriction: scalar variable live-out from loop: y

Accelerator restriction: scalar variable live-out from loop: x

I suspect that arrays like gradx and grady need to be saved. However, I don’t understand the complex dependencies, or why this message occurs:

Accelerator restriction: induction variable live-out from loop: ialf

The value of the latitude of the pixel (theta) is set from the index. The longitude (phi) is also set from an index, but the compiler does not care about that.

In this code, I need to compute values for various pixels and save them in arrays in a few other places, so I would appreciate any advice on how this can be done in parallel.
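For those other places, the pattern I have in mind looks roughly like the sketch below. Every scalar is assigned inside the inner loop body before it is read, and the running sum is taken after the loops instead of being accumulated inside them. One difference from my listing is that there r is initialised once per ialf and then reused as the starting guess for the next ibet, which may be part of what the compiler calls "loop carried". This is only a sketch in free-form Fortran: the names `pixel_sketch` and `fill_pixels`, the flat index formula, and the placeholder value of r are made up for illustration, and I do not know whether this is what the compiler actually wants.

```fortran
module pixel_sketch
  implicit none
contains
  ! Hypothetical routine: fills per-pixel arrays for a sphere of radius 1.
  subroutine fill_pixels(nalph, nbet4, dtheta, rad, surf, sarea)
    integer, intent(in) :: nalph, nbet4
    double precision, intent(in) :: dtheta
    double precision, intent(out) :: rad(nalph*nbet4), surf(nalph*nbet4)
    double precision, intent(out) :: sarea
    integer :: ialf, ibet, iidx
    double precision :: theta, dphi, r

    dphi = 2.0d0*acos(-1.0d0)/dble(nbet4)
!$acc region
    do ialf = 1, nalph
      do ibet = 1, nbet4
        ! index computed from the loop counters (assumed layout),
        ! instead of being looked up in a table like mmdx
        iidx = (ialf - 1)*nbet4 + ibet
        theta = -0.5d0*dtheta + dtheta*dble(ialf)
        r = 1.0d0                  ! placeholder for the Newton-Raphson result
        rad(iidx) = r
        surf(iidx) = r*r*dphi*dtheta*sin(theta)
      end do
    end do
!$acc end region
    ! accumulate after the loops rather than inside them
    sarea = sum(surf)
  end subroutine fill_pixels
end module pixel_sketch
```

With r fixed at 1 the summed area should come out close to 4*pi, which at least gives a quick check that the indexing covers every pixel exactly once.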

I am using pgfortran 10.3. We have a Linux box with a quad-core i7 and two Nvidia C1060 cards.

Thanks,

Jerry

P.S. Upon previewing, the spacing in the code snippet is messed up. Hopefully it is still understandable.