Hi all,
I have a double-nested loop in Fortran that computes some results from values stored in an array and stores the minimum values.
I was wondering if there is any way of using GPU parallelisation to speed this up. PGPROF reports a compute intensity of 4.17 for the loop, and it accounts for a major chunk of my program's run time.
I tried splitting the loop so that the results are stored in temporary arrays (which are then searched for the minimum value), moving the IF statements outside the main loop to remove the scalar dependency, but this resulted in privatisation of these arrays, which prevented parallelisation.
Is there a better way to go about this, or is this a situation not suited to parallelisation because all the results need to be stored?
Chris
      DO 200 KWALL = KS,KE,1
        KM1 = KWALL-1
        IF(KM1.LT.1) KM1 = 1
        KP1 = KWALL
        IF(KP1.GT.KMM1) KP1 = KMM1
        DO 200 JWALL = JS,JE,1
          JM1 = JWALL-1
          IF(JM1.LT.1) JM1 = 1
          JP1 = JWALL
          IF(JP1.GT.JMM1) JP1 = JMM1
!
!         FIRST THE I = 1 WALL
!
          FSOLID = 1.0 -0.25*(MWALLI1(JM1,KM1,NBLCK)+MWALLI1(JP1,KP1,NBLCK) &
                 + MWALLI1(JM1,KP1,NBLCK)+MWALLI1(JP1,KM1,NBLCK))
          FSOLID = FSOLID*I1_SHEAR(NBLCK)
          XD  = X(1,JWALL,KWALL,NBLCK)  - XP
          RD  = R(1,JWALL,KWALL,NBLCK)  - RP
          RTD = RT(1,JWALL,KWALL,NBLCK) - RTP
          DISTSQ = XD*XD + RD*RD + RTD*RTD
          DISTSQ = FSOLID*DISTSQ + (1.-FSOLID)*DLMINSQ
          IF(DISTSQ.LT.DMINSQ) THEN
            DMINSQ = DISTSQ
            IMIN = 1
            JMIN = JWALL
            KMIN = KWALL
            XDMIN = XD
            RDMIN = RD
            RTDMIN= RTD
            IF_FOUND = 1
          ENDIF
!
!         NEXT THE I = IM WALL.
!
          FSOLID = 1.0 -0.25*(MWALLIM(JM1,KM1,NBLCK)+MWALLIM(JP1,KP1,NBLCK) &
                 + MWALLIM(JM1,KP1,NBLCK)+MWALLIM(JP1,KM1,NBLCK))
          FSOLID = FSOLID*IM_SHEAR(NBLCK)
          XD  = X(IM,JWALL,KWALL,NBLCK)  - XP
          RD  = R(IM,JWALL,KWALL,NBLCK)  - RP
          RTD = RT(IM,JWALL,KWALL,NBLCK) - RTP
          DISTSQ = XD*XD + RD*RD + RTD*RTD
          DISTSQ = FSOLID*DISTSQ + (1.-FSOLID)*DLMINSQ
          IF(DISTSQ.LT.DMINSQ) THEN
            DMINSQ = DISTSQ
            IMIN = IM
            JMIN = JWALL
            KMIN = KWALL
            XDMIN = XD
            RDMIN = RD
            RTDMIN= RTD
            IF_FOUND = 1
          ENDIF
!
  200 CONTINUE
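
For illustration, one way the split described above (store every result, then search for the minimum) might be sketched is shown below. This is only a rough, self-contained sketch under assumptions, not the code from my program: the names DSQ, IWALL, LOC, NJ and NK and the test coordinates are made up, the block index NBLCK is dropped, and the FSOLID weighting is left out to keep it short. The OpenACC directive is a guess at what a PGI build might accept; a compiler without OpenACC support treats it as a comment.

```fortran
! Sketch only: two-pass minimum search with hypothetical names and data.
PROGRAM MINSEARCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: IM = 4, NJ = 8, NK = 6
  REAL    :: X(IM,NJ,NK), R(IM,NJ,NK), RT(IM,NJ,NK)
  REAL    :: DSQ(2,NJ,NK)            ! one entry per (wall, J, K)
  REAL    :: XP, RP, RTP, DMINSQ, XD, RD, RTD
  INTEGER :: I, J, K, IW, IWALL(2), LOC(3)
  INTEGER :: IMIN, JMIN, KMIN, IF_FOUND

  ! Synthetic coordinates so the example runs on its own.
  DO K = 1, NK
    DO J = 1, NJ
      DO I = 1, IM
        X(I,J,K)  = REAL(I)
        R(I,J,K)  = REAL(J)
        RT(I,J,K) = REAL(K)
      END DO
    END DO
  END DO
  XP = 3.6; RP = 5.2; RTP = 2.9
  DMINSQ   = HUGE(1.0)
  IF_FOUND = 0
  IMIN = 0; JMIN = 0; KMIN = 0
  IWALL = (/ 1, IM /)                ! the I = 1 wall and the I = IM wall

  ! Pass 1: each DSQ entry depends only on its own indices, so no
  ! scalar minimum is carried from one iteration to the next.
  !$acc parallel loop collapse(3) copyin(X, R, RT, IWALL) copyout(DSQ)
  DO K = 1, NK
    DO J = 1, NJ
      DO IW = 1, 2
        XD  = X(IWALL(IW),J,K)  - XP
        RD  = R(IWALL(IW),J,K)  - RP
        RTD = RT(IWALL(IW),J,K) - RTP
        DSQ(IW,J,K) = XD*XD + RD*RD + RTD*RTD
      END DO
    END DO
  END DO

  ! Pass 2: locate the smallest entry. MINLOC returns the first minimum
  ! in array element order (wall fastest, then J, then K), which is the
  ! same visiting order as the original nested loop.
  LOC = MINLOC(DSQ)
  IF (DSQ(LOC(1),LOC(2),LOC(3)) .LT. DMINSQ) THEN
    DMINSQ   = DSQ(LOC(1),LOC(2),LOC(3))
    IMIN     = IWALL(LOC(1))
    JMIN     = LOC(2)
    KMIN     = LOC(3)
    IF_FOUND = 1
  END IF

  PRINT *, 'IMIN, JMIN, KMIN =', IMIN, JMIN, KMIN
  PRINT *, 'DMINSQ =', DMINSQ, ' IF_FOUND =', IF_FOUND
END PROGRAM MINSEARCH
```

In this arrangement XDMIN, RDMIN and RTDMIN do not need to be stored per point; they could be recomputed once from IMIN, JMIN and KMIN after the search, so only the single DSQ array has to be kept.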