The following code hangs when it is run with more than one thread and compiled with -mp and -O2 (or higher) optimization.
The same problem occurs in a "real-life" Fortran 95 code:
Lpeter
pgf90 -V
pgf90 10.3-0 64-bit target on x86-64 Linux -tp gh-64
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2010, STMicroelectronics, Inc. All Rights Reserved.
REAL FUNCTION FN1(I)
INTEGER I
FN1 = I * 2.0
RETURN
END FUNCTION FN1
REAL FUNCTION FN2(A, B)
REAL A, B
FN2 = A + B
RETURN
END FUNCTION FN2
PROGRAM A21
use omp_lib !INCLUDE "omp_lib.h" ! or USE OMP_LIB
INTEGER ISYNC(256)
REAL WORK(256)
REAL RESULT(256)
INTEGER IAM, NEIGHBOR
!$OMP PARALLEL PRIVATE(IAM, NEIGHBOR) SHARED(WORK, ISYNC)
IAM = OMP_GET_THREAD_NUM() + 1
ISYNC(IAM) = 0
!$OMP BARRIER
! Do computation into my portion of work array
WORK(IAM) = FN1(IAM)
! Announce that I am done with my work.
! The first flush ensures that my work is made visible before
! synch. The second flush ensures that synch is made visible.
!$OMP FLUSH(WORK, ISYNC)
ISYNC(IAM) = 1
!$OMP FLUSH(ISYNC)
! Wait until neighbor is done. The first flush ensures that
! synch is read from memory, rather than from the temporary
! view of memory. The second flush ensures that work is read
! from memory, and is done so after the while loop exits.
IF (IAM .EQ. 1) THEN
NEIGHBOR = OMP_GET_NUM_THREADS()
ELSE
NEIGHBOR = IAM - 1
ENDIF
DO WHILE (ISYNC(NEIGHBOR) .EQ. 0)
!$OMP FLUSH(ISYNC)
END DO
!$OMP FLUSH(WORK, ISYNC)
RESULT(IAM) = FN2(WORK(NEIGHBOR), WORK(IAM))
write(*,*) result(iam)
!$OMP END PARALLEL
END PROGRAM A21
This is a known issue (TPR#17688). The problem is that "ISYNC" is not declared volatile. The compiler therefore performs an optimization in which "ISYNC(NEIGHBOR)" is moved into a register and is not reloaded on each iteration of the DO WHILE loop. The loop condition never changes, so the loop never terminates.
Unfortunately, there isn't a good way to fix this. In later versions of the OpenMP standard, the "FLUSH(list)" directive (i.e. FLUSH with a list) has been deprecated. The standard now specifies that "FLUSH(list)" should be evaluated as just a "FLUSH" directive. Hence, to support this, either all variables must be made volatile, which severely limits the optimizations that can be performed, or "FLUSH" must be replaced with "BARRIER", which also severely impacts performance.
For this code, the best workaround would be to simply remove this synchronization code and use a single BARRIER directive before global memory is read.
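To illustrate the effect described above, here is a small serial sketch. It is not OpenMP code and not the compiler's actual output; the names HOIST_SKETCH, CACHED and SPINS are made up for the illustration, and the spin limit exists only so that the example terminates.

```fortran
! Serial model of a wait loop whose condition was loaded once into a
! "register" (CACHED) and is never refreshed from memory.
PROGRAM HOIST_SKETCH
  IMPLICIT NONE
  INTEGER :: ISYNC(2)
  INTEGER :: CACHED, SPINS

  ISYNC = 0
  CACHED = ISYNC(1)      ! value read once, before the loop
  ISYNC(1) = 1           ! the "neighbor" announces completion afterwards

  SPINS = 0
  ! The loop tests the stale copy, so the update to ISYNC(1) is never seen.
  ! Without the artificial SPINS limit this loop would never exit.
  DO WHILE (CACHED .EQ. 0 .AND. SPINS .LT. 5)
     SPINS = SPINS + 1
  END DO

  WRITE(*,*) 'cached copy =', CACHED
  WRITE(*,*) 'value in memory =', ISYNC(1)
  WRITE(*,*) 'iterations before giving up =', SPINS
END PROGRAM HOIST_SKETCH
```

The cached copy stays at 0 while the array element already holds 1, which is the situation the waiting thread is stuck in.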
For example:
% cat ompt.f90
REAL FUNCTION FN1(I)
INTEGER I
FN1 = I * 2.0
RETURN
END FUNCTION FN1
REAL FUNCTION FN2(A, B)
REAL A, B
FN2 = A + B
RETURN
END FUNCTION FN2
PROGRAM A21
use omp_lib !INCLUDE "omp_lib.h" ! or USE OMP_LIB
INTEGER ISYNC(256)
REAL WORK(256)
REAL RESULT(256)
INTEGER IAM, NEIGHBOR, TST
WORK=1.0
!$OMP PARALLEL PRIVATE(IAM, NEIGHBOR, TST) SHARED(WORK, ISYNC)
IAM = OMP_GET_THREAD_NUM() + 1
WORK(IAM) = REAL(IAM)
!$OMP BARRIER
IF (IAM .EQ. 1) THEN
NEIGHBOR = OMP_GET_NUM_THREADS()
ELSE
NEIGHBOR = IAM - 1
ENDIF
RESULT(IAM) = FN2(WORK(NEIGHBOR), WORK(IAM))
write(*,*) IAM, '+', NEIGHBOR, '=', result(iam)
!$OMP END PARALLEL
END PROGRAM A21
% pgf90 -fast -mp ompt.f90 ; a.out
5 + 4 = 9.000000
6 + 5 = 11.00000
4 + 3 = 7.000000
2 + 1 = 3.000000
3 + 2 = 5.000000
8 + 7 = 15.00000
1 + 8 = 9.000000
7 + 6 = 13.00000
Thanks!
Your code solves the problem. I assume the Portland compiler did the right thing. Nevertheless, gfortran and some other proprietary compilers (:=) did not show the same problem, and one should be able to run the code with and without optimization flags.
Regards
An update:
It seems that giving the ISYNC variable the VOLATILE attribute fixes the problem without any other change to the code. VOLATILE is part of Fortran 2003, I believe, and many compilers support it.
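A minimal sketch of that change, assuming the original program including the step that sets ISYNC(IAM) = 1 between two flushes, as its comments describe. The only intended difference is the VOLATILE attribute on ISYNC; the explicit declarations and IMPLICIT NONE are additions for this sketch. Whether this removes the hang with pgf90 10.3 is as reported above and has not been checked here.

```fortran
REAL FUNCTION FN1(I)
  INTEGER I
  FN1 = I * 2.0
END FUNCTION FN1

REAL FUNCTION FN2(A, B)
  REAL A, B
  FN2 = A + B
END FUNCTION FN2

PROGRAM A21
  USE OMP_LIB
  IMPLICIT NONE
  ! VOLATILE asks the compiler to reload ISYNC from memory on every
  ! reference instead of keeping a copy in a register.
  INTEGER, VOLATILE :: ISYNC(256)
  REAL :: WORK(256)
  REAL :: RESULT(256)
  REAL, EXTERNAL :: FN1, FN2
  INTEGER :: IAM, NEIGHBOR

  ! Note: the arrays hold 256 entries, so this assumes at most 256 threads.
!$OMP PARALLEL PRIVATE(IAM, NEIGHBOR) SHARED(WORK, ISYNC)
  IAM = OMP_GET_THREAD_NUM() + 1
  ISYNC(IAM) = 0
!$OMP BARRIER
  ! Do computation into my portion of the work array
  WORK(IAM) = FN1(IAM)
  ! Announce that I am done with my work
!$OMP FLUSH(WORK, ISYNC)
  ISYNC(IAM) = 1
!$OMP FLUSH(ISYNC)
  ! Wait until the neighbor is done
  IF (IAM .EQ. 1) THEN
     NEIGHBOR = OMP_GET_NUM_THREADS()
  ELSE
     NEIGHBOR = IAM - 1
  ENDIF
  DO WHILE (ISYNC(NEIGHBOR) .EQ. 0)
!$OMP FLUSH(ISYNC)
  END DO
!$OMP FLUSH(WORK, ISYNC)
  RESULT(IAM) = FN2(WORK(NEIGHBOR), WORK(IAM))
  WRITE(*,*) RESULT(IAM)
!$OMP END PARALLEL
END PROGRAM A21
```

It would be built the same way as the other listings in this thread, for example with pgf90 -fast -mp. Only ISYNC is made volatile here, so the rest of the program can still be optimized normally.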