-Mconcur problems in pgf90

I am getting incorrect results from a long Fortran 90 program, so I boiled it down to the basic form shown below.

I compile using “pgf90 -o basic -fastsse -Mconcur -Minfo basic.f90”.
-Minfo says (for the “do ix …” line) “parallel code for non-innermost
loop generated; block distribution”

When run, it prints the wrong answers for a() - a mix of 3.0, 4.0, 6.0
(should all be 6.0).

Deleting the -fastsse switch makes no difference.

It does give the correct answer, 6.0 for all a(), if compiled without the
-Mconcur switch.

Also, it gives the correct answer, with -Mconcur, if the commented “if”
statement is uncommented, in which case -Minfo reports no parallel code
generated.

It also gives the correct answer, with -Mconcur, if I compile with
-Mbounds (no bounds violations reported), in which case -Minfo again
reports no parallel code generated.

My machine is a dual-processor AMD Opteron running 64-bit Red Hat Linux.
The compiler is PGI Workstation version 5.2.

The environment variable NCPUS is 2.

No doubt I am doing something wrong, but I can’t see what. It looks to
me like the parallel code generated using -Mconcur is not working.

program basic

  implicit none

  integer :: ix
  integer :: iy

  real :: sum
  integer :: ns

  real, dimension(10, 10) :: a, b

  a = 6.0

  do ix = 1, 10
    do iy = 1, 10

      ns = 1
      sum = a(ix,iy)

      if(ix-1 >= 1) then
        sum = sum + a(ix-1,iy)
        ns = ns + 1
      end if

      if(ix+1 <= 10) then
        sum = sum + a(ix+1,iy)
        ns = ns + 1
      end if

      if(iy-1 >= 1) then
        sum = sum + a(ix,iy-1)
        ns = ns + 1
      end if

      if(iy+1 <= 10) then
        sum = sum + a(ix,iy+1)
        ns = ns + 1
      end if

!     if(ix == 5 .and. iy == 5) write(6,*) sum, ns

      b(ix,iy) = sum/float(ns)

    end do
  end do

  a = b

  write(6,*) a

end program basic

Hi Russell,

Thanks for a very good write-up! This looks to me like a bug in “-Mconcur”. The code fails even with 1 thread, so it’s most likely a problem with the generated assembly code. It works if only one of the if statements is used, so the compiler is somehow getting confused by the second if statement. I’ll go ahead and submit a TPR, and we’ll see if we can determine exactly what’s going on.

As a workaround, we can use OpenMP to parallelize the loop. Before the ix do loop add:

!$OMP DO PRIVATE (sum, ns)

and add the following after the second end do


Compile the code using “-mp” and then set “NCPUS”.
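
For reference, here is a minimal sketch of how the whole program might look with the OpenMP workaround applied. Two things in it are assumptions rather than something confirmed in this thread: it uses the combined PARALLEL DO form (a bare DO directive only shares work when it sits inside an enclosing PARALLEL region), and it closes the loop with END PARALLEL DO. The program name basic_omp, the parameter n, and the PASS/FAIL check at the end are also added purely for illustration.

```fortran
program basic_omp

  implicit none

  integer, parameter :: n = 10      ! illustrative; the original hard-codes 10
  integer :: ix, iy
  integer :: ns
  real :: sum
  real, dimension(n, n) :: a, b

  a = 6.0

  ! Assumption: combined PARALLEL DO, so the directive opens its own
  ! parallel region. sum and ns are per-iteration scratch values, so each
  ! thread needs its own copy; iy is listed explicitly for clarity.
  !$OMP PARALLEL DO PRIVATE(iy, sum, ns)
  do ix = 1, n
    do iy = 1, n

      ns = 1
      sum = a(ix,iy)

      if (ix-1 >= 1) then
        sum = sum + a(ix-1,iy)
        ns = ns + 1
      end if

      if (ix+1 <= n) then
        sum = sum + a(ix+1,iy)
        ns = ns + 1
      end if

      if (iy-1 >= 1) then
        sum = sum + a(ix,iy-1)
        ns = ns + 1
      end if

      if (iy+1 <= n) then
        sum = sum + a(ix,iy+1)
        ns = ns + 1
      end if

      b(ix,iy) = sum/real(ns)

    end do
  end do
  !$OMP END PARALLEL DO

  a = b

  ! Averaging a constant field should give back the same constant.
  if (all(abs(a - 6.0) < 1.0e-5)) then
    write(6,*) 'PASS: all elements are 6.0'
  else
    write(6,*) 'FAIL: unexpected values'
    write(6,*) a
  end if

end program basic_omp
```

Only b is written inside the loop and a is only read, so the iterations are independent once sum and ns are private. As noted above, build it with “-mp” and set “NCPUS” before running.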

I’ll try and keep you posted with any progress, but for now hopefully using OpenMP will get you on track.

- Mat

FYI, I added TPR3332.