Because earlier versions of PGI Fortran (<=13.9) ignore the “collapse” clause of the “loop” directive, I collapse loops manually to emulate it, which generally gives much better performance than uncollapsed loops.
Recently I started using PGI v15.5 and found that my code with manually collapsed loops fails. The link below contains sample code and test data:
For convenience I also include the loop section here:
    is=iend-istr+1; js=jend-jstr+1
    !$acc loop independent
    do ijknn=0,(kend-kstr+1)*js*is*6-1
      ijk=ijknn/6; m=ijknn-ijk*6+1
      k=ijk/js/is; j=ijk/is-k*js; i=ijk-k*js*is-j*is+istr; j=j+jstr; k=k+kstr
      b(m,i,j,k) = ev(m,1,i-1,j,k) - ev(m,1,i,j,k) &
                 + ev(m,2,i,j-1,k) - ev(m,2,i,j,k) &
                 + ev(m,3,i,j,k-1) - ev(m,3,i,j,k)
    end do
ev is the input data read from fort.80. b is the output.
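To show that the index arithmetic in the collapsed loop is not itself the problem, here is a small CPU-only check. It is only a sketch: the bounds are made-up values, not the ones from my test case. It walks the equivalent nested loops (m fastest, then i, j, k) and compares each (m,i,j,k) with what the collapsed decoding produces for the same linear index. It should report zero mismatches.

```fortran
program check_collapsed_index
  implicit none
  ! Hypothetical bounds, chosen only for this check (not from the real test case)
  integer, parameter :: istr=2, iend=5, jstr=3, jend=6, kstr=1, kend=4
  integer :: is, js, ijknn, ijk, m, i, j, k
  integer :: m0, i0, j0, k0, nbad

  is = iend-istr+1; js = jend-jstr+1
  nbad  = 0
  ijknn = 0                      ! linear index, advanced once per innermost iteration

  ! Reference ordering: m fastest, then i, then j, then k
  do k0 = kstr, kend
    do j0 = jstr, jend
      do i0 = istr, iend
        do m0 = 1, 6
          ! Same decoding as in the manually collapsed loop
          ijk = ijknn/6; m = ijknn-ijk*6+1
          k = ijk/js/is; j = ijk/is-k*js; i = ijk-k*js*is-j*is+istr
          j = j+jstr; k = k+kstr
          if (m /= m0 .or. i /= i0 .or. j /= j0 .or. k /= k0) nbad = nbad+1
          ijknn = ijknn+1
        end do
      end do
    end do
  end do

  print *, 'indices checked:', ijknn, ' mismatches:', nbad
end program check_collapsed_index
```

Since the decoding reproduces the nested-loop indices exactly, every element of b is written exactly once, which is why I mark the loop as independent.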
To compile and run on the GPU (output is written to fort.81):
pgf90 -Mpreprocess -acc ev_b.f90 && ./a.out
To compile and run on the CPU (output is written to fort.82):
pgf90 -Mpreprocess ev_b.f90 && ./a.out
The two runs should produce the same output. However, the GPU version always produces zeros. The sample code also contains a commented-out section using standard (uncollapsed) loops, which always works on both CPU and GPU.
As far as I know, manually collapsing loops is a very common way to write code, and the loop uses no non-standard Fortran syntax. In older PGI versions manually collapsed loops worked perfectly. Does anyone know why they fail now?
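For reference, the uncollapsed form of the same computation looks roughly like the sketch below. This is not a copy of the commented-out section in the sample file: the array shapes, the bounds, the random input (a stand-in for the fort.80 data) and the data clauses are assumptions made only to keep the example self-contained. The collapse(4) clause is what the manual collapsing is meant to emulate.

```fortran
program nested_reference
  implicit none
  ! Hypothetical bounds and array shapes, for illustration only
  integer, parameter :: istr=1, iend=8, jstr=1, jend=8, kstr=1, kend=8
  real :: ev(6,3,istr-1:iend,jstr-1:jend,kstr-1:kend)
  real :: b(6,istr:iend,jstr:jend,kstr:kend)
  integer :: m, i, j, k

  call random_number(ev)         ! stand-in for the input read from fort.80

  ! Standard nested loops; without -acc the directive is just a comment
  !$acc parallel loop collapse(4) copyin(ev) copyout(b)
  do k = kstr, kend
    do j = jstr, jend
      do i = istr, iend
        do m = 1, 6
          b(m,i,j,k) = ev(m,1,i-1,j,k) - ev(m,1,i,j,k) &
                     + ev(m,2,i,j-1,k) - ev(m,2,i,j,k) &
                     + ev(m,3,i,j,k-1) - ev(m,3,i,j,k)
        end do
      end do
    end do
  end do

  print *, 'sum(abs(b)) =', sum(abs(b))
end program nested_reference
```

Comparing this printed sum between a build with -acc and one without is a quick way to see whether the GPU result is all zeros.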