I am porting a large CFD code from a Cray X1-E to a Linux cluster that features the PGF95 compiler (version 13.9-0). The code runs very fast on the Cray with all of the major loops readily vectorizing and using OpenMP.
On the Linux cluster, the PGF95 compiler consistently refuses to vectorize any of these major loops. There are no data dependencies, no recursions, nothing (that I can see) to prevent vectorization of the loops. Indeed they vectorize readily on the Cray systems. I have tried every combination of command-line options and directives that I know, but still no-go.
The only clue I get from the compiler is the rather cryptic message:
“Loop not vectorized: may not be beneficial”
That is not very helpful, since I know the loops will vectorize and I know that vectorization is key to the speed of the code. Ironically, it does vectorize a number of small, innocuous loops that have little to do with the overall performance of the code.
Question: Is there some way to override the compiler, either by directive or by command-line option, to tell it “Vectorize these loops regardless!”?
Here is a typical snippet of code for which I get the “may not be beneficial” message:
c$omp do private(gamm1,i,n,rkdtv)
      do i=1,ncell
         gamm1 = gam(i)-one
         rkdtv = -rk*dtl(i)/vol(i)
cpgi$ unroll = c:5
         do n=1,5
            dw(n,i) = rkdtv * dw(n,i)
            w(n,i) = wo(n,i) + dw(n,i)
         end do
         w(6,i) = gamm1*(w(5,i)-half*(sqr(w(2,i))+sqr(w(3,i))
     1            +sqr(w(4,i))) / w(1,i))
      end do
Here ncell > 1000000.
The result is the same with or without OpenMP.
Any suggestions/recommendations would be most welcome. Thanks!