PGF95 won't vectorize loops -- "may not be beneficial"

I am porting a large CFD code from a Cray X1-E to a Linux cluster that features the PGF95 compiler (version 13.9-0). The code runs very fast on the Cray with all of the major loops readily vectorizing and using OpenMP.

On the Linux cluster, the PGF95 compiler consistently refuses to vectorize any of these major loops. There are no data dependencies, no recursions, nothing (that I can see) to prevent vectorization of the loops. Indeed they vectorize readily on the Cray systems. I have tried every combination of command-line options and directives that I know, but still no-go.

The only clue I get from the compiler is the rather obtuse message:
“Loop not vectorized: may not be beneficial”

That is not very helpful, since I know the loops will vectorize and I know that vectorization is key to the speed of the code. Ironically, it does vectorize a number of small, innocuous loops that have little to do with the overall performance of the code.

Question: Is there some way to override the compiler, either by directive or command-line option, to tell it “Vectorize these loops regardless!”?

Here is a typical snippet of code for which I get the “may not be beneficial” message:

c$omp do private(gamm1,i,n,rkdtv)
      do  i=1,ncell
        gamm1   = gam(i)-one
        rkdtv   = -rk*dtl(i)/vol(i)
cpgi$   unroll = c:5
        do  n=1,5
          dw(n,i) =  rkdtv  * dw(n,i)
          w(n,i)  = wo(n,i) + dw(n,i)
        end do
        w(6,i)  = gamm1*(w(5,i)-half*(sqr(w(2,i))+sqr(w(3,i))
     1                               +sqr(w(4,i))) /  w(1,i))
      end do

ncell > 1000000

Same result whether using OpenMP or not.

Any suggestions/recommendations would be most welcome. Thanks!

Before I can determine whether or not it would be beneficial to vectorize on an x86 Linux cluster, I need to know the types of the following variables:

gam
dtl
vol
dw
w
wo
sqr

The most likely candidate that is causing this issue is the following:

sqr(w(2,i))+…

This is an indirect memory reference, which requires the elements to be gathered into a vector register. I will have a better idea once I know the data types that are used in this code snippet.

All of the variables are REAL*8

sqr is a real*8 statement function defined as indicated below:

      implicit real*8 (a-h), integer (i-n), real*8 (o-z)
c
c--local variables
c
      dimension dtl(ncell),gam(ncell),vol(-21:ncell)
      dimension dw(neq,ncell),w(neq+1,ncell),wo(neq+1,ncell)
c
c--square (sqr) statement function definition
c
      sqr(x) = (x)*(x)

I tried commenting out the line of code with the sqr statement function reference. Still no-go --> “Loop not vectorized, may not be beneficial”



After further investigation, I have determined that it is a deficiency in the vectorizer. For the code snippet and declarations that you show, a workaround that would allow vectorization would be to change the declaration of the dw, w, and wo arrays to:

dimension dw(5,ncell),w(6,ncell),wo(6,ncell)

I will file a request for enhancement to get this issue addressed.
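
For reference, here is a minimal sketch of how the workaround might look when applied to the loop from the original post. The routine name update, its argument list, the driver program demo, and all of the test values are hypothetical and exist only to make the example self-contained; the loop body, the implicit typing, and the constant-extent dimension statement are taken from the thread. It is meant to be saved as fixed-form source, and whether the loop actually vectorizes will still depend on the compiler version and options, so the compiler's vectorization messages should be checked.

```fortran
c     Hypothetical driver (demo) and routine name (update). Only the
c     loop body and the constant-extent declarations come from the
c     thread; the test values are arbitrary.
      program demo
      implicit real*8 (a-h), integer (i-n), real*8 (o-z)
      parameter (ncell=4)
      dimension dtl(ncell),gam(ncell),vol(-21:ncell)
      dimension dw(5,ncell),w(6,ncell),wo(6,ncell)
c     Fill the inputs with arbitrary values.
      do i=1,ncell
        gam(i) = 1.4d0
        dtl(i) = 1.0d-3
        vol(i) = 1.0d0
        do n=1,5
          dw(n,i) = 1.0d0
          wo(n,i) = dble(n)
        end do
      end do
      rk = 0.5d0
      call update(ncell,rk,gam,dtl,vol,dw,w,wo)
      write(*,*) w(1,1), w(6,1)
      end

      subroutine update(ncell,rk,gam,dtl,vol,dw,w,wo)
      implicit real*8 (a-h), integer (i-n), real*8 (o-z)
      parameter (one=1.0d0, half=0.5d0)
      dimension dtl(ncell),gam(ncell),vol(-21:ncell)
c     Leading extents are written as the constants 5 and 6 instead
c     of neq and neq+1, as suggested in the workaround.
      dimension dw(5,ncell),w(6,ncell),wo(6,ncell)
c     Square (sqr) statement function, as in the original code.
      sqr(x) = (x)*(x)
      do i=1,ncell
        gamm1 = gam(i)-one
        rkdtv = -rk*dtl(i)/vol(i)
        do n=1,5
          dw(n,i) = rkdtv * dw(n,i)
          w(n,i)  = wo(n,i) + dw(n,i)
        end do
        w(6,i) = gamm1*(w(5,i)-half*(sqr(w(2,i))+sqr(w(3,i))
     1                              +sqr(w(4,i))) / w(1,i))
      end do
      return
      end
```

The only change relative to the original declarations is that the first dimension of each array is a literal constant, so the compiler knows the stride between w(n,i) and w(n,i+1) at compile time. This assumes neq is always 5 in the real code; if it is not, the constants would need to match the actual value.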