Compiling on AMD Opteron: Loop not vectorized: not countable

I am attempting to compile Gaussian 03, and pgf77 (5.1-3, linux86-64) gives me the message:
“Loop not vectorized: not countable”

When I compile the same source for the 32-bit architecture, however, the loop seems to optimize fine. See details below:

" 9, Unrolling inner loop 4 times
Generated prefetch instructions for 2 loads and stores
Timing stats:
vectorize 16 millisecs 100%
Total time 16 millisecs"

vs. (with linux86-64)

" 9, Loop not vectorized: not countable
Timing stats:
Total time 0 millisecs"

Perhaps there is no real problem here, but it looks like there might be.


Sorry for getting back to you so late, but I’ve been pondering this. Unfortunately, nothing solid comes to mind.

Some guesses might be:

You're using slightly different options for 64-bit, which gives different behavior.

The 64-bit source has been ported from 32-bits, hence is slightly different. Are the Define (-D) flags the same?

Try compiling the file on a 64-bit system using the standard flags plus “-tp k8-64”. Recompile again using “-tp k8-32”. Do you still see a difference? The “-tp” option tells the compiler the target architecture. k8-64 generates 64-bit code for Opteron while k8-32 generates 32-bit code.

- Mat

Thanks for the suggestions Mat! Below I provide more information as per your suggestions. I hope it helps!

1) The reason I even noticed this was that I was comparing two compilations, one for k7 and one for k8-64:

pgf77 -mp -O2 -tp k7 -Mreentrant -Mrecursive -Mnosave -Minfo -Mneginfo -time -fast -Munroll -Mvect=assoc,recog,cachesize:262144,prefetch -c aabs.f
9, Unrolling inner loop 4 times
Generated prefetch instructions for 2 loads and stores
Timing stats:
Total time 0 millisecs

opteron (k8-64):
pgf77 -i8 '-mcmodel=medium' -mp -O2 -tp k8-64 -Mreentrant -Mrecursive -Mnosave -Minfo -Mneginfo -time -fast -Munroll -Mvect=assoc,recog,cachesize:1048576 -c aabs.f
9, Loop not vectorized: not countable
Timing stats:
init 16 millisecs 100%
Total time 16 millisecs

2) As you suggested, I tried the same compile line, changing only -tp from “k8-64” to “k8-32”. It no longer gives the “not countable” message and appears to unroll the loops.
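
For what it's worth, the two compile lines in 1) differ in more than just -tp. A small Python sketch like the following lists the differences. The command strings are copied from the lines above, with the quotes around -mcmodel=medium written as plain quotes; the helper names `flags` and `diff_flags` are made up for illustration, and the script only compares text, it does not run the compiler.

```python
import shlex

# Compile lines copied from the two builds in this post.
K7 = ("pgf77 -mp -O2 -tp k7 -Mreentrant -Mrecursive -Mnosave -Minfo "
      "-Mneginfo -time -fast -Munroll "
      "-Mvect=assoc,recog,cachesize:262144,prefetch -c aabs.f")
K8_64 = ("pgf77 -i8 '-mcmodel=medium' -mp -O2 -tp k8-64 -Mreentrant "
         "-Mrecursive -Mnosave -Minfo -Mneginfo -time -fast -Munroll "
         "-Mvect=assoc,recog,cachesize:1048576 -c aabs.f")


def flags(cmd):
    """Return the set of options in a compile line (hypothetical helper).

    "-tp <target>" is kept as one item, and the comma-separated
    -Mvect sub-options are split so they can be compared one by one.
    """
    tokens = shlex.split(cmd)[1:]  # drop the compiler name
    result = set()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-tp" and i + 1 < len(tokens):
            result.add("-tp " + tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-Mvect="):
            for sub in tok[len("-Mvect="):].split(","):
                result.add("-Mvect=" + sub)
        else:
            result.add(tok)
        i += 1
    return result


def diff_flags(a, b):
    """Options that appear only in a, and only in b, as sorted lists."""
    fa, fb = flags(a), flags(b)
    return sorted(fa - fb), sorted(fb - fa)


only_k7, only_k8 = diff_flags(K7, K8_64)
print("only in k7 line:   ", only_k7)
print("only in k8-64 line:", only_k8)
```

Besides the target, the k8-64 line adds -i8 and -mcmodel=medium, uses a different cachesize, and drops the prefetch sub-option, so any of these could be involved.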

Hmm… Not sure where to go from here??

I see two possible reasons. The first is the code generator being used, and the second is “-i8”.

We actually use two separate 32-bit code generators (CGs), one for older x87 systems and a second for SSE2-enabled systems. k8-64 systems only use the SSE code generator. In your example, the k7 build is using the old CG and the k8-64 build is using the new CG. To test this theory, you’d need to compile with and without “-Mscalarsse” on a k8 or p4 system. “-Mscalarsse” tells the compiler to use the new CG. To determine which is actually being used, compile with “-v” and see which directory pgftn is being pulled from. “…/linux86/5.1/bin/p3/pgftn” is the old CG and “linux86/5.1/bin/newcg/pgftn” is the new one.

The second possibility is “-i8”. With the 5.1 and 5.0 compilers we were missing some optimization opportunities when “-i8” was present. This might be one of them. We greatly enhanced our “-i8” optimizations with 5.2, so you might want to try the newer release. You can also try, as an experiment, compiling without “-i8”. Of course, keep “-i8” in your actual build, since you might need it for C and Fortran interoperability.
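
To keep the experiments apart, it helps to change one flag at a time. Here is a rough Python sketch that only prints the candidate compile lines (it does not invoke pgf77). The base line is copied from your k8-64 compile above, and `variants` is just an illustrative helper name, not anything shipped with the compilers.

```python
import itertools
import shlex

# Base line copied from the k8-64 compile earlier in the thread.
BASE = ("pgf77 -i8 -mcmodel=medium -mp -O2 -tp k8-64 -Mreentrant -Mrecursive "
        "-Mnosave -Minfo -Mneginfo -time -fast -Munroll "
        "-Mvect=assoc,recog,cachesize:1048576 -c aabs.f")


def variants(base):
    """Build the four combinations of -tp target and -i8 on/off."""
    tokens = shlex.split(base)
    tp_index = tokens.index("-tp") + 1  # position of the target name
    cmds = []
    for tp, use_i8 in itertools.product(("k8-64", "k8-32"), (True, False)):
        toks = list(tokens)
        toks[tp_index] = tp
        if not use_i8:
            toks = [t for t in toks if t != "-i8"]
        cmds.append(" ".join(toks))
    return cmds


for cmd in variants(BASE):
    print(cmd)
```

Comparing the -Minfo output for line 9 across the four runs should show whether the message follows “-i8”, the target, or both.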

- Mat