Compiling on AMD Opteron: Loop not vectorized: not countable

rhavlin · August 31, 2004, 3:46pm

I am attempting to compile gaussian03 and pgf77 (5.1-3 linux86-64) gives me the message:
“Loop not vectorized: not countable”

whereas when I compile the same source for the 32-bit architecture it seems to vectorize fine. Details below:

" 9, Unrolling inner loop 4 times
Generated prefetch instructions for 2 loads and stores
Timing stats:
vectorize 16 millisecs 100%
Total time 16 millisecs"

vs. (with linux86-64)

" 9, Loop not vectorized: not countable
Timing stats:
Total time 0 millisecs"

Perhaps there is no real problem here, but it looks like there might be.

Thanks,
Bob

MatColgrove · September 1, 2004, 10:13pm

Sorry for getting back to you so late, but I’ve been pondering this. Unfortunately, nothing solid comes to mind.

Some guesses might be:

You're using slightly different options for the 64-bit build, which changes the behavior.

The 64-bit source has been ported from 32-bit and so is slightly different. Are the define (-D) flags the same?

Try compiling the file on a 64-bit system using the standard flags plus “-tp k8-64”. Recompile again using “-tp k8-32”. Do you still see a difference? The “-tp” option tells the compiler the target architecture. k8-64 generates 64-bit code for Opteron while k8-32 generates 32-bit code.
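
A minimal sketch of that comparison follows. The file name, the flag set, the RUN switch and the log names are placeholders chosen for illustration, not part of the original advice; the point is that every flag except “-tp” stays identical so the target is the only thing that changes.

```shell
#!/bin/sh
# Sketch: build the same file for both targets with identical flags.
# SRC, COMMON_FLAGS, RUN and the log names are illustrative assumptions;
# substitute the flags from your real build.
SRC=${SRC:-aabs.f}
COMMON_FLAGS="-O2 -fast -Minfo -Mneginfo"

# Print the compile command for one -tp target.
compile_cmd() {
    echo "pgf77 $COMMON_FLAGS -tp $1 -c $SRC"
}

for target in k8-64 k8-32; do
    cmd=$(compile_cmd "$target")
    echo "== $cmd"
    # Only run the compiler when asked to (RUN=1) and when it is installed;
    # otherwise the script just shows what it would run.
    if [ "${RUN:-0}" = 1 ] && command -v pgf77 >/dev/null 2>&1; then
        $cmd > "info.$target.log" 2>&1
    fi
done
```

With RUN=1, comparing info.k8-64.log against info.k8-32.log (for example with diff) should show whether the “not countable” message follows the target.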

Mat

rhavlin · September 2, 2004, 2:28am

Thanks for the suggestions Mat! Below I provide more information as per your suggestions. I hope it helps!

1) The reason I even noticed this was comparing two different compilations at k7 and k8-64:

k7:
pgf77 -mp -O2 -tp k7 -Mreentrant -Mrecursive -Mnosave -Minfo -Mneginfo -time -fast -Munroll -Mvect=assoc,recog,cachesize:262144,prefetch -c aabs.f
aabs:
9, Unrolling inner loop 4 times
Generated prefetch instructions for 2 loads and stores
Timing stats:
Total time 0 millisecs

opteron (k8-64):
pgf77 -i8 '-mcmodel=medium' -mp -O2 -tp k8-64 -Mreentrant -Mrecursive -Mnosave -Minfo -Mneginfo -time -fast -Munroll -Mvect=assoc,recog,cachesize:1048576 -c aabs.f
aabs:
9, Loop not vectorized: not countable
Timing stats:
init 16 millisecs 100%
Total time 16 millisecs

2) As you suggested, I tried the same compile line, changing only -tp from “k8-64” to “k8-32”. The “not countable” message goes away and the compiler appears to unroll the loops.

Hmm… Not sure where to go from here??

MatColgrove · September 2, 2004, 5:58pm

I see two possible reasons. The first is the code generator being used and the second is “-i8”.

We actually use two separate 32-bit code generators, one for older x87 systems and a second for SSE2 enabled systems. k8-64 systems only use the SSE code generator. In your example, the k7 system is using the old CG and the k8-64 is using the new CG. To test this theory, you’d need to compile with and without “-Mscalarsse” on a k8 or p4 system. “-Mscalarsse” tells the compiler to use the new CG. To determine which is actually being used, compile with “-v” and see which directory pgftn is being pulled from. “…/linux86/5.1/bin/p3/pgftn” is the old and “linux86/5.1/bin/newcg/pgftn” is the new.
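
A rough sketch of that test is below. It assumes the “-v” output contains the full pgftn path, as described above; the file name, flag set, RUN switch and helper names are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare builds with and without -Mscalarsse and report which
# code generator the driver picked. SRC, BASE_FLAGS, RUN and the helper
# names are illustrative assumptions.
SRC=${SRC:-aabs.f}
BASE_FLAGS="-O2 -fast -Minfo -Mneginfo -v"

# Print the compile command; $1 is an optional extra flag.
compile_cmd() {
    echo "pgf77 $BASE_FLAGS${1:+ $1} -c $SRC"
}

# Classify "-v" output text by the pgftn directory it mentions.
which_cg() {
    case $1 in
        */bin/newcg/pgftn*) echo new ;;
        */bin/p3/pgftn*)    echo old ;;
        *)                  echo unknown ;;
    esac
}

for extra in "" -Mscalarsse; do
    cmd=$(compile_cmd "$extra")
    echo "== $cmd"
    # Only run the compiler when asked to (RUN=1) and when it is installed.
    if [ "${RUN:-0}" = 1 ] && command -v pgf77 >/dev/null 2>&1; then
        echo "code generator: $(which_cg "$($cmd 2>&1)")"
    fi
done
```

If the loop message changes between the two builds while everything else is held fixed, the code generator is the likely cause.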

The second possibility is “-i8”. With the 5.1 and 5.0 compilers we were missing some optimization opportunities when “-i8” was present. This might be one of them. We greatly enhanced our “-i8” optimizations with 5.2, so you might want to try the newer release. You can also try, as an experiment, compiling without “-i8”. Of course, leave “-i8” in your actual build since you might need it for C and Fortran interoperability.
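
The “-i8” experiment could be scripted roughly as follows. FLAGS is an illustrative subset of the 64-bit compile line earlier in this thread, and the without_flag helper, SRC and RUN are assumptions made for the example. The variant without “-i8” is for inspecting the compiler messages only, not for the real build.

```shell
#!/bin/sh
# Sketch: derive a "no -i8" variant of a flag list for an experiment build.
# FLAGS is an illustrative subset of the 64-bit compile line in this thread;
# SRC, RUN and without_flag are assumptions made for the example.
SRC=${SRC:-aabs.f}
FLAGS="-i8 -mcmodel=medium -O2 -tp k8-64 -Minfo -Mneginfo -fast"

# Print the remaining arguments with every occurrence of flag $1 dropped.
without_flag() {
    drop=$1
    shift
    out=""
    for f in "$@"; do
        if [ "$f" != "$drop" ]; then
            out="$out${out:+ }$f"
        fi
    done
    echo "$out"
}

# $FLAGS is left unquoted on purpose so it splits into separate words.
EXPERIMENT_FLAGS=$(without_flag -i8 $FLAGS)

for flags in "$FLAGS" "$EXPERIMENT_FLAGS"; do
    echo "== pgf77 $flags -c $SRC"
    # Only run the compiler when asked to (RUN=1) and when it is installed.
    if [ "${RUN:-0}" = 1 ] && command -v pgf77 >/dev/null 2>&1; then
        pgf77 $flags -c "$SRC"
    fi
done
```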

Mat
