Hi,
Is there a way to get diagnostic output from the compiler that reports on things like loop vectorization? Also, can you please point me at the documentation describing what -O4 does, and how the -O levels compare with those of GCC, ICC and other compilers?
Is there a compatibility mode / wrapper that enables pgcc to understand more of the common switches that GCC and ICC understand?
My main problem is that my home-brewed number-crunching app doesn’t seem to run very fast when built with PGCC. It has a number of tight floating-point loops that vectorize cleanly under ICC (and, to some extent, under the latest versions of GCC).
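To give an idea of the kind of loop I mean, here is a hypothetical sketch. It is not taken from my application, and the function name `saxpy` is made up for illustration. It is a unit-stride, single-precision loop with no dependencies between iterations, which is the shape I would expect a vectorizer to handle:

```c
#include <stddef.h>

/* Hypothetical example, not code from the real application.
 * Computes y[i] = a * x[i] + y[i] over contiguous single-precision arrays.
 * `restrict` (C99) promises that x and y do not overlap, which removes
 * the aliasing question a vectorizer would otherwise have to answer. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

My real loops are longer than this, but they have the same basic structure.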
The results I’m getting are:
GCC 3.2.2 -march=pentium3 -mcpu=pentium3 -mmmx -msse -mfpmath=sse -malign-double -fpic -O3 -fno-strict-aliasing -ffast-math -foptimize-register-move -frerun-loop-opt -fexpensive-optimizations -fprefetch-loop-arrays -fomit-frame-pointer -funroll-loops -Wall
3m:17s
PGCC 7.0.7 -tp=piii -Mvect=sse -fpic -O4 -Mfprelaxed -Msingle -Mfcon -Mcache_align -Mflushz -Munroll=c:1 -Mnoframe -Mlre -Mipa=align,arg,const,f90ptr,shape,libc,globals,localarg,ptr,pure
2m:57s
ICC 9.1.051 -march=pentium3 -mcpu=pentium3 -mtune=pentium3 -msse -xK -cxxlib-icc -fpic -O3 -ansi-alias -fp-model fast=2 -rcd -align -Zp16 -ipo -fomit-frame-pointer -funroll-loops -w1 -vec-report3
0m:46s
The machine in question is a Pentium 3, as the optimization flags indicate.
That makes ICC almost 4x faster than PGCC (46 s versus 177 s, about 3.8x). Even 40% would be a huge difference; a gap this large makes me think there might be a problem with some of the compiler switches I am using. (I know PGCC is still faster than GCC, but the GCC in question is 1) 5 years out of date and 2) known to be quite bad at producing fast code.) Is there a problem with any of the compiler switches I listed above for PGCC? Is there any other option that’s worth trying?
Finally, I am finding that -Mscalarsse makes the numbers that come out of my program wildly wrong. The differences appear as early as the 3rd significant figure, and once values are multiplied together this leads to massive errors. The problem is reminiscent of a similar issue with GCC (although the numbers are not as far out with GCC) when -mfpmath=sse,387 is used. Is -Mscalarsse known to cause problems?
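In case it helps pin down what I am seeing, here is a small sketch of the kind of effect I suspect, though I have not confirmed that it is the cause. It assumes that scalar SSE code rounds every single-precision intermediate to 32 bits, whereas x87 code can keep intermediates at higher precision in registers. The function names are made up, and `volatile` is only there to force the running total to be rounded to single precision on every step regardless of the compiler:

```c
#include <stddef.h>

/* Hypothetical illustration, not code from the real application. */

/* Add x to a single-precision running total n times.
 * `volatile` forces the total to be stored as a 32-bit float after
 * every addition, so no extra precision survives between steps. */
float sum_single(float x, size_t n)
{
    volatile float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total = total + x;
    return total;
}

/* Same sum with a double-precision running total, standing in for
 * a build that keeps intermediates at higher precision. */
double sum_wide(float x, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += (double)x;
    return total;
}

/* Relative difference between the two totals. */
double relative_gap(float x, size_t n)
{
    double wide = sum_wide(x, n);
    double narrow = (double)sum_single(x, n);
    double diff = narrow > wide ? narrow - wide : wide - narrow;
    return diff / wide;
}
```

Calling `relative_gap(0.1f, 10000000)` gives a gap well above 1e-3, so the two totals already disagree within the first three significant figures. That is the same order of discrepancy I get with -Mscalarsse.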
Many thanks.