longer execution time in PGCC 6.0.5 than PGCC 5.2.4

Hi,

I compiled a benchmark program
http://w3cic.riken.go.jp/HPC/HimenoBMT/Load_module/cc_himenoBMTxp_l.lzh
with pgcc 6.0.5 and pgcc 5.2.4 on a machine with two AMD Opteron 250 CPUs and ran it:

% pgcc -fastsse -Mconcur -DLARGE himenombmtxps.c

The benchmark reports that it runs at 1364 Mflops when built with pgcc 6.0.5
and at 1653 Mflops with pgcc 5.2.4, which is about 20% faster. If I use only a single CPU,

% pgcc -fastsse -DLARGE himenombmtxps.c

both versions run at about 1160 Mflops, with only a few percent difference.

I tried several compiler options described in the User's Guide, but I could not
get the benchmark compiled with pgcc 6.0.5 to run as fast as the one built with 5.2.4.

Do you know why this benchmark program, compiled with -Mconcur by pgcc 6.0.5,
is significantly slower than when compiled by pgcc 5.2.4?

Hi Shingo,

Thank you for the report. I was able to recreate the issue here and isolate the problem. With the 6.0 compilers we added an optimization which better recognizes idioms. Although this optimization helps most codes, in your case it causes the loop at line 223 to no longer parallelize, since the loop now contains a call generated by the “memcopy” idiom. (The compiler won't parallelize loops with function calls.)
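To illustrate the general pattern (this is only a hypothetical sketch, not the actual loop at line 223 of himenombmtxps.c): a plain element-by-element copy loop like the one below can be recognized as the memcopy idiom and replaced with a call to memcpy(), and a loop that contains a function call will not be auto-parallelized by -Mconcur.

/* Hypothetical example of a copy loop that idiom recognition may
   rewrite as a memcpy() call, which in turn prevents -Mconcur from
   parallelizing an enclosing loop. */
void copy_plane(float *dst, const float *src, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}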

As part of our current work on auto-parallelization, we have addressed this problem and will have a fix in the 6.1 release. For now, however, you can add the xflag “-Mx,8,0x8000000” to the compilation to remove the idiom. With the xflag, I see the MFlops increase from 1413 to 2235. Xflags can change from release to release, so you should only use this workaround with the 6.0 compilers and this particular benchmark.
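For example, using your original compile line the workaround would be:

% pgcc -fastsse -Mconcur -DLARGE -Mx,8,0x8000000 himenombmtxps.c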

FYI, to determine which loops are and are not parallelized, add the flags “-Minfo -Mneginfo=concur” when using “-Mconcur”.
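For example, for this benchmark:

% pgcc -fastsse -Mconcur -Minfo -Mneginfo=concur -DLARGE himenombmtxps.c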

Thanks,
Mat

Thank you for the quick fix.
Another observation for the current PGCC 6.0.5 with the -Mconcur option is that,
without your suggested xflag -Mx,8,0x8000000,

% pgcc -Mconcur -DLARGE himenombmtxps.c

runs faster by 10 % than

% pgcc -fastsse -Mconcur -DLARGE himenombmtxps.c

for the same benchmark program. The -fastsse option does not always help; it sometimes seems to slow down the execution.

Shingo

Hi Shingo,

It appears that cache alignment (-Mcache_align) is causing the problem. Try compiling with “-fast -Mvect=sse”, which is essentially -fastsse without -Mcache_align (per the component list below, it also omits -Mscalarsse and -Mflushz).
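For example, keeping the rest of your compile line:

% pgcc -fast -Mvect=sse -Mconcur -DLARGE himenombmtxps.c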

“-fastsse” is an aggregate flag composed of the optimizations that help most codes. In some cases, however, a particular optimization can hurt performance. If you notice such a case, try breaking the aggregate flag into its components to determine which optimizations help and which hurt. To get the component list, use the “-help” flag along with the aggregate flag. Note that the specific component flags can change from release to release.

Example:

pgcc -help -fastsse
Reading rcfile /usr/pgi/linux86-64/6.0/bin/.pgccrc
-fastsse            == -fast -Mvect=sse -Mscalarsse -Mcache_align -Mflushz
-fast               Common optimizations: -O2 -Munroll=c:1 -Mnoframe -Mlre
-M[no]vect[=[no]altcode|[no]assoc|cachesize:<c>|[no]idiom|levels:<n>|nosizelimit|prefetch|[no]recog|smallvect:<n>|[no]sse|[no]transform]
                    Control automatic vector pipelining
    [no]assoc       Allow [disallow] reassociation
    cachesize:<c>   Optimize for cache size c
    [no]idiom       Enable [disable] idiom recognition
    prefetch        Generate prefetch instructions
    [no]sse         Generate [don't generate] SSE instructions
-M[no]scalarsse     Generate scalar sse code with xmm registers; implies -Mflushz
-Mcache_align       Align long objects on cache-line boundaries
-M[no]flushz        Set SSE to flush-to-zero mode
Mat