pgf77 performance issue 6.0 vs 5.2

I have some legacy f77 code that I’ve been using to test the 5.2 and 6.0 compilers. The system I am running on is an Intel Pentium 4 running SUSE 9.3 (Linux 2.6.11). Running

pgf77 -V

I get 5.2-4 and 6.0-8, respectively. The CPU times I get for two different programs with 5.2 are:
238.951 and 221.501

With 6.0, the same programs give:
268.297 and 245.096
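To put the gap in proportion, here is a small sketch that turns each pair of CPU times into a relative slowdown. It assumes the figures are seconds of CPU time for the same program built with each compiler; the `percent_slower` helper and the program labels are illustrative, not part of the original post.

```python
def percent_slower(old_seconds: float, new_seconds: float) -> float:
    """Return how much longer the new build takes, as a percentage of the old time."""
    return (new_seconds - old_seconds) / old_seconds * 100.0


# CPU times reported above, as (5.2 time, 6.0 time) per program.
timings = {
    "program 1": (238.951, 268.297),
    "program 2": (221.501, 245.096),
}

for name, (t52, t60) in timings.items():
    # Prints roughly 12.3% for program 1 and 10.7% for program 2.
    print(f"{name}: 6.0 is {percent_slower(t52, t60):.1f}% slower than 5.2")
```

Expressing the regression as a percentage makes it easier to compare against the other applications and machines mentioned below.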

In both cases the optimization options are:
-fast -fastsse -Minline

I have tested other programs and consistently found that 6.0 produces slower code than 5.2. This also appears to be true on our 64-bit Opteron systems. However, I would be very happy to move to 6.0 if I can resolve this problem, because 6.0 fixes execution-time problems in other programs that we compile with 5.2 on our Linux 2.4 systems.

Hi akushner,

Can you post the 5.2 and 6.0 runtimes for the following flagsets:

  1. -fastsse
  2. -fastsse -Mipa=fast,inline

I want to see if the regression is caused by inlining or by some other optimization. Also, I want to see what happens if you use IPA inlining instead.

I suspect that a routine that was being inlined is no longer being inlined. To see which subroutines are being inlined, add “-Minfo=inline” to the compile line and compare the 5.2 and 6.0 output.
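One way to do that comparison is sketched below. It assumes each compiler's -Minfo=inline messages have been saved to a text file (for example by redirecting the compile output), and that the relevant messages contain the word "inline"; the file names and the `diff_inlining` helper are hypothetical. Lines are compared as plain text, so any message that appears in only one log is reported.

```python
import sys
from typing import Iterable, List, Set, Tuple


def inline_messages(lines: Iterable[str]) -> Set[str]:
    """Keep only the lines that mention inlining, with whitespace trimmed.

    Assumption: the relevant -Minfo messages contain "inline" (e.g. "inlined");
    adjust this filter if your compiler version words them differently.
    """
    return {line.strip() for line in lines if "inline" in line.lower()}


def diff_inlining(old_lines: Iterable[str],
                  new_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (messages only in the old log, messages only in the new log)."""
    old, new = inline_messages(old_lines), inline_messages(new_lines)
    return sorted(old - new), sorted(new - old)


def main(argv: List[str]) -> None:
    if len(argv) != 3:
        print("usage: python diff_inline.py MINFO_5_2.log MINFO_6_0.log")
        return
    with open(argv[1]) as f_old, open(argv[2]) as f_new:
        lost, gained = diff_inlining(f_old, f_new)
    print("Reported only by the old compiler:")
    for msg in lost:
        print("  " + msg)
    print("Reported only by the new compiler:")
    for msg in gained:
        print("  " + msg)


if __name__ == "__main__":
    main(sys.argv)
```

A routine listed under "Reported only by the old compiler" would be the first candidate for the regression.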

Note that “-fast” is included in “-fastsse”, so it isn’t needed.

- Mat

Mat,

Thanks for the reply. I’ll run the tests and post the info when I get back to the office on Monday.

We have never used -Mipa because the link phase prints a message (I can’t recall the exact wording) saying it was turned off because there was no main program or something similar. The programs are entered through a C front end (we have to use -Mnomain), so I assumed that was why it was turned off. If we could get -Mipa to work, that would be great.

Also, thanks for the note about -fast and -fastsse. I thought I saw that some flags turned on by -fast were not turned on by -fastsse, but I may have misread the manual.

IPA is most likely complaining that it’s missing some IPA information. If you compile the C portion of the code with IPA as well, the message should go away. You can also try “-Mipa=fast,inline,safe”: “safe” tells pgipa that you believe it is safe to go ahead with the IPA recompilation even if some information is missing.

- Mat

Unfortunately, we don’t have a license for PGI’s C compiler. Here is the relevant output from both the 5.2 and 6.0 compiles with the -Mipa and -Minfo flags you suggested:

1, extracting subprogram for IPA, size 35
1, extracting subprogram for IPA, size 28
1, extracting subprogram for IPA, size 52
1, extracting subprogram for IPA, size 22
IPA inhibited: no main routine

So I don’t think -Mipa is a factor. Here are the timings for the four runs:
5.2 -fastsse -Mipa=fast,inline,safe; 239.535u 1.054s 4:12.18 95.4% 0+0k 0+0io 3pf+0w
5.2 -fastsse; 240.561u 1.128s 4:16.59 94.1% 0+0k 0+0io 3pf+0w
6.0 -fastsse -Mipa=fast,inline,safe; 270.837u 0.981s 4:45.92 95.0% 0+0k 0+0io 2pf+0w
6.0 -fastsse ; 270.509u 0.620s 4:46.15 94.7% 0+0k 0+0io 0pf+0w
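The lines above look like the default csh/tcsh `time` report (user seconds, system seconds, elapsed time, %CPU, then memory, I/O and page-fault counts). Assuming that layout, this sketch parses a timing line and compares the 5.2 and 6.0 builds; `parse_csh_time` is a hypothetical helper, not a tool from the thread.

```python
import re

# Assumed csh/tcsh `time` layout:
#   <user>u <system>s <elapsed [h:]m:ss.ss> <cpu>% <mem>k <io>io <faults>pf+<swaps>w
TIME_RE = re.compile(r"([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)\s+([\d.]+)%")


def parse_csh_time(line: str) -> dict:
    """Extract user, system and elapsed seconds plus %CPU from one timing line."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"no csh time output found in: {line!r}")
    user, system, elapsed, cpu = m.groups()
    # Convert [h:]m:ss.ss into seconds.
    elapsed_s = 0.0
    for part in elapsed.split(":"):
        elapsed_s = elapsed_s * 60 + float(part)
    return {"user": float(user), "system": float(system),
            "elapsed": elapsed_s, "cpu_pct": float(cpu)}


if __name__ == "__main__":
    runs = [
        "5.2 -fastsse; 240.561u 1.128s 4:16.59 94.1% 0+0k 0+0io 3pf+0w",
        "6.0 -fastsse ; 270.509u 0.620s 4:46.15 94.7% 0+0k 0+0io 0pf+0w",
    ]
    old, new = (parse_csh_time(r) for r in runs)
    for key in ("user", "elapsed"):
        change = (new[key] - old[key]) / old[key] * 100.0
        print(f"{key}: {old[key]:.2f}s -> {new[key]:.2f}s ({change:+.1f}%)")
```

Comparing user time rather than elapsed time keeps the comparison less sensitive to other load on the machine.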

Although several other applications I’ve tested show 5.2 object code running faster than the equivalent 6.0 code, this morning I tested a different application in which the 6.0 code is 10% faster than the 5.2 code.


Andy

Hi Andy,

It’s not inlining, so the next step is to break out the individual components of “-fastsse” to determine which optimization is causing the slowdown. The most likely culprits are “-Mlre”, “-Msmart”, and “-Mvect=sse”.

Try running with:

“-fastsse -Mnosmart”, “-fastsse -Mnolre”, “-fastsse -Mnovect”.

If none of those is the cause, start at “-O2” and progressively add the following optimizations: “-O2 -Munroll=c:1 -Mnoframe -Mlre -Msmart -Mvect=sse -Mscalarsse -Mcache_align -Mflushz”.
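The bisection described above can be scripted so every build uses a consistent command line. This sketch only prints the pgf77 command lines, using the flags listed in this reply: first the three isolation runs, then cumulative sets starting from -O2. The source file `prog.f` and output name `a.out` are hypothetical placeholders, and the C front end link step with -Mnomain is left out.

```python
from typing import Iterator, List, Sequence

# Each run removes one suspected -fastsse component.
ISOLATION_RUNS = [
    ["-fastsse", "-Mnosmart"],
    ["-fastsse", "-Mnolre"],
    ["-fastsse", "-Mnovect"],
]

# Optimizations added one at a time on top of -O2, in the order suggested above.
INCREMENTAL = ("-Munroll=c:1", "-Mnoframe", "-Mlre", "-Msmart",
               "-Mvect=sse", "-Mscalarsse", "-Mcache_align", "-Mflushz")


def incremental_flag_sets(base: str = "-O2",
                          extras: Sequence[str] = INCREMENTAL) -> Iterator[List[str]]:
    """Yield cumulative flag lists: base, base + first extra, base + first two, ..."""
    flags = [base]
    yield list(flags)
    for extra in extras:
        flags.append(extra)
        yield list(flags)


def build_commands(sources: Sequence[str], output: str = "a.out") -> List[List[str]]:
    """Return one pgf77 command (as an argument list) per flag set."""
    flag_sets = ISOLATION_RUNS + list(incremental_flag_sets())
    return [["pgf77", *flags, "-o", output, *sources] for flags in flag_sets]


if __name__ == "__main__":
    # Hypothetical source list; substitute the real files and link setup.
    for cmd in build_commands(["prog.f"]):
        print(" ".join(cmd))
```

Timing each resulting binary with both 5.2 and 6.0 should show the first flag set where the two compilers diverge.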

Also, can you send us your code (to trs@pgroup.com) or is it available on the web? We should release 6.1 this week and I’d like to see if the regression still occurs. If it does, then I’ll file a technical problem report (TPR) to have the regression fixed.

Thanks,
Mat