I want to use pgprof to measure the difference in performance when my code is compiled at “-O1” and when it is compiled with “-fastsse”. What should I be aware of in order to get realistic information?
To answer your question, let us first discuss how profiling works.
PGPROF currently supports two methods of profiling: instrumentation-based profiling and sample-based profiling. In both methods, profile information is collected at certain points in the running application.
For instrumentation-based profiling (selected with the -Mprof compiler switch), profile information is collected at the end of each basic block for source line profiling, or at the end of each called routine for routine-level profiling. A basic block is a sequence of program instructions with exactly one entry point and one exit point. A basic block can be as small as a single instruction, and this is the case when you disable optimizations with the -O0 compiler switch. Optimization levels -O1 and higher use larger basic blocks, which permit more aggressive optimizations. As a result, a profile of a program compiled at -O0 contains information for each executed source line, while a profile of the same program compiled at -O1 will probably contain fewer lines, because each basic block is now larger than one statement.
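To make the idea of per-block collection concrete, here is a minimal hand-written sketch in C. It is not what -Mprof actually generates: the counter array `bb_hits`, the `BB_*` names, and the function `sum_nonnegative` are all hypothetical, and only some of the blocks are counted (the loop test and the `if` test form further blocks that are left out for brevity).

```c
#include <stdio.h>

/* Hypothetical counters, one per counted basic block. A real
   instrumenting compiler generates its own bookkeeping. */
enum { BB_ENTRY, BB_NEGATIVE, BB_ACCUMULATE, BB_EXIT, BB_COUNT };
static unsigned long bb_hits[BB_COUNT];

/* Sums the non-negative values in a[0..n-1] and stores the number of
   negative values in *neg. Each counter is bumped at the end of a
   straight-line region that has one entry and one exit. */
long sum_nonnegative(const int *a, int n, int *neg)
{
    long sum = 0;
    *neg = 0;
    bb_hits[BB_ENTRY]++;              /* end of block: setup */

    for (int i = 0; i < n; i++) {
        if (a[i] < 0) {
            (*neg)++;
            bb_hits[BB_NEGATIVE]++;   /* end of block: negative branch */
            continue;
        }
        sum += a[i];
        bb_hits[BB_ACCUMULATE]++;     /* end of block: accumulate branch */
    }

    bb_hits[BB_EXIT]++;               /* end of block: function exit */
    return sum;
}
```

Calling `sum_nonnegative` on the five values 3, -1, 4, -1, 5 returns 12 and leaves `bb_hits[BB_NEGATIVE]` at 2 and `bb_hits[BB_ACCUMULATE]` at 3. If an optimizer merged the two branch arms into one larger block, a single counter would replace both, which is why an optimized build can show fewer profile lines.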
When you use instrumentation profiling with an application compiled with -fastsse, you may see even larger basic block sizes as well as some “duplicate” line numbers in the profiler (assuming that you compiled with -Mprof=lines). These duplicate lines generally occur around loops that the compiler unrolled. To help distinguish each “duplicate” line number, PGPROF has a “statement number” field that can be enabled in the “View” menu. Click the “Statement Number” check box to display the statement number in PGPROF’s top right table.
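The following C sketch shows by hand why unrolling can produce “duplicate” line numbers. The function names are hypothetical and the unroll factor of 4 is an arbitrary choice for illustration; the compiler picks its own factor and does the transformation internally.

```c
#include <stddef.h>

/* Original loop: the statement `sum += x[i]` appears once in the
   source, so it maps to one profile line. */
double sum_rolled(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* The same loop unrolled by 4 by hand. An unrolling compiler produces
   several copies of the one source statement, all carrying the same
   source line number. */
double sum_unrolled4(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i = 0;

    /* Main unrolled body: four copies of the original statement. */
    for (; i + 3 < n; i += 4) {
        sum += x[i];
        sum += x[i + 1];
        sum += x[i + 2];
        sum += x[i + 3];
    }

    /* Remainder loop: one more copy for the leftover iterations. */
    for (; i < n; i++)
        sum += x[i];

    return sum;
}
```

Both functions add the elements in the same order and return the same result. In the unrolled version, however, one source line has become five separate statements, which is the situation the statement number field helps to untangle.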
For sample-based profiling, information is captured at fixed time intervals. Because the information is not collected at the end of a basic block, sample-based profiling is not affected by compiler optimizations. It is also less intrusive to the running application. We therefore recommend using PGPROF’s sample-based mechanism over the instrumentation mechanism. The trade-off is that, because information is captured only at timed intervals, a sample-based profile cannot accurately determine path coverage (i.e., which routines or statements were executed); you should still use the instrumentation-based mechanism for that. The release notes mention some other limitations as well. Otherwise, compile your program with -pg to use PGPROF’s sample-based profiling mechanism.