gprof vs pgprof

Hello All,

I’ve got a bit confused on a very trivial thing, probably somewhere about a kindergarten level. Maybe somebody would be so kind to help me out?

I’m trying to profile some code with pgprof first, and then with gprof, and the results seem to be quite different. Specifically, if I compile it with
-Mprof=func switch, the pgprof profile comes out to be

Function
Calls Time(%) Cost(%) Name:

66022803 38.12 68 greenf (gre_sep5.f:112)
276447237 24.97 25 ex (gre_sep5.f:326)
198068409 11.45 11 lint (gre_sep5.f:343)
15251 10.13 20 gre_full (gre_sep5.f:198)
39189414 3.40 3 ene (gre_sep5.f:316)
777969 2.64 31 measure_gim (sep15.f:1202)
777969 2.37 30 diag_dens (sep15.f:1030)
1309999 2.00 2 check (sep15.f:1739)

Then I compile the same code with -pg switch and run gprof:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
17.34 25.98 25.98 _mth_i_dexp
14.18 47.23 21.25 136384347 0.00 0.00 greenfuncz_greenf

14.05 68.29 21.07 f90io_close
11.78 85.96 17.66 __rouexit
10.80 102.14 16.18 mcount2
4.88 109.46 7.32 2709999 0.00 0.00 main_check
4.84 116.70 7.25 15251 0.00 0.00 greenfuncz_gre_full

4.38 123.27 6.57 _mth_dsincos
2.48 126.99 3.72 487531869 0.00 0.00 greenfuncz_ex

First of all, what are these (service?) functions mcount2, f90io_close and __rouexit? And do they take time (as gprof shows) or don’t they (as pgprof shows)?

Also, does the function lint() take 10% of time (pgprof) or doesn’t it (gprof?)

Thanks a lot in advance!

Zhenya

Hi Zhenya,

To answer your question, let’s first discuss the differences between -Mprof=func and -pg.

-Mprof=func generates an instrumented profile. We instrument the code. Essentially we wrap timers around each call site in your program. Compiling with -pg or -Mprof=time generates a sample based profile. Timers are not wrapped around each call site. Instead, a periodic snap shot of the running program’s state is recorded. Sample based profiling, therefore, is less intrusive. Moreover, the time to profile your program is less with sample based than instrumentation. Also for programs that do not run for a long time, sample based profiling usually provides more accurate times since there’s less profiling overhead. Instrumentation has the advantage of recording time for every call-site; allowing you to see the true call path of your program.

Yes, the _mcount2 function takes time. They comes from the gprof profiling mechanism. It’s not a part of the -Mprof=func mechanism (so they are not a part of your -Mprof=func profile). Also -Mprof=func only records times in your user program. -pg/-Mprof=time may also catch some of the pgf90 runtime routines (e.g., f90io_close, etc.)

As for function lint(), it’s possible that the 10% you’re seeing in -Mprof=func is coming from use of some of the intrinsics/runtime routines recorded in the -pg profile. That’s because -Mprof=func only records times by your user program. It does not record times spent by the runtime.

I hope this helps. Obviously no two profiling mechanisms are alike, but hopefully both will steer you in the right direction when finding performance bottle necks in your code. If in doubt, it’s always good to try both as you did. When given a choice, I probably would use -Mprof=time (or -pg).

-Mark

Hello Mark,

Thank you very very much for such a detailed explanation! The whole thing makes sence now.
Only one tiny thing’s left: is f90io_close routine related to printing to stdout (i.e. via print*,‘blah’) or only to the disk i/o?

A zillion thanks,

Zhenya

Hi Zhenya,

f90io_close is called usually to perform a Fortran CLOSE statement.

-Mark