pgcollect does not work with a shell script

When I try to run pgcollect using a shell script as an argument I get an error as shown below.

pgcollect -exe …/…/Linux_2.6_64/bin/migkdep3 temp.sh
pgcollect-Fatal-/usr/local/pgi/linux86-64/12.9/bin/pgsampt TERMINATED by signal 11
Arguments to /usr/local/pgi/linux86-64/12.9/bin/pgsampt
/usr/local/pgi/linux86-64/12.9/bin/pgsampt temp.sh

However if I run

pgcollect …/…/Linux_2.6_64/bin/migkdep3 arguments

It works. What am I missing? I have a number of test cases to run.

I am running openSUSE 11.4 (64-bit) and version 12.9 of pgcollect.

Sorry, this is a confusing and inconsistent part of how pgcollect works.

The use of a script as the argument, along with the use of the ‘-exe’ option, is only intended for use with OProfile-style (aka event-based) sampling.

For basic time-based sampling, use the approach that worked for you. The default sampling option, -time, will not work with a script as the argument. Instead, the pgcollect command would go inside the script, immediately preceding the invocation of the program.
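A minimal sketch of that arrangement, assuming a POSIX shell: the driver script calls pgcollect itself in front of each program run, instead of being passed to pgcollect. The program path comes from this thread; the `run_cases` function, the `PGCOLLECT`/`PROG` overrides, and any test-case arguments are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical driver for default (-time) sampling: pgcollect goes inside
# the script, in front of each invocation of the program.

# Program to profile (path from the thread); can be overridden from the environment.
PROG=${PROG:-../../Linux_2.6_64/bin/migkdep3}
# Command used to launch each run; set PGCOLLECT=echo for a dry run.
PGCOLLECT=${PGCOLLECT:-pgcollect}

# Profile one run per argument. Each argument holds one test case's
# command line and is deliberately word-split into separate arguments.
run_cases() {
    for case_args in "$@"; do
        # Stop at the first run that fails.
        $PGCOLLECT "$PROG" $case_args || return 1
    done
}
```

For example, `run_cases "case1.in" "case2.in -opt"` (placeholder arguments) would run pgcollect once per case. If each run writes its profile data to the same file in the working directory, you may need to move or rename it between cases so later runs do not overwrite earlier ones.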

Alternatively, if you have OProfile installed on the system, you could use pgcollect as you were attempting to, only with the ‘-hwtime’ option:

pgcollect -hwtime -exe ../../Linux_2.6_64/bin/migkdep3 temp.sh

I had a complaint just yesterday about the clarity of the pgcollect help text, for which I filed a problem report. Regardless, pgcollect shouldn't crash in this case, so I will file a problem report on that as well. Please let us know if this doesn't help, and we'll do what we can to get you up and running.

Thank you.

I have tried your suggestion, but it seems pgcollect cannot start opcontrol:

pgcollect -v -hwtime -exe …/…/Linux_2.6_64/bin/migkpre profile.sh

/usr/local/pgi/linux86-64/12.9/bin/pgoprun -from-pgprof -exe …/…/Linux_2.6_64/bin/migkpre -time 10 profile.sh
ATTENTION: Use of opcontrol is discouraged. Please see the man page for operf.
Error: counter 0 not available

Unable to complete dump of oprofile data: is the oprofile daemon running?
Daemon not running
Using /var/lib/oprofile/samples/ for samples directory.
opreport error: No sample file found: If using opcontrol for profiling,
try running ‘opcontrol --dump’; otherwise, specify a session containing
sample files.

opcontrol -v
opcontrol: oprofile 0.9.8 compiled on Nov 14 2012 17:36:49

Any other suggestions?

This error is the result of using the latest version of OProfile, 0.9.8, which introduces 'operf', and probably also of the way newer versions of Linux manage the PMU (performance monitoring unit). There is a workaround, and we are working on a more palatable solution.

The workaround is:

  • make sure that no ‘perf’ commands are running
  • check to see that the file /proc/sys/kernel/nmi_watchdog exists
  • if it does not, we will need to research your problem further
  • if it does, then disable the NMI watchdog timer by running, as root:
opcontrol --deinit
echo 0 > /proc/sys/kernel/nmi_watchdog

Then retry your pgcollect command.
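The workaround steps above could be scripted roughly as follows. This is a sketch that assumes a POSIX shell, root privileges, and that `pgrep` is available for the "no perf commands running" check; the function names and the `KERNEL_DIR` override (there only so the logic can be tried against a scratch directory) are hypothetical.

```shell
#!/bin/sh
# Sketch of the NMI watchdog workaround; run as root.
# KERNEL_DIR is a hypothetical override; normally /proc/sys/kernel.
KERNEL_DIR=${KERNEL_DIR:-/proc/sys/kernel}

# Step 1: succeed if any process named exactly "perf" is running.
perf_running() {
    pgrep -x perf >/dev/null 2>&1
}

disable_nmi_watchdog() {
    if perf_running; then
        echo "perf is still running; stop it before retrying." >&2
        return 1
    fi
    # Step 2: the workaround only applies if nmi_watchdog exists.
    wd="$KERNEL_DIR/nmi_watchdog"
    if [ ! -e "$wd" ]; then
        echo "$wd not found; this workaround does not apply." >&2
        return 2
    fi
    # Step 3: shut down the OProfile daemon, then turn the watchdog off.
    opcontrol --deinit
    echo 0 > "$wd"
}
```

After `disable_nmi_watchdog` succeeds, rerun the pgcollect command.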

Thanks, but I just checked and I do not have an nmi_watchdog in that directory. I found

/proc/sys/kernel/panic_on_io_nmi
/proc/sys/kernel/panic_on_unrecovered_nmi

as well as

watchdog
watchdog_thresh

So any other suggestions would be appreciated.

Apparently on some systems the file is named /proc/sys/kernel/watchdog instead of /proc/sys/kernel/nmi_watchdog.

If I understand correctly, you have /proc/sys/kernel/watchdog. The same solution should work with that file:

opcontrol --deinit
echo 0 > /proc/sys/kernel/watchdog

Please confirm that this works for you. Sorry I wasn’t aware of this other file name.
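Since the file name varies between systems, a small helper can pick whichever of the two files exists before applying the same two commands. A sketch assuming a POSIX shell; `find_watchdog_file` is a hypothetical name, and its optional directory argument exists only so it can be tried outside /proc.

```shell
#!/bin/sh
# Print the path of the watchdog control file this kernel provides:
# nmi_watchdog on some systems, watchdog on others (as in this thread).
# Returns non-zero if neither exists.
find_watchdog_file() {
    dir=${1:-/proc/sys/kernel}
    for name in nmi_watchdog watchdog; do
        if [ -e "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

Used as root, for example: `wd=$(find_watchdog_file) && opcontrol --deinit && echo 0 > "$wd"`.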

–Don

Don,

Yes that did work and pgcollect ran successfully.

My only other issue was that I had to put your linux86/lib ahead of linux86-64/lib to get optopgprof to run, as it wants the 32-bit version of libnuma.
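If that ordering refers to the shared-library search path, it might look like the following. This is an assumption: the thread does not say which variable was changed, and the paths assume the default /usr/local/pgi install and release 12.9 seen in the error messages. Putting the 32-bit directory first should generally be harmless for 64-bit programs, because the dynamic loader skips libraries of the wrong ELF class and keeps searching.

```shell
#!/bin/sh
# Hypothetical: assumes the default PGI 12.9 install under /usr/local/pgi.
PGI_ROOT=${PGI_ROOT:-/usr/local/pgi}
PGI_REL=12.9
# 32-bit lib directory first (so the 32-bit libnuma is found), then the
# 64-bit one, then any entries that were already set.
LD_LIBRARY_PATH="$PGI_ROOT/linux86/$PGI_REL/lib:$PGI_ROOT/linux86-64/$PGI_REL/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```

Source this in the shell that runs optopgprof so the setting applies to it.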

Thank you for the help.

Richard