pgcollect cannot stat pgaccnum.out

Hello,

I’ve been trying to get pgcollect running with a few different CUDA Fortran programs, but I run into an error every time.

Using strace when running pgcollect produces the following line of interest:

$ strace pgcollect -cuda test
...
stat("pgpaccnum.out", 0x7fffcaefc2e0)   = -1 ENOENT (No such file or directory)
...

It then goes on to give the following error messages:

dwf_init: Unable to stat file
pgcollect-Fatal-/opt/pgi/linux86-64/10.6/bin/pgevtofq TERMINATED by signal 11
Arguments to /opt/pgi/linux86-64/10.6/bin/pgevtofq
/opt/pgi/linux86-64/10.6/bin/pgevtofq test pgprof.out

This file appears to be related to the PGI Accelerator, but it isn’t being used by my program. Any ideas as to what is causing this? Should this file be located somewhere or should it be generated?

We have a known issue in pgcollect 10.6 that you may be encountering.

If you invoke pgcollect with ‘-cuda’ and don’t specify any keywords such as ‘-cuda=gmem’, pgcollect fails to find the CUDA Fortran profiling config file it needs to generate a good profile. This will be fixed in PGI 10.8 in early August.

Try running one of the following:

pgcollect -cuda=gmem
pgcollect -cuda=branch
pgcollect -cuda=cfg:<cfgpath>

where <cfgpath> specifies the name of a file containing counter names. You can find the counter names in your GPU documentation, or run

pgcollect -cuda=list

to get a concise list of all the counters. Note that different counters are supported on different GPUs (compute capabilities).

To get the same effect as -cuda with no keywords, you can use

pgcollect -cuda=cfg:/opt/pgi/linux86-64/10.6/etc/pgprof/cudaprof.cfg.none

We will continue to look into the problem you reported with pgpaccnum.out, but please let us know if things start working better when you use this workaround.
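
For anyone trying the -cuda=cfg:<cfgpath> form, here is a minimal shell sketch that checks the counter file is readable before handing it to pgcollect. The function name cuda_cfg_option and the file name counters.cfg are hypothetical and used only for illustration; the only pgcollect syntax used is the -cuda=cfg:<cfgpath> option quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of PGI): print the -cuda=cfg:<cfgpath>
# option only if the counter file exists and is readable.
cuda_cfg_option() {
    cfg="$1"
    if [ -r "$cfg" ]; then
        printf '%s\n' "-cuda=cfg:$cfg"
    else
        printf 'counter file not readable: %s\n' "$cfg" >&2
        return 1
    fi
}

# Example use (assumes pgcollect is on PATH; not executed here):
#   opt=$(cuda_cfg_option ./counters.cfg) && pgcollect "$opt" test
```

This only rules out a mistyped or unreadable config path; it does not address the pgpaccnum.out failure itself.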

Some more information on the problem, as these solutions didn’t seem to help much.

I tried each of the solutions that you posted, but they all produced the same results.

I tried:

pgcollect -cuda=gmem
pgcollect -cuda=branch
pgcollect -cuda=cfg:confg
pgcollect -cuda=cfg:/opt/pgi/linux86-64/10.6/etc/pgprof/cudaprof.cfg.none

The end result was exactly the same error message:

target process has terminated, writing profile data
dwf_init: Unable to stat file
pgcollect-Fatal-/opt/pgi/linux86-64/10.6/bin/pgevtofq TERMINATED by signal 11
Arguments to /opt/pgi/linux86-64/10.6/bin/pgevtofq
/opt/pgi/linux86-64/10.6/bin/pgevtofq test pgprof.out

For the config file, I included only one simple counter, gridsize. Assuming the issue was specific to my CUDA Fortran program, I also wrote a basic program that uses the accelerator. The program compiled and ran fine, but it produced the same error when run under pgcollect.

Is there anything else that may be causing this problem? If you would like more information, such as strace results, let me know. Otherwise, each of these commands produces the same error messages.
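
In case it helps with the strace results, here is a rough sketch of how the failed file lookups could be pulled out of a trace. The function name missing_files and the log name pgcollect.trace are hypothetical; the strace options shown (-f, -e trace=file, -o) are standard strace flags, and the sketch assumes strace is installed.

```shell
#!/bin/sh
# Hypothetical helper: read strace output on standard input and print
# only the calls that failed with ENOENT (file not found).
missing_files() {
    grep 'ENOENT'
}

# Example capture (not executed here):
#   strace -f -e trace=file -o pgcollect.trace pgcollect -cuda test
#   missing_files < pgcollect.trace
```

With -f, strace also follows child processes, so lookups made by helpers that pgcollect launches should show up in the same log.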

We’ll continue looking at this issue, but so far we have not been able to reproduce it. Would you please file a problem report by emailing trs@pgroup.com? We can probably handle this more effectively through that channel.

You don’t need to write the whole thing up again; just refer to user forum issue https://forums.developer.nvidia.com/t/pgcollect-cannot-stat-pgaccnum-out/131868/1 and ask them to assign it to Don.

We’ll have some questions for you about your configuration and so on.

Once we come up with a solution or a workaround, I’ll post it here.
Thanks
–Don