Pgprof application


For the past several years I have always installed the PGI “Community” version, because I manage some computers in two university labs. Two months ago, I installed the latest PGI “Community” version that I found on the NVIDIA site (the download link redirected me to another page). However, this new version (hpc-2020) doesn’t include the “pgprof” application, and I need it (older “Community” versions like 2019-19.4 and 2019.19.10 included “pgprof”).

What can I do now?




PGI was re-branded as the NVIDIA HPC Compiler and is now included as part of the NVIDIA HPC SDK. While the PGI Community Edition is no longer available, the NVHPC SDK is available at no cost for all releases, not just two releases a year. The older PGI drivers (pgcc, pgc++, pgfortran) are still available with the SDK, but you should consider moving to the new compiler drivers (nvc, nvc++, nvfortran).

Pgprof was a repackaged version of nvprof. Nvprof is available in the SDK; however, NVIDIA deprecated this profiler about a year ago, so please consider transitioning to the new Nsight Systems and Nsight Compute profilers. See:

Hope this helps,

It looks like nsys is not able to profile CPU-only applications the way pgprof could.
What is the solution (and the command lines)?


You actually can get the same function-level profiling of a CPU application with nsys as with pgprof. By default nsys uses the “Last Branch Record” for CPU backtraces (-b lbr), which doesn’t give great detail.

Instead, I’d recommend compiling your code with “-g” or “-gopt” to include DWARF debug information. (“-g” can inhibit some optimizations to make the code easier to debug, so in this case I use “-gopt”, which includes DWARF without reducing optimization.)

Then, when collecting the profile, add “-b dwarf” to the nsys command. The CPU report is not shown in the command-line “stats” output, so you need to open the profile in the GUI. There, below the timeline, is an “Events View” box. Use its drop-down menu to select one of the three views: Top-down, Bottom-up, or Flat. This gives you the same function-level profile as you’d see in pgprof.

If needed, you can adjust “--sampling-period” from the default of 1000000. The accepted values range from 125000 to 4000000. Keep in mind that the smaller the sampling period, the more samples are collected, so the profile is bigger and the profiling overhead is higher.
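
Since you asked for command lines, here is a minimal sketch that assembles the two commands described above and prints them so you can paste them into a shell. It only uses the flags already mentioned (-gopt, -b dwarf, --sampling-period) plus -o to name the outputs. The file names app.c, app and cpu_profile are hypothetical placeholders, nvc is assumed as the compiler driver (swap in nvc++ or nvfortran as needed), and the helper build_commands is just for illustration, not part of any NVIDIA tool.

```python
import shlex

# Range quoted above for --sampling-period.
MIN_PERIOD = 125_000
MAX_PERIOD = 4_000_000


def build_commands(source="app.c", exe="app", report="cpu_profile",
                   sampling_period=1_000_000):
    """Return (compile_cmd, profile_cmd) as argument lists.

    source, exe and report are hypothetical placeholder names.
    """
    if not MIN_PERIOD <= sampling_period <= MAX_PERIOD:
        raise ValueError(
            f"sampling period must be between {MIN_PERIOD} and {MAX_PERIOD}")
    # -gopt: include DWARF debug information without reducing optimization.
    compile_cmd = ["nvc", "-gopt", "-o", exe, source]
    # -b dwarf: collect CPU backtraces from DWARF instead of the default lbr.
    profile_cmd = ["nsys", "profile", "-b", "dwarf",
                   "--sampling-period", str(sampling_period),
                   "-o", report, f"./{exe}"]
    return compile_cmd, profile_cmd


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

The resulting report file is then opened in the Nsight Systems GUI to reach the Events View described above. Passing a value outside the accepted range raises an error before any command is printed, which avoids launching a long run with a setting nsys would reject.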


Dear Mat, thanks for your reply, it’s very clear.

This part is quite a blocker when working in an HPC environment … it would be great to see this evolve. We do not profile only a GPU kernel, but a complete portion of the software.

Best regards