We’re using PGI 10.5 on Ubuntu Linux 10.04. According to the PGI Tools Guide, Release 2010, the proper method for profiling CUDA Fortran programs is to run pgcollect with the “-cuda” option. However, our pgcollect doesn’t seem to have any such option and instead gives the following error:
pgcollect-Error-Unknown switch: -cuda
There is no mention of a “-cuda” option in the pgcollect man page, and I’m unable to see any detailed accelerator time stats in the “accelerator performance” tab in pgprof. The program in question has been compiled with the “-ta=nvidia” option (and has a .cuf suffix, which is the same as compiling with -Mcuda).
Please advise the proper way to do this, and why the documentation doesn’t reflect reality. :)
The ‘-cuda’ option for pgcollect is new in version 10.6. You will need to update your version.
Elementary! So we upgraded to 10.6. We have all the new pgcollect options now; however, I still don’t get anything other than “seconds” as a measured metric in pgprof, no matter which “-cuda=” options I specify for pgcollect. This is a CUDA Fortran program I’m trying to benchmark on an Nvidia Tesla. The program was compiled with “-ta=nvidia”. Please advise.
The “-cuda” option of pgcollect currently only supports CUDA Fortran programs. “-ta” is for the PGI Accelerator model. So I think if you compile without “-ta”, you should get the expected NVIDIA GPU hardware counter information in the bottom panel of pgprof.
If this isn’t it, then I’ll ask one of our Tools engineers to step in and help.
I should be more specific. I’m not seeing GPU usage benchmarks in pgprof when doing some testing with !$acc pragmas in Fortran. But let me get back to you on that.
Initial problem was that “pgcollect” looks in the wrong directory for its default config files. It looks here:
…which doesn’t exist. This directory appears to be compiled into pgcollect. It should be looking here:
Creating a “cuda” directory in /opt/pgi/linux86-64/10.6 and symlinking the above directory to “cuda/pgprof” fixes that problem.
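The workaround above could be scripted roughly as follows. This is only a sketch: the directory that actually holds the config files is not given in this thread, so it is passed in as an argument, and `link_pgprof_config` is a hypothetical helper name. Only the release root `/opt/pgi/linux86-64/10.6` and the `cuda/pgprof` link name come from the post.

```shell
#!/bin/sh
# Sketch of the symlink workaround; link_pgprof_config is a hypothetical helper.
# $1: PGI release root (the post uses /opt/pgi/linux86-64/10.6)
# $2: directory that really holds the config files (not given in this thread)
link_pgprof_config() {
    pgi_root=$1
    real_dir=$2
    if [ ! -d "$real_dir" ]; then
        echo "config directory not found: $real_dir" >&2
        return 1
    fi
    # Create the missing "cuda" directory, then point cuda/pgprof at the real one.
    mkdir -p "$pgi_root/cuda" || return 1
    ln -s "$real_dir" "$pgi_root/cuda/pgprof"
}
```

It would be called as `link_pgprof_config /opt/pgi/linux86-64/10.6 <actual config directory>`, most likely with root privileges since the install tree is under /opt.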
Thanks for the feedback.
On the config file location, I will make sure that is fixed in the next release. If you want to be notified directly of changes in status on this issue, email email@example.com and ask to be added to the notification list for TPR #17102.
Regarding profiling of PGI Accelerator Model (!$acc) vs. CUDA Fortran:
We currently use two different methods of performance data collection for these two programming models. The ‘pgcollect -cuda’ option should only be used for CUDA Fortran. Just use ‘pgcollect -time’ for the Accelerator Model approach.
With -cuda and CUDA Fortran, you can get GPU counter information.
With the Accelerator Model and pgcollect, you get information similar to what you get when you compile with ‘-ta=time’, but with more correlation to source code and compiler feedback.
In time we will merge these two methods.
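To make the split between the two collection methods concrete, here is a minimal dry-run sketch. The helper name `pgcollect_cmd` and the model labels `cuda-fortran` and `accelerator` are made up for illustration; only the `-cuda` and `-time` options come from the explanation above. It prints the command line instead of running it, and `./a.out` stands in for your own executable.

```shell
#!/bin/sh
# Hypothetical helper: print the pgcollect command line for a programming model.
# Dry run only; it echoes the command instead of executing it.
pgcollect_cmd() {
    model=$1
    shift
    case $model in
        cuda-fortran) echo "pgcollect -cuda $*" ;;   # CUDA Fortran: GPU counter information
        accelerator)  echo "pgcollect -time $*" ;;   # Accelerator Model (!$acc): time-based data
        *) echo "unknown model: $model" >&2; return 1 ;;
    esac
}

pgcollect_cmd cuda-fortran ./a.out   # prints: pgcollect -cuda ./a.out
pgcollect_cmd accelerator ./a.out    # prints: pgcollect -time ./a.out
```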
We got it. Makes sense now.
Just an update: the issue of looking in the wrong location for CUDA Fortran profiling config files only shows up when you use the -cuda option with no keywords.
If you use ‘-cuda=gmem’, ‘-cuda=branch’, or ‘-cuda=cfg:config_file’, you shouldn’t see this problem. You can create your own config file listing whichever counters you want to use, within the limitations of the counter support on your particular GPU.
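As a rough illustration of always passing an explicit keyword, the sketch below builds the option string and rejects the bare form. `cuda_option` is a hypothetical helper, and it accepts only the keywords named in this thread; pgcollect may support others. The config file format is not described here, so none is shown.

```shell
#!/bin/sh
# Hypothetical helper: build a "-cuda=" option with an explicit keyword so that
# pgcollect does not fall back to its default config file lookup.
cuda_option() {
    case $1 in
        gmem|branch) echo "-cuda=$1" ;;
        cfg:?*)      echo "-cuda=$1" ;;   # cfg:<path to your own counter config file>
        *) echo "expected gmem, branch, or cfg:config_file" >&2; return 1 ;;
    esac
}

# Dry run: print the command rather than executing it (./a.out is a placeholder).
echo "pgcollect $(cuda_option gmem) ./a.out"   # prints: pgcollect -cuda=gmem ./a.out
```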