PGPROF "internal error: invalid thread id"

I’m trying to profile some CUDA code (written in Fortran, if it makes a difference), and I know I need to use several flags at compilation and runtime as follows:

> pgfortran -o foo.exe -Minfo=ccff -Mcuda -ta=nvidia foo.f90
> pgcollect -cuda foo.exe < input.file

If I use only those flags, my code will run to completion, but I do not get a lot of useful information out of PGPROF when I try to profile the code. So the next step is to use the flags -Mprof=func or -Mprof=lines at compile time. However, when I run pgcollect, I get the following error:

> pgfortran -o foo.exe -Minfo=ccff -Mprof=[lines or func] -Mcuda -ta=nvidia foo.f90

> pgcollect -cuda foo.exe < input.file
Error: internal error: invalid thread id
target process has terminated, writing profile data
PGCOLLECT: Fatal Error: No samples: out of range

That’s a whole lot of "error"s and colons to tell me that something went wrong, but I have no clue what that something is. Do I need to send an example to trs@pgroup.com?

"If I use only those flags, my code will run to completion, but I do not get a lot of useful information out of PGPROF when I try to profile the code."

The accelerator information is shown in the bottom tab of PGPROF once you select a particular kernel.

"So the next step is to use the flags -Mprof=func or -Mprof=lines at compile time. However, when I run pgcollect, I get the following error:"

-Mprof and pgcollect can’t be used together. They are two entirely separate profiling mechanisms: -Mprof builds an instrumented executable, while pgcollect collects samples from an executable built without -Mprof.

  • Mat
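
To make the distinction in the reply concrete, here is a minimal sketch of keeping the two mechanisms as separate builds. The helper name profile_plan and the second executable name foo_prof.exe are hypothetical; the compiler and pgcollect flags are the ones from the question. Running the instrumented executable directly, without pgcollect, is an assumption drawn from the reply rather than something verified on a particular PGI release. The helper only prints the commands, so it can be tried without the PGI tools installed.

```shell
# Hypothetical helper: prints the commands for ONE profiling mode at a time,
# so the instrumented build is never run under pgcollect.
profile_plan() {
  case "$1" in
    sampling)
      # Sampling: build WITHOUT -Mprof, then run under pgcollect.
      echo "pgfortran -o foo.exe -Minfo=ccff -Mcuda -ta=nvidia foo.f90"
      echo "pgcollect -cuda foo.exe < input.file"
      ;;
    instrumented)
      # Instrumentation: build WITH -Mprof (func or lines), then run the
      # executable directly instead of through pgcollect (assumed workflow).
      echo "pgfortran -o foo_prof.exe -Minfo=ccff -Mprof=func -Mcuda -ta=nvidia foo.f90"
      echo "./foo_prof.exe < input.file"
      ;;
    *)
      echo "usage: profile_plan sampling|instrumented" >&2
      return 1
      ;;
  esac
}
```

Calling profile_plan sampling or profile_plan instrumented prints the two commands for that mode. Using a different executable name for each build keeps one from overwriting the other.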