I’m trying to profile some CUDA code (written in Fortran, if it makes a difference), and I know I need to use several flags at compilation and runtime as follows:
> pgfortran -o foo.exe -Minfo=ccff -Mcuda -ta=nvidia foo.f90
> pgcollect -cuda foo.exe < input.file
If I use only those flags, my code will run to completion, but I do not get a lot of useful information out of PGPROF when I try to profile the code. So the next step is to use the flags -Mprof=func or -Mprof=lines at compile time. However, when I run pgcollect, I get the following error:
> pgfortran -o foo.exe -Minfo=ccff -Mprof=[lines or func] -Mcuda -ta=nvidia foo.f90
> pgcollect -cuda foo.exe < input.file
Error: internal error: invalid thread id
target process has terminated, writing profile data
PGCOLLECT: Fatal Error: No samples: out of range
That’s a whole lot of "error"s and colons to tell me that something went wrong, but I have no clue what that something is. Do I need to send an example to firstname.lastname@example.org?