pgprof taking long

Hi, I have a mixed MPI/CUDA Fortran code that runs for about 1 minute when profiled with nvprof. I’m considering using pgprof to get a full picture of both the device and host code. With pgprof, all MPI processes stay in state S and the run takes a very long time. I don’t know whether it would hang forever, because I killed the application.
Does anyone have a clue what could be causing this behavior?
I’m using PGI 19.10 on a POWER9 machine with V100 cards.

Hi maiconsaulfaria,

What’s the command line you’re using? Are you using metrics?

Using metrics can take quite a while to run, since the profiler needs to replay kernels many times to gather the needed hardware information.

Typically, I’ll use the command “mpirun -np N pgprof -o profile.%p.prof a.out”. This will generate a profile for each rank and save them to files. The “%p” will add the process id as part of the file name.

-Mat

The run line is the same

mpirun -np 40  pgprof -o prof_192011_%q{OMPI_COMM_WORLD_RANK}.prof   ./exe

“top” gives the status

   
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
51419 bsc99214  20   0 1397952 756736  48320 S 105.6  0.1   0:24.67 exe
51446 bsc99214  20   0 1484032 765056  49920 S 105.6  0.1   0:25.44 exe
51449 bsc99214  20   0 1484032 765312  50048 S 105.6  0.1   0:25.54 exe

One strange thing is that with the process-ID output name (%p) the run fails:

mpirun -np 40  pgprof -o profile.%p.prof  ./nemo

======== Error: Application received signal 11
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[31666,1],22]
  Exit code:    2

When run normally, without profiling, the program writes to stderr. Could this be an issue?

The signal 11 is solved: it was caused by the program writing to stderr. Still, the execution hangs with the processes in state S.

Try disabling the CPU stack-unwinding via the flag “--cpu-profiling-unwind-stack off” or disabling CPU profiling altogether via “--cpu-profiling off”.

-Mat

Thanks Mat, disabling CPU profiling allows the run to complete. With --cpu-profiling-unwind-stack off I got the same problem. Disabling CPU profiling is not an option, though, since I would like to analyse both host and device code.
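
One thing that might be worth trying (an untested sketch, not a confirmed fix) is to keep CPU profiling on for only one or a few ranks and pass “--cpu-profiling off” to the rest, so that at least some host-side data is collected. The wrapper below is hypothetical: the script name profile_rank.sh, the CPU_RANKS variable and the prof_<rank>.prof file names are made up for illustration. It assumes Open MPI (which sets OMPI_COMM_WORLD_RANK for each process) and uses only the pgprof flags already mentioned in this thread. The hang may well still occur on the ranks that keep CPU profiling enabled.

```shell
#!/bin/sh
# profile_rank.sh (hypothetical wrapper): mpirun starts one copy per rank.
# Assumption: Open MPI, which exports OMPI_COMM_WORLD_RANK to every process.

# Print the extra pgprof flag for the given rank.
# Ranks listed in CPU_RANKS (space separated, default "0") keep pgprof's
# default CPU profiling, so nothing is printed for them.
# Every other rank gets "--cpu-profiling off".
cpu_flag_for_rank() {
    for r in ${CPU_RANKS:-0}; do
        if [ "$r" = "$1" ]; then
            return 0
        fi
    done
    echo "--cpu-profiling off"
}

# Only launch pgprof when started by mpirun with a program to profile.
if [ -n "${OMPI_COMM_WORLD_RANK:-}" ] && [ "$#" -gt 0 ]; then
    rank=$OMPI_COMM_WORLD_RANK
    # The command substitution is left unquoted on purpose: it expands to
    # either two words ("--cpu-profiling" "off") or to nothing at all.
    exec pgprof $(cpu_flag_for_rank "$rank") -o "prof_${rank}.prof" "$@"
fi
```

It would be launched as “mpirun -np 40 ./profile_rank.sh ./exe”, optionally with CPU_RANKS="0 1" exported first to keep CPU profiling on more than one rank.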