Hi, I have a mixed MPI/CUDA Fortran code that runs for 1 minute when profiled with nvprof. I’m considering using pgprof to get a full picture of both the device and host code. With pgprof, all MPI processes stay in state S and the run takes a long time. I don’t know whether it would hang forever, because I killed the application.
Does anyone have a clue what this behavior could be related to?
I’m using PGI 19.10 on a POWER9 machine with V100 cards.
Hi maiconsaulfaria,
What’s the command line you’re using? Are you using metrics?
Using metrics can take quite a while to run, since the profiler needs to replay kernels many times to gather the needed hardware information.
Typically, I’ll use the command “mpirun -np N pgprof -o profile.%p.prof a.out”. This will generate a profile for each rank and save them to files. The “%p” will add the process id as part of the file name.
-Mat
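As an aside, the per-rank file naming can also be done in a small wrapper instead of relying on the profiler's own substitution. The following is only a sketch: it assumes Open MPI, which sets OMPI_COMM_WORLD_RANK for each launched process, and the function name profile_cmd and the file name pattern are made up for illustration. It prints the command rather than running it, so it can be checked before use.

```shell
# Hypothetical helper: print the pgprof command for the current rank
# instead of executing it (dry run).
profile_cmd() {
    # OMPI_COMM_WORLD_RANK is exported by Open MPI to every process it starts.
    # Outside of mpirun the variable is unset, so fall back to rank 0.
    rank="${OMPI_COMM_WORLD_RANK:-0}"
    printf '%s\n' "pgprof -o profile.rank${rank}.prof $*"
}

# Example call; under mpirun each rank would print its own file name.
profile_cmd ./a.out
```

Once the printed command looks right, the printf line could be swapped for an exec of the same command inside a wrapper script started by mpirun.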
The run line is the same:
mpirun -np 40 pgprof -o prof_192011_%q{OMPI_COMM_WORLD_RANK}.prof ./exe
“top” gives the status
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
51419 bsc99214 20 0 1397952 756736 48320 S 105.6 0.1 0:24.67 exe
51446 bsc99214 20 0 1484032 765056 49920 S 105.6 0.1 0:25.44 exe
51449 bsc99214 20 0 1484032 765312 50048 S 105.6 0.1 0:25.54 exe
One strange thing is that when I use the process ID substitution %p in the output file name, the run fails:
mpirun -np 40 pgprof -o profile.%p.prof ./nemo
======== Error: Application received signal 11
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[31666,1],22]
Exit code: 2
In a normal run (without profiling) the program writes to stderr. Could this be an issue?
The signal 11 is solved. It happened because the program was writing to stderr. Still, the execution hangs with the processes in state S.
Try disabling the CPU stack unwinding via the flag “--cpu-profiling-unwind-stack off”, or disabling CPU profiling altogether via “--cpu-profiling off”.
-Mat
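To make the comparison systematic, the variants can be listed side by side, from least to most drastic. This is a rough sketch that only prints the command lines (a dry run); the function name print_variants is made up for illustration, and the process count and executable are placeholders passed as arguments. The two flags are the ones named in this thread.

```shell
# Hypothetical helper: print three pgprof invocations to try in order:
# default, stack unwinding disabled, CPU profiling disabled.
print_variants() {
    np="$1"    # number of MPI processes
    app="$2"   # application to profile
    for opts in "" "--cpu-profiling-unwind-stack off" "--cpu-profiling off"; do
        # ${opts:+$opts } expands to the flag plus a trailing space
        # only when opts is non-empty.
        printf '%s\n' "mpirun -np $np pgprof ${opts:+$opts }-o profile.%p.prof $app"
    done
}

print_variants 40 ./exe
```

Running each printed line in turn shows which profiler feature triggers the hang.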
Thanks Mat, disabling CPU profiling allows the run to complete. With --cpu-profiling-unwind-stack off I got the same problem. Disabling CPU profiling is not an option, though, since I would like to analyse both host and device code.