Profiling MPI programs

It seems that MPI programs (even when run on a single core) are not compatible with the hardware event monitor of the visual profiler. Is that correct? Is there a workaround, or an alternative?

Typically, in my experience, people profile MPI CUDA activity with nvprof and then import the results into the visual profiler.

There are instructions in the profiler user guide.

Profiler :: CUDA Toolkit Documentation
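A minimal sketch of that workflow, assuming nvprof's `-o` output option and its `%p` (process ID) file-name substitution so that each rank writes its own file. The application name `./my_mpi_app` and the output pattern are hypothetical placeholders. The Python helper only builds the command line; it does not run anything:

```python
import shlex


def nvprof_mpi_command(app, app_args=(), ranks=2, output_pattern="profile.%p.nvvp"):
    """Build (but do not run) an mpirun command that wraps every rank in nvprof.

    %p in the output pattern is expanded by nvprof to the process ID,
    so each MPI rank ends up with its own profile file.
    """
    return (["mpirun", "-n", str(ranks),
             "nvprof", "-o", output_pattern, app] + list(app_args))


# Hypothetical application name, used for illustration only.
cmd = nvprof_mpi_command("./my_mpi_app")
print(" ".join(shlex.quote(part) for part in cmd))
```

The per-rank files written this way can then be imported into the visual profiler.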

I haven’t tried doing this entirely from within the visual profiler. The visual profiler has the ability to profile multiple processes, so it should be possible to do it directly:

Profiler :: CUDA Toolkit Documentation

The visual profiler itself has no problem running the program. The problem appears when you want to configure events. I mean [1]…
Is nvprof able to measure them with MPI programs?

[1] Profiler :: CUDA Toolkit Documentation

Both nvprof and nvvp are able to collect metric data. Events are less commonly used, so I’d probably need to try a specific example. nvprof can certainly show you any queryable event. Whether those are trivially displayable in nvvp is something I would have to look at.
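As a rough sketch of what that looks like on the nvprof side, assuming the `--query-events`, `--query-metrics`, and `--events` options: the helpers below only assemble the command lines. The event name `warps_launched` and the application `./my_app` are illustrative assumptions; which events exist depends on the GPU, so the query command should be checked first.

```python
def nvprof_query_command(kind):
    """Command that lists what the current GPU exposes ('events' or 'metrics')."""
    if kind not in ("events", "metrics"):
        raise ValueError("kind must be 'events' or 'metrics'")
    return ["nvprof", "--query-" + kind]


def nvprof_events_command(events, app, app_args=()):
    """Command that collects the given hardware events for one application run."""
    return ["nvprof", "--events", ",".join(events), app] + list(app_args)


# List the available events first, then collect one of them.
query_cmd = nvprof_query_command("events")
# 'warps_launched' and './my_app' are assumptions for illustration only.
collect_cmd = nvprof_events_command(["warps_launched"], "./my_app")
print(" ".join(query_cmd))
print(" ".join(collect_cmd))
```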

I am referring to this message:

Metric/event collection failed:
Events/metrics cannot be collected for multi-process applicaiton

@txbob:

I tested two scenarios with the visual profiler:

  1. Selecting Profile child processes and then:
    File = mpirun
    Working directory = /home/mahmood/lammps/eam
    Arguments = -n 2 /opt/lammps-11Aug17/src/lmp_mpi -sf gpu -pk gpu 1 -in in.eam

Then I select Next and then Finish. The program runs in the profiler and I see the same output as in the Linux terminal, so that part is fine. After the run, when I select Run->Configure Metrics and Events, I get the following error:

Metric/event collection failed:
Events/metrics cannot be collected for multi-process applicaiton

  2. Selecting Profile current process only and then:
    File = mpirun
    Working directory = /home/mahmood/lammps/eam
    Arguments = -n 2 /opt/lammps-11Aug17/src/lmp_mpi -sf gpu -pk gpu 1 -in in.eam

Then I select Next and then Finish. The program runs in the profiler and I see the same output as in the Linux terminal, so that part is fine. After the run, when I select Run->Configure Metrics and Events, the metric window opens and I can choose which metrics to monitor. I select one of them and press Apply and Run. The program then runs twice (!), and the run time is the same as in the first scenario, which is odd. Usually, selecting a metric makes the run slower, but here I did not see any slowdown.

Any comment?

Hello again
While the visual profiler is not able to measure metrics when “Profile child processes” is selected, the nvprof command is able to do that!

Do you have any comment?
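For reference, a sketch of the kind of nvprof invocation this refers to, built from the mpirun arguments given above. It assumes the `--metrics` and `--log-file` options (with `%p` expanded to the process ID) and uses `achieved_occupancy` purely as an example metric. The exact command that was run is not shown in this thread, so this is an assumption, and the helper only builds the command line:

```python
def nvprof_metrics_under_mpirun(metrics, app, app_args, ranks=2,
                                log_pattern="metrics.%p.log"):
    """Build an mpirun command in which each rank runs under nvprof with metrics.

    Each rank's nvprof output goes to its own log file, because nvprof
    expands %p in the log file name to the process ID.
    """
    nvprof = ["nvprof", "--metrics", ",".join(metrics), "--log-file", log_pattern]
    return ["mpirun", "-n", str(ranks)] + nvprof + [app] + list(app_args)


# Application and arguments are taken from the scenarios above;
# the metric name and the log pattern are assumptions.
cmd = nvprof_metrics_under_mpirun(
    ["achieved_occupancy"],
    "/opt/lammps-11Aug17/src/lmp_mpi",
    ["-sf", "gpu", "-pk", "gpu", "1", "-in", "in.eam"],
)
print(" ".join(cmd))
```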