nvprof: timelines for GPU metrics values. --metrics and --print-gpu-trace options.

When you run nvprof with the --print-gpu-trace and --csv options, you get a nice table with timestamps and values for some metrics:

"Start","Duration","Grid X","Grid Y","Grid Z","Block X","Block Y","Block Z","Registers Per Thread","Static SMem","Dynamic SMem","Size","Throughput","SrcMemType","DstMemType","Device","Context","Stream","Name"
0.298472,0.001952,,,,,,,,,,0.001953,0.977125,"Pinned","Device","Tesla M60 (0)","1","7","[CUDA memcpy HtoD]"
0.298653,0.001408,,,,,,,,,,0.001953,1.354651,"Pinned","Device","Tesla M60 (0)","1","7","[CUDA memcpy HtoD]"

This format is very convenient for plotting timelines and analyzing application behavior.

However, it only includes a few memory-related metrics. If you need other metrics and add the --metrics option, the nvprof output format changes: it no longer includes timestamps.

How can I get timestamps together with other metrics?

Hi, pyotr777

Have you tried using nvvp (the Visual Profiler) yet?

You can collect metrics from the UI and check “GPU details”; timestamps are shown there.

You can also export the data as CSV.

Hope this helps.

Thank you!

I have tried nvvp, but so far I couldn’t get a timeline of metrics. I have been running nvprof with the -o option and opening the saved profile in Visual Profiler.
There is a timeline there, but it only shows kernel invocations and memory operations. I don’t even see FLOP counters or any other metrics besides duration for kernels.

I also tried running nvprof with the --analysis-metrics option. Unfortunately, it stops with an error:
cupy.cuda.memory.OutOfMemoryError: out of memory to allocate 134217728 bytes (total 2571250176 bytes)
The partial file that nvprof creates does contain more metrics for kernels, but nvprof fails this way every time, and I still cannot see metrics on a timeline.

What do you mean by “collect metrics from UI”? I know there is a way to run remote profiling with nvprof from Visual Profiler. Are there any other profiling methods?

What I need is a metrics timeline like the graph below. It was plotted from a CSV file like the one I mentioned in the first post.

After the timeline is generated, you can select Run->Configure Metrics and Events->Apply and Run to choose any metrics you are interested in.

Then you can see all metric values listed in GPU details.

I have nvvp installed on my local computer. I created a new session with a remote connection, but now Visual Profiler cannot connect:

Failed to connect sshd on "EC2-52-91-16-22.COMPUTE-1.AMAZONAWS.COM:22"
Failed to connect sshd on "EC2-52-91-16-22.COMPUTE-1.AMAZONAWS.COM:22"
Failed to connect sshd on "EC2-52-91-16-22.COMPUTE-1.AMAZONAWS.COM:22"

Anyway, can you tell me if there is a difference between running remote profiling from Visual Profiler and running nvprof on the remote machine from the CLI?