nvprof: how many performance counter registers does the GPU provide?

Background:

If I want to profile a program's performance on the CPU, I can use the perf tool (https://perf.wiki.kernel.org/index.php/Main_Page) to collect CPU micro-events such as page-faults and branch-misses. Intel CPUs provide 4-6 counter registers in the PMU (performance monitoring unit) for counting these events.

Question:

I want to profile my CUDA programs with nvprof (https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvprof-overview). nvprof lists 141 events, such as l1_local_load_hit and l1_local_load_miss. How many counter registers does the PMU (performance monitoring unit) of an NVIDIA GPU provide? My GPUs are a K80 and a P100. Thanks!

Hi,

Can you describe your use case for knowing the number of perfmon registers?
I’m afraid these are internal details and cannot be disclosed publicly.

Hi veraj,

I want to know how many events I should collect at a time.

For comparison, an Intel CPU provides 4 counter registers in the PMU (performance monitoring unit) for counting events, so it is advisable to collect 4 events at a time.

If I collect fewer events, I waste register resources. If I collect more, the counters are multiplexed, which interferes with the results and also slows the program down: https://devtalk.nvidia.com/default/topic/1010235/visual-profiler/nvprof-is-too-slow/

Where can I find the internal details I need for this?

Thanks!

Hi,

There is no fixed number of events that can be profiled in a single pass. It depends on the event/metric combination.

You can use the “cuptiEventGroupSetsCreate” API in CUPTI to find the number of passes required by the combination of events passed to it.

The CUPTI documentation is available at https://docs.nvidia.com/cuda/cupti/r_main.html#r_main
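
Since there is no fixed per-pass limit to rely on, one practical workaround is to split the event list into small batches and run nvprof once per batch, then adjust the batch size by trial. The following is only a minimal sketch of that idea: it builds the command lines and does not query CUPTI or run anything. The batch size, the application path ./my_app, and the log file names are assumptions chosen for illustration; --events and --log-file are the nvprof options used.

```python
from typing import List


def batch_events(events: List[str], batch_size: int) -> List[List[str]]:
    """Split the event names into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]


def nvprof_commands(app: List[str], events: List[str], batch_size: int) -> List[List[str]]:
    """Build one nvprof command line per batch of events.

    The batch size is a guess to tune by experiment: the real number of
    events that fit in one pass depends on the event combination.
    """
    commands = []
    for index, batch in enumerate(batch_events(events, batch_size)):
        commands.append(
            ["nvprof",
             "--events", ",".join(batch),
             "--log-file", f"events_{index}.log",  # hypothetical log name
             *app]
        )
    return commands


if __name__ == "__main__":
    # Event names taken from the question; ./my_app is a placeholder.
    wanted = ["l1_local_load_hit", "l1_local_load_miss"]
    for command in nvprof_commands(["./my_app"], wanted, batch_size=1):
        print(" ".join(command))
```

Each returned command could be passed to subprocess.run on a machine with nvprof installed. Comparing run time across batch sizes gives a rough indication of when nvprof starts needing extra passes for a given event combination.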