nvprof: how many performance counter registers does the GPU provide?


To profile a program's performance on the CPU, I can use the perf tool (https://perf.wiki.kernel.org/index.php/Main_Page) to collect CPU micro-events such as page-faults and branch-misses. Intel CPUs provide 4-6 counter registers in the PMU (performance monitoring unit) for counting these events.


I want to profile my CUDA programs with nvprof (https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvprof-overview). nvprof exposes 141 events, such as l1_local_load_hit and l1_local_load_miss. How many counter registers does the PMU (performance monitoring unit) of an NVIDIA GPU provide? My GPUs are a K80 and a P100. Thanks!


Can you describe your use case for knowing the number of perfmon registers?
I'm afraid these are internal details and cannot be exposed in the public domain.

Hi veraj,

I want to know how many events I should collect at a time.

On an Intel CPU, the PMU (performance monitoring unit) provides 4 registers for counting events, so it is advisable to collect 4 events at a time.

If I collect fewer events, I waste register resources. If I collect more, the events are multiplexed, which interferes with the results and also slows the program down. See https://devtalk.nvidia.com/default/topic/1010235/visual-profiler/nvprof-is-too-slow/

Where can I find the internal details I need for this?



There is no fixed number of events that can be profiled in a single pass. It depends on the event/metric combination.

You can use the “cuptiEventGroupSetsCreate” API in CUPTI to find the number of passes required by the combination of events passed to it.

The CUPTI documentation is available at https://docs.nvidia.com/cuda/cupti/r_main.html#r_main
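
For illustration, here is a minimal sketch of how that query might look in C. It is not an official sample. It assumes a GPU on which the CUPTI Event API is supported (K80 and P100 should qualify), it uses device 0, and it takes the two event names from the question, which may not exist on every architecture. The build paths in the comment are also assumptions and depend on your CUDA installation.

```c
/* Sketch: ask CUPTI how many passes a given set of events needs.
 * Possible build line (paths are assumptions, adjust to your install):
 *   gcc passes.c -I$CUDA_HOME/include -I$CUDA_HOME/extras/CUPTI/include \
 *       -L$CUDA_HOME/extras/CUPTI/lib64 -lcupti -lcuda -o passes
 */
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <cupti.h>

/* Abort with a message if a CUDA driver API call fails. */
#define CHECK_CU(call)                                                  \
    do {                                                                \
        CUresult r_ = (call);                                           \
        if (r_ != CUDA_SUCCESS) {                                       \
            fprintf(stderr, "%s failed with CUresult %d\n", #call,      \
                    (int)r_);                                           \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

/* Abort with a message if a CUPTI call fails. */
#define CHECK_CUPTI(call)                                               \
    do {                                                                \
        CUptiResult r_ = (call);                                        \
        if (r_ != CUPTI_SUCCESS) {                                      \
            const char *msg_ = NULL;                                    \
            cuptiGetResultString(r_, &msg_);                            \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    msg_ ? msg_ : "unknown error");                     \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

int main(void)
{
    /* Event names from the question; availability depends on the GPU. */
    const char *names[] = { "l1_local_load_hit", "l1_local_load_miss" };
    enum { NUM_EVENTS = sizeof(names) / sizeof(names[0]) };
    CUpti_EventID ids[NUM_EVENTS];
    CUpti_EventGroupSets *passes = NULL;
    CUdevice dev;
    CUcontext ctx;

    /* Set up the driver API and make the primary context of device 0 current. */
    CHECK_CU(cuInit(0));
    CHECK_CU(cuDeviceGet(&dev, 0));
    CHECK_CU(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK_CU(cuCtxSetCurrent(ctx));

    /* Translate event names into CUPTI event IDs for this device. */
    for (int i = 0; i < NUM_EVENTS; ++i)
        CHECK_CUPTI(cuptiEventGetIdFromName(dev, names[i], &ids[i]));

    /* Each returned set corresponds to one pass over the application. */
    CHECK_CUPTI(cuptiEventGroupSetsCreate(ctx, sizeof(ids), ids, &passes));
    printf("%d events need %u pass(es)\n", NUM_EVENTS, passes->numSets);

    CHECK_CUPTI(cuptiEventGroupSetsDestroy(passes));
    CHECK_CU(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

If the reported number of passes is 1, the chosen events can be collected together in a single run. Adding more names to the array and re-running shows at which point a given combination starts to need extra passes on your GPU.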