To profile a program's performance on the CPU, I can use the [perf](https://perf.wiki.kernel.org/index.php/Main_Page) tool to collect CPU events such as `page-faults` and `branch-misses`. Intel CPUs provide 4-6 counter registers in the PMU (performance monitoring unit) for counting these events.
I want to profile my CUDA programs with [nvprof](https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvprof-overview). nvprof exposes 141 events, such as `l1_local_load_hit` and `l1_local_load_miss`. How many counter registers does the PMU (performance monitoring unit) of an NVIDIA GPU provide? My GPUs are a K80 and a P100. Thanks!
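
For context, here is a minimal sketch of how the events mentioned above could be requested from each tool. It only builds the command lines and does not run them. The helper names (`perf_stat_cmd`, `nvprof_events_cmd`) and the binaries `./my_program` and `./my_cuda_app` are hypothetical placeholders; the `perf stat -e` and `nvprof --events` options are the standard ways to select events.

```python
import shlex


def perf_stat_cmd(events, program):
    """Build a `perf stat` command line that counts the given events."""
    return ["perf", "stat", "-e", ",".join(events)] + list(program)


def nvprof_events_cmd(events, program):
    """Build an `nvprof` command line that collects the given GPU events."""
    return ["nvprof", "--events", ",".join(events)] + list(program)


if __name__ == "__main__":
    # Hypothetical binaries; replace with the real program under test.
    cpu_cmd = perf_stat_cmd(["page-faults", "branch-misses"], ["./my_program"])
    gpu_cmd = nvprof_events_cmd(
        ["l1_local_load_hit", "l1_local_load_miss"], ["./my_cuda_app"]
    )
    # Print shell-ready command lines rather than executing them.
    print(shlex.join(cpu_cmd))
    print(shlex.join(gpu_cmd))
```

Running `nvprof --query-events` lists the events available on the installed GPU, which is where event names such as `l1_local_load_hit` come from.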