Accessing a profiler counter from a kernel


Is there a way to configure the GPU to collect a given counter (instruction_executed, for example) and then read that counter from inside the kernel?

  1. I know that CUPTI lets you configure which events (counters) to collect. Is there a way to do this directly, without CUPTI?
  2. I know that the %pm (performance monitor) special registers can be read from the kernel. Are these the registers that store the profiling data? If so, once I have configured the GPU to collect this event, how do I know which of them holds the instruction_executed counter?

I would like to access profiler counters the same way I access the %clock register, for example to see how many instructions have been executed on the current SM since the start of the kernel invocation.
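
To make the goal concrete, here is a minimal sketch of the kind of access I have in mind. It reads %clock through inline PTX and then reads %pm0 in exactly the same way. The helper names (read_clock, read_pm0, probe) are made up for illustration, and what %pm0 actually counts here is an assumption: as far as I can tell its contents are undefined unless something has configured the performance monitors, which is precisely what I am asking about.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the 32-bit %clock special register (a per-SM cycle counter).
__device__ __forceinline__ unsigned int read_clock()
{
    unsigned int v;
    asm volatile("mov.u32 %0, %%clock;" : "=r"(v));
    return v;
}

// Read %pm0 the same way.
// ASSUMPTION: the value is only meaningful if the performance monitors
// have been configured to count some event; otherwise it is undefined.
__device__ __forceinline__ unsigned int read_pm0()
{
    unsigned int v;
    asm volatile("mov.u32 %0, %%pm0;" : "=r"(v));
    return v;
}

// Hypothetical kernel: sample both registers around the code of interest.
__global__ void probe(unsigned int *out)
{
    unsigned int c0 = read_clock();
    unsigned int p0 = read_pm0();

    // ... code to be measured would go here ...

    unsigned int c1 = read_clock();
    unsigned int p1 = read_pm0();

    if (blockIdx.x == 0 && threadIdx.x == 0) {
        out[0] = c1 - c0;  // elapsed cycles on this SM
        out[1] = p1 - p0;  // change in %pm0 (meaning unknown, see above)
    }
}

int main()
{
    unsigned int h_out[2] = {0, 0};
    unsigned int *d_out = nullptr;

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    probe<<<1, 32>>>(d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    printf("clock delta: %u, pm0 delta: %u\n", h_out[0], h_out[1]);
    cudaFree(d_out);
    return 0;
}
```

The %clock part is what I already do today. The %pm0 part compiles, but I do not know how to make it track instruction_executed rather than an arbitrary value.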