GPU performance counters


I have been trying to read GPU performance counters through PAPI,
but I have found it difficult to use. Is there any other
mechanism through which I can read the GPU performance counters?


If you are trying to access the counters at run time, you can use the CUPTI SDK, which is included in the CUDA Toolkit. nvprof, the Visual Profiler, and PAPI are all implemented on top of CUPTI.

Could you please guide me towards some examples to get started?

Examples are included in the CUDA release - see %CUDA_PATH%/extras/CUPTI/sample/

Thanks for your reply. I am actually trying to do some optimization based on the performance counters.
I want to read them continuously at 250 ms intervals and then make a decision based on the values.
I tried the “Event Sampling” sample code in CUPTI/sample, and it gave me this message:

“Event sampling is not supported for Tesla family Devices”

What does this mean? BTW, I am using a GTS 250.

It probably means that you need at least a Fermi device for event sampling to work.
Here “Tesla” is the name of the previous GPU architecture generation (the GT 2xx-era devices, compute capability 1.x), which includes your GTS 250. It is a bit of a misnomer, because Tesla is also the brand name of the professional-level cards. That is not what the error is referring to in this case, which might be why you are confused.
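
On the 250 ms polling you described: the control loop itself does not depend on which API supplies the counter values, so here is a minimal sketch of it. Everything in it is an assumption for illustration, not CUPTI code: `read_counter` is a hypothetical callable standing in for whatever counter read your device supports, `decide` is a hypothetical policy function, and the counter is assumed to be cumulative (if your read resets the counter, use the value directly instead of the delta). Python is used only for brevity.

```python
import time
from typing import Callable, List, Tuple


def sample_and_decide(
    read_counter: Callable[[], int],   # hypothetical: returns a cumulative counter value
    decide: Callable[[int], str],      # hypothetical: maps a per-interval delta to an action
    interval_s: float = 0.250,         # the 250 ms period from the question
    num_samples: int = 4,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[int, str]]:
    """Poll a cumulative counter at a fixed interval and act on each delta."""
    results = []
    previous = read_counter()              # baseline reading
    next_deadline = clock() + interval_s
    for _ in range(num_samples):
        # Sleep until an absolute deadline so that the time spent reading
        # and deciding does not accumulate as drift across iterations.
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)
        current = read_counter()
        delta = current - previous         # events counted during this interval
        results.append((delta, decide(delta)))
        previous = current
        next_deadline += interval_s
    return results
```

You would call it with your own reader and policy, for example `sample_and_decide(my_reader, lambda d: "throttle" if d > 200 else "keep")`, where `my_reader` and the threshold of 200 are placeholders. The `sleep` and `clock` parameters exist only so the loop can be exercised with fake time.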

Does that mean I cannot transparently profile application benchmarks such as the Rodinia suite,
reading performance counters at fixed intervals, without changing the application code?