API to measure or query performance counter values

Hi guys,
I need to model the performance of a Tesla K40. I want to run a fraction of my data set on the GPU, for example 20%, and then, by measuring some performance counters, characterize the rest of the input data set using a performance model I built beforehand from these counters and various data sets.
What API can query performance counters while a CUDA program is running? For example, with NVML I can query the GPU's power consumption anywhere in my code. I need something like NVML for reading performance counters anywhere in my code.
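The modeling step described in the question can be illustrated with a small host-side sketch. It assumes a counter grows roughly linearly with input size, which many kernels only approximately satisfy, so treat it as a starting point rather than a finished model. Given counter values measured on a few fractions of the data set, it fits counter = a + b * fraction by ordinary least squares and predicts the value at fraction 1.0. The function name `predict_full_counter` is hypothetical.

```cuda
// Host-side code; compiles as part of a .cu file (or as plain C++).
#include <cstddef>

// Hypothetical helper: fit counter = a + b * fraction by ordinary least
// squares over n profiled runs, then predict the counter for the full data
// set (fraction = 1.0). Assumes roughly linear scaling with input size.
// Returns 0 on success, -1 if the fit is undefined.
int predict_full_counter(const double *fraction, const double *counter,
                         std::size_t n, double *predicted) {
    if (n < 2) return -1;  // a line needs at least two runs
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sx += fraction[i];
        sy += counter[i];
        sxx += fraction[i] * fraction[i];
        sxy += fraction[i] * counter[i];
    }
    double denom = static_cast<double>(n) * sxx - sx * sx;
    if (denom == 0.0) return -1;  // every run used the same fraction
    double b = (static_cast<double>(n) * sxy - sx * sy) / denom;  // slope
    double a = (sy - b * sx) / static_cast<double>(n);            // intercept
    *predicted = a + b * 1.0;  // extrapolate to the whole data set
    return 0;
}
```

For example, runs on 0.1, 0.2 and 0.3 of the data that measured 110, 120 and 130 give a = 100, b = 100 and a prediction of 200 (up to floating-point rounding) for the full set. Measuring more than one fraction lets the intercept absorb fixed per-run work; with a single 20% run the only option is scaling by 5, which assumes there is none.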

Please stop posting the same question into multiple sub-forums. That will trigger the forum’s SPAM filter.

I’m sorry, I didn’t know which sub-forum was the right one for my topic.

What you’re suggesting is fairly difficult, but the profiler API is CUPTI:

http://docs.nvidia.com/cuda/cupti/index.html#axzz4oVqSnSnJ

Thanks a lot for your comment.
But as you say, using CUPTI is very challenging for me.
Could you give me some sample code that reads a few metrics using CUPTI?
Thanks in advance.

CUPTI sample programs are installed with the CUDA toolkit. On a standard Linux install they are in:

/usr/local/cuda/extras/CUPTI/samples
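As a smaller starting point than the full samples, here is a condensed sketch loosely following the pattern of the callback_event sample. It uses the older CUPTI Event API, which fits a Kepler-class GPU such as the K40; recent GPU architectures no longer support the Event/Metric APIs and use CUPTI's newer profiling API instead. The kernel `saxpy`, the helper `count_event_for_saxpy`, the `CUPTI_OK` macro, the include/library paths in the build comment and the event name "inst_executed" are illustrative assumptions, not taken from the thread. Event names differ between GPUs; the cupti_query sample can list the ones your device supports.

```cuda
// Minimal sketch (assumptions: Kepler-class GPU, older CUPTI Event API,
// an event named "inst_executed" on this device).
// Build, assuming default Linux install paths:
//   nvcc count_event.cu -I/usr/local/cuda/extras/CUPTI/include \
//        -L/usr/local/cuda/extras/CUPTI/lib64 -lcupti -lcuda
#include <cstdio>
#include <vector>
#include <stdint.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cupti.h>

// Print the failing CUPTI call and return -1 from the enclosing function.
#define CUPTI_OK(call)                                              \
    do {                                                            \
        CUptiResult r_ = (call);                                    \
        if (r_ != CUPTI_SUCCESS) {                                  \
            const char *s_ = "";                                    \
            cuptiGetResultString(r_, &s_);                          \
            std::fprintf(stderr, "%s failed: %s\n", #call, s_);     \
            return -1;                                              \
        }                                                           \
    } while (0)

// Example workload standing in for "a fragment of the data set".
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: runs saxpy once on n elements and stores the event
// total for that launch in *total. Returns 0 on success, -1 on error.
int count_event_for_saxpy(const char *eventName, int n, uint64_t *total) {
    float *x = 0, *y = 0;
    cudaMalloc(&x, n * sizeof(float));  // also creates the primary context
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    // CUPTI takes driver-API handles; get them from the runtime's context.
    CUcontext ctx;
    CUdevice dev;
    cuCtxGetCurrent(&ctx);
    cuCtxGetDevice(&dev);

    CUpti_EventID id;
    CUpti_EventGroup group;
    CUPTI_OK(cuptiSetEventCollectionMode(ctx, CUPTI_EVENT_COLLECTION_MODE_KERNEL));
    CUPTI_OK(cuptiEventGetIdFromName(dev, eventName, &id));
    CUPTI_OK(cuptiEventGroupCreate(ctx, &group, 0));
    CUPTI_OK(cuptiEventGroupAddEvent(group, id));
    uint32_t all = 1;  // count on every instance of the event's domain
    CUPTI_OK(cuptiEventGroupSetAttribute(group,
             CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES, sizeof(all), &all));
    CUPTI_OK(cuptiEventGroupEnable(group));

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();  // counts are complete once the kernel finishes

    // Instances that were counted vs. instances that exist on the device.
    CUpti_EventDomainID domain;
    size_t sz = sizeof(domain);
    CUPTI_OK(cuptiEventGroupGetAttribute(group,
             CUPTI_EVENT_GROUP_ATTR_EVENT_DOMAIN_ID, &sz, &domain));
    uint32_t counted = 0, existing = 0;
    sz = sizeof(counted);
    CUPTI_OK(cuptiDeviceGetEventDomainAttribute(dev, domain,
             CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT, &sz, &counted));
    sz = sizeof(existing);
    CUPTI_OK(cuptiDeviceGetEventDomainAttribute(dev, domain,
             CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT, &sz, &existing));
    if (counted == 0) return -1;

    std::vector<uint64_t> vals(counted);
    size_t bytes = counted * sizeof(uint64_t);
    CUPTI_OK(cuptiEventGroupReadEvent(group, CUPTI_EVENT_READ_FLAG_NONE, id,
                                      &bytes, &vals[0]));
    uint64_t sum = 0;
    for (size_t i = 0; i < vals.size(); ++i) sum += vals[i];
    *total = sum * existing / counted;  // scale up to all domain instances

    cuptiEventGroupDisable(group);
    cuptiEventGroupDestroy(group);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

To connect this with the modeling plan in the question, you would run your own kernel in place of `saxpy` on each data-set fraction and feed the totals into your model. Error paths return early without cleanup to keep the sketch short. Derived metrics such as IPC combine several events; the callback_metric sample shows CUPTI's metric API for those.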