I need to model the performance of a Tesla K40. I want to run a fragment of my data set on the GPU, for example 20%, and then, by measuring some performance counters, characterize the rest of the input data set using a performance model I built earlier from these performance counters and various data sets.
Which API can query performance counters during CUDA program execution? For example, with NVML I can query the GPU's power consumption anywhere in my code. I need something like NVML for measuring performance counters anywhere in my code.
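
To make the intended workflow concrete, here is a minimal sketch of the extrapolation step only. Everything in it is an assumption for illustration: the function names `fit_line` and `predict_full_runtime`, the single-counter linear model, and all the numbers are hypothetical placeholders, and real counter values would have to come from a profiling API rather than literals. It also assumes the counter total scales linearly with the fraction of the input that was run, which may not hold for a real workload.

```python
# Sketch only: all names and numbers below are hypothetical placeholders.
# In practice the counter values would come from a profiling API, not literals.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_full_runtime(model, sample_counter, sample_fraction):
    """Scale the counter measured on a sample up to the full input, then apply the model."""
    a, b = model
    # Assumption: the counter total grows linearly with the amount of input processed.
    estimated_full_counter = sample_counter / sample_fraction
    return a * estimated_full_counter + b


# Earlier profiling runs (made-up data): counter total vs. measured runtime in ms.
training_counters = [1.0e6, 2.0e6, 4.0e6, 8.0e6]
training_runtimes_ms = [12.0, 22.0, 42.0, 82.0]

model = fit_line(training_counters, training_runtimes_ms)

# New data set: only 20% was run on the GPU, giving this (made-up) counter value.
predicted_ms = predict_full_runtime(model, sample_counter=1.2e6, sample_fraction=0.2)
print(f"predicted runtime for the full input: {predicted_ms:.1f} ms")
```

The missing piece is whatever supplies `sample_counter` at run time, which is what I am asking about.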

What you’re suggesting is fairly difficult, but the profiler API is CUPTI:


There are CUPTI sample codes installed with the CUDA toolkit. On a standard Linux install they are in: