Monitoring performance counters throughout the execution

With the help of the “cudaprof” tool we can get various performance counters for the whole execution (and even those values are extrapolated). I need a way to check the various counters at intermediate points during program execution. Please reply soon.