Online GPU Power Optimization with Performance Feedback

I’d like to start a project that reduces GPU power consumption by adjusting the GPU frequency in an application-transparent way: the application code stays unchanged, and a host daemon process with admin privileges monitors the application's performance from outside.
Another requirement is that the optimization happens online, i.e., while the job is running.
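
To make the idea concrete, here is a rough sketch of the kind of feedback loop I have in mind. It is only an illustration: `set_freq`, `read_perf`, and `read_power` are hypothetical hooks (not NVML calls), and the step-down policy and the 5% slowdown budget are assumptions made for the example.

```python
from typing import Callable, Sequence


def pick_frequency(
    freqs_mhz: Sequence[int],
    set_freq: Callable[[int], None],   # hypothetical: applies a GPU clock
    read_perf: Callable[[], float],    # hypothetical: throughput, higher is better
    read_power: Callable[[], float],   # hypothetical: power draw in watts
    max_slowdown: float = 0.05,        # assumed budget: tolerate 5% slowdown
) -> int:
    """Step down from the highest clock while performance stays within budget."""
    ordered = sorted(freqs_mhz, reverse=True)

    # Measure the baseline at the highest available frequency.
    set_freq(ordered[0])
    baseline = read_perf()
    best_freq, best_power = ordered[0], read_power()

    for freq in ordered[1:]:
        set_freq(freq)
        if read_perf() < baseline * (1.0 - max_slowdown):
            break  # too slow at this clock; stop lowering
        power = read_power()
        if power < best_power:
            best_freq, best_power = freq, power

    # Leave the GPU at the best setting found.
    set_freq(best_freq)
    return best_freq
```

The missing piece is `read_perf`: the daemon needs some performance signal it can sample from outside the application while the job is running.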

Is there a way, for example an NVML API, to obtain performance figures (such as kernel execution time plus memory transfer time) right away, from outside the running application and in the middle of job execution?