Online Power Optimization with Feedback of the Performance

tetsuro0086 · July 15, 2024, 2:39am

I’d like to start a project that optimizes GPU power consumption by changing the GPU frequency in an application-transparent way. That is, the application code stays unchanged, and a host daemon process with admin privileges observes the application's performance from outside.
Another requirement is that the optimization happens while the job is running, i.e., online optimization.

Is there a way, or some API, to immediately get performance (such as kernel execution time + memory transfer time) from outside the running application and in the middle of job execution?

njuffa · July 15, 2024, 3:23am

The publicly documented API for monitoring and managing the state of NVIDIA GPUs is the NVIDIA management library (NVML):

https://docs.nvidia.com/deploy/nvml-api/index.html

This library also serves as the basis of the nvidia-smi utility. NVIDIA also offers the CUDA Profiling Tools Interface (CUPTI):

I have not used either API so cannot assess whether they offer sufficient means to realize your envisioned adaptive real-time control mechanism.
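For orientation, here is a minimal sketch of what out-of-process polling through NVML could look like. It assumes the `nvidia-ml-py` Python bindings (module `pynvml`) are installed. The `Sample` class and the `read_sample`, `energy_joules`, and `poll` helpers are hypothetical names made up for this illustration. Note that this only yields device-level signals (power draw, clock, utilization), not per-kernel timings.

```python
import time
from dataclasses import dataclass


@dataclass
class Sample:
    t_s: float      # host timestamp in seconds (monotonic clock)
    power_w: float  # board power draw in watts
    sm_mhz: int     # current SM clock in MHz
    gpu_util: int   # percent of the sample period in which a kernel was running


def read_sample(handle) -> Sample:
    """Poll one GPU through NVML (needs an NVIDIA driver and nvidia-ml-py)."""
    import pynvml  # imported lazily so energy_joules() works without a GPU

    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    return Sample(
        t_s=time.monotonic(),
        power_w=pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,  # NVML reports mW
        sm_mhz=pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM),
        gpu_util=util.gpu,
    )


def energy_joules(samples) -> float:
    """Trapezoidal integral of power over time for a list of samples."""
    total = 0.0
    for a, b in zip(samples, samples[1:]):
        total += 0.5 * (a.power_w + b.power_w) * (b.t_s - a.t_s)
    return total


def poll(device_index=0, n=10, period_s=0.1):
    """Collect n samples from one GPU and return them as a list."""
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        trace = []
        for _ in range(n):
            trace.append(read_sample(handle))
            time.sleep(period_s)
        return trace
    finally:
        pynvml.nvmlShutdown()
```

On a machine with an NVIDIA driver, `energy_joules(poll())` would give an estimate of the energy drawn over roughly one second. The polling period here is an arbitrary choice, and how often NVML refreshes its readings depends on the GPU and driver.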

The relevant sub-forum for NVML appears to be this one:

The relevant sub-forum for CUPTI appears to be this one:

Curefab · July 15, 2024, 8:10pm

Do you want to get the performance for that existing application (kernel execution time + memory transfer time) as written? That seems like a very indirect way to measure performance. Knowing the current frequency and the GPU model in use would perhaps be more accurate. Or are there specific reasons for wanting those timings? For example, there is one important kernel and each call of it represents a meaningful unit of work (e.g. 1,000,000 work packages), or you want to display kernel execution time, memory transfer time, and kernel name to the user.

tetsuro0086 · July 16, 2024, 1:01am

Thanks! Please let me know if you know of an API in those libraries that reports each GPU kernel's execution time online.

tetsuro0086 · July 16, 2024, 1:04am

Curefab wrote: "Do you want to get the performance for that existing application (kernel execution time + memory transfer time) as written?"

Nope. As a cloud administrator, I’d like to reduce power consumption without any knowledge of the application code. That’s why I need a very indirect way to measure performance.
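A minimal sketch of such an application-blind feedback loop might look like the following. It assumes the `nvidia-ml-py` bindings (module `pynvml`), and it uses NVML's GPU utilization as a stand-in performance signal. That choice is an assumption made for illustration, since as far as I know NVML does not report per-kernel execution times. The function names and the thresholds (30% and 80%) are hypothetical placeholders, not a tuned policy.

```python
import bisect


def next_sm_clock(current_mhz, gpu_util, supported_mhz, low_util=30, high_util=80):
    """Pick the next clock from the supported list.

    Placeholder policy (an assumption): step one level down when the GPU is
    mostly idle, one level up when it is busy, otherwise hold.
    """
    clocks = sorted(supported_mhz)
    # Index of the highest supported clock that does not exceed current_mhz.
    i = max(0, bisect.bisect_right(clocks, current_mhz) - 1)
    if gpu_util < low_util:
        i = max(0, i - 1)
    elif gpu_util > high_util:
        i = min(len(clocks) - 1, i + 1)
    return clocks[i]


def control_loop(device_index=0, period_s=1.0):
    """Daemon-style loop; locking clocks is expected to need admin privileges."""
    import time
    import pynvml  # imported lazily so next_sm_clock() works without a GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    try:
        # Supported graphics clocks are queried per memory clock.
        mem_mhz = max(pynvml.nvmlDeviceGetSupportedMemoryClocks(handle))
        supported = pynvml.nvmlDeviceGetSupportedGraphicsClocks(handle, mem_mhz)
        target = max(supported)
        while True:
            util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
            target = next_sm_clock(target, util, supported)
            # Pin the GPU clock by locking min and max to the same value.
            pynvml.nvmlDeviceSetGpuLockedClocks(handle, target, target)
            time.sleep(period_s)
    finally:
        # Restore default clock management on exit (including Ctrl+C).
        pynvml.nvmlDeviceResetGpuLockedClocks(handle)
        pynvml.nvmlShutdown()
```

The decision logic lives in a pure function so it can be swapped out or tested without hardware. Utilization only says whether some kernel was running, not how fast the job progresses, so a real controller would likely need a better performance proxy than this.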