How to retrieve values set with --lock-gpu-clocks?

I am one of the developers behind llama.cpp (language model inference). The GPUs that I have locally and have used to tune performance are consumer GPUs such as RTX 4090s. However, I noticed that the logic for choosing which kernel to run needs to depend on the GPU frequency: at a high graphics clock, a kernel using plain FP16 arithmetic performs better than a kernel using FP16 tensor cores, while at a low graphics clock the tensor core kernel performs better. So for e.g. an L40 a different kernel should be run than for an RTX 4090. I can query the maximum graphics clock via nvmlDeviceGetMaxClockInfo, but this only returns the maximum clock of the GPU itself. If one manually sets a lower clock via nvidia-smi --lock-gpu-clocks 0,420 --mode 1, this is not reflected in the returned value. How can I retrieve the maximum graphics clock that a GPU can boost to from inside a CUDA program? I have tried some of the other functions in nvml.h, such as nvmlDeviceGetMaxCustomerBoostClock, but they were not supported.
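For what it's worth, the clock-dependent kernel choice described above could be sketched like this. The 1500 MHz crossover and the function names are made-up placeholders; the real threshold would have to be measured per GPU architecture:

```c
#include <stdbool.h>

/* Hypothetical crossover point in MHz. The actual value where plain FP16
 * overtakes the tensor core kernel would need to be benchmarked per GPU. */
#define CLOCK_CROSSOVER_MHZ 1500

/* Returns true if the plain-FP16 kernel should be preferred over the
 * FP16 tensor-core kernel, given the graphics clock the GPU can reach.
 * High clock -> plain FP16 wins; low clock -> tensor cores win. */
bool prefer_plain_fp16_kernel(unsigned int max_graphics_clock_mhz) {
    return max_graphics_clock_mhz > CLOCK_CROSSOVER_MHZ;
}
```

The problem in the question is precisely that the input to this decision, the effective maximum graphics clock, cannot be queried reliably once clocks have been locked.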

That’s the fun part, you don’t.

Well, I managed to get hold of an NVIDIA engineer. Unfortunately they were not able to help and, according to them, “the function is likely deprecated”. Very disappointing!

Nothing related to retrieving locked clock values has been deprecated. The NVIDIA driver does know about them, but NVIDIA isn’t exposing them via NVML.
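Since NVML apparently does not reflect clocks locked via nvidia-smi, one fallback (a sketch under assumptions, not a vetted solution) is to shell out to nvidia-smi itself and parse its CSV output. The clocks.gr query field exists in nvidia-smi, but whether it reflects a lock set with --lock-gpu-clocks would still need verifying on real hardware:

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse one line of `nvidia-smi --format=csv,noheader,nounits` output,
 * e.g. "2520\n", into a clock value in MHz. Returns 0 on failure
 * (including "N/A", which nvidia-smi prints for unsupported fields). */
unsigned int parse_clock_mhz(const char *csv_line) {
    char *end = NULL;
    unsigned long v = strtoul(csv_line, &end, 10);
    if (end == csv_line) return 0; /* no digits parsed */
    return (unsigned int)v;
}

/* Sketch: run nvidia-smi via popen and read the current graphics clock.
 * Assumes nvidia-smi is on PATH; returns 0 if anything goes wrong. */
unsigned int query_graphics_clock_mhz(void) {
    FILE *p = popen(
        "nvidia-smi --query-gpu=clocks.gr --format=csv,noheader,nounits", "r");
    if (!p) return 0;
    char buf[64] = {0};
    unsigned int mhz = 0;
    if (fgets(buf, sizeof buf, p)) mhz = parse_clock_mhz(buf);
    pclose(p);
    return mhz;
}
```

Spawning a subprocess from inside an inference library is clearly a workaround, not a fix; the clean solution would be for NVML to expose the locked clock directly.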

nvmlDeviceGetMaxCustomerBoostClock is likely deprecated.