How to find current device clock frequency from C++ CUDA


I’m currently benchmarking short programs on CUDA and have run into a problem that I think is caused by the device (a GTX 1060 3GB) adjusting its clock frequency dynamically according to workload. If I set the number of test iterations sufficiently high, the throughput is much higher. In fact, for short runs I’m seeing better results from a 1050 Ti.

I’m pretty sure this is due to dynamic frequency control. How can I read the current clock frequency of the GPU from within my CUDA C++ program?

Take a look at the NVML library, which is the basis of the nvidia-smi utility.

nvmlDeviceGetClock seems to be the API call of interest.