I’m currently benchmarking short CUDA programs and have run into a problem I think is caused by the dynamic frequency scaling on the device (a GTX 1060 3GB), which adjusts clocks according to workload. If I set the number of iterations of the test sufficiently high, the measured throughput is much higher; for short runs I’m actually seeing better results from a 1050 Ti.
I’m fairly sure this is dynamic frequency scaling at work. How can I read the GPU’s current clock frequency from within my CUDA C++ program?
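I’ve seen that NVML exposes a `nvmlDeviceGetClockInfo()` call that reports the current (not peak) clocks, so I was thinking of something like the sketch below — untested on my side, and I’ve assumed device index 0 is the GPU under test. Would this be the right approach?

```cpp
// Sketch: query the current SM clock via NVML.
// Compile with: g++ query_clock.cpp -lnvidia-ml
#include <nvml.h>
#include <cstdio>

int main() {
    nvmlReturn_t r = nvmlInit();
    if (r != NVML_SUCCESS) {
        std::printf("nvmlInit failed: %s\n", nvmlErrorString(r));
        return 1;
    }

    nvmlDevice_t dev;
    r = nvmlDeviceGetHandleByIndex(0, &dev);  // assuming device 0 is the GPU being benchmarked
    if (r == NVML_SUCCESS) {
        unsigned int mhz = 0;
        // NVML_CLOCK_SM reports the current SM clock in MHz,
        // which is what dynamic frequency scaling changes.
        if (nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &mhz) == NVML_SUCCESS)
            std::printf("Current SM clock: %u MHz\n", mhz);
    } else {
        std::printf("nvmlDeviceGetHandleByIndex failed: %s\n", nvmlErrorString(r));
    }

    nvmlShutdown();
    return 0;
}
```

My plan would be to call this periodically from a second thread during the benchmark loop and log the readings alongside the throughput numbers.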
Thanks as always,