I’m not sure if this has been covered already, but if you are wondering why your CUDA kernels don’t seem to push your Maxwell v2 GPU to its maximum rated memory clock, read this thread.
Kudos to the people on that thread for recognizing that compute applications weren’t achieving the same memory clocks as graphics applications.
In my case, I have an EVGA GTX 980 SC ACX 2.0 that immediately boosts to a GPU/MEM clock of 1392/1502 MHz.
However, the card is rated for a max MEM clock of 1752 MHz, yet I had never seen a CUDA kernel boost beyond 1502 MHz.
After reading the above thread, I queried the supported clocks:
nvidia-smi -i <device id> -q -d SUPPORTED_CLOCKS | more
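… and set the application clocks to the max supported for this card. Note that -ac takes the memory clock first and the graphics clock second, and that nvidia-smi lists the top memory clock as 3505 MHz, which is twice the 1752 MHz figure quoted above: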
nvidia-smi -i <device id> -ac 3505,1531
The results are impressive!
The CUDA Samples “Bandwidth Test” now reports almost 200 GB/s instead of the previous ~160 GB/s.
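For context, here is a rough sketch of the theoretical peak bandwidth at each memory clock. It assumes the 1502 and 1752 MHz figures are GDDR5 command clocks (four data transfers per clock per pin) and that the card has the GTX 980's 256-bit memory bus. The function name and its parameters are made up for illustration.

```python
# Theoretical peak memory bandwidth for a GDDR5 card.
# Assumptions (not stated in the post): the quoted clocks are command
# clocks with 4 data transfers per clock, and the bus is 256 bits wide.

def peak_bandwidth_gbs(mem_clock_mhz, bus_width_bits=256, transfers_per_clock=4):
    """Return theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9

default_peak = peak_bandwidth_gbs(1502)  # clock seen under CUDA by default
boosted_peak = peak_bandwidth_gbs(1752)  # clock after raising application clocks

print(f"1502 MHz: {default_peak:.1f} GB/s peak")
print(f"1752 MHz: {boosted_peak:.1f} GB/s peak")
```

Under those assumptions the ceiling moves from about 192 GB/s to about 224 GB/s, so the measured jump from ~160 to almost 200 GB/s is roughly in line with the clock increase.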
My HotSort benchmark leapt as well! The purple line shows the impact of the improved mem clock boost.
Why do compute kernels default to a lower power state?
This is on Win7/x64 with driver 358.87.