I am a bit puzzled by this statement: “Highly dense computation codes (such as matrix multiply - eg. SGEMM/DGEMM) are not likely to benefit much from boost mode.” Could you explain the reasoning behind it? After all, GEMM is compute-bound, and its performance therefore scales essentially linearly with the core clock.
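For what it is worth, the linear-scaling claim is straightforward to check empirically: time a large SGEMM at each available application-clock setting and compare the achieved GFLOP/s. A minimal timing harness with cuBLAS might look like the sketch below; the matrix size and repetition count are arbitrary choices of mine, and error checking is omitted for brevity.

```
// Minimal SGEMM throughput probe: time a batch of large square SGEMMs
// and report GFLOP/s. Run it once per application-clock setting; for a
// compute-bound GEMM the achieved rate should track the core clock
// nearly linearly.
// Build (example invocation): nvcc -O2 gemm_probe.cu -lcublas
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int N = 8192;   // assumed size; large enough to saturate the GPU
    const int reps = 20;
    const size_t bytes = (size_t)N * N * sizeof(float);

    float *A, *B, *C;
    cudaMalloc(&A, bytes); cudaMalloc(&B, bytes); cudaMalloc(&C, bytes);
    cudaMemset(A, 0, bytes); cudaMemset(B, 0, bytes); // values don't affect timing

    cublasHandle_t h;
    cublasCreate(&h);
    const float alpha = 1.0f, beta = 0.0f;

    // Warm-up call so the clocks ramp up and cuBLAS selects its kernel.
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                &alpha, A, N, B, N, &beta, C, N);
    cudaDeviceSynchronize();

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventRecord(t0);
    for (int i = 0; i < reps; ++i)
        cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                    &alpha, A, N, B, N, &beta, C, N);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    printf("SGEMM %d x %d: %.1f GFLOP/s\n", N, N,
           2.0 * N * N * N * reps / (ms * 1e6)); // 2*N^3 flops per GEMM

    cublasDestroy(h);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```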
If the thought is that GEMM running at the fastest available boost clock would cause the GPU to exceed its thermal or power envelope and trigger clock throttling, my experience with the K40 is that this does not necessarily occur. In fact, I had a difficult time even approaching those limits, no matter what kind of prolonged GEMM computation I tried. Your mileage may vary, and I cannot speak for the K80, as I have never used one. Obviously, there are also computational kernels that cause the GPU to draw more power than GEMM does.
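For anyone who wants to reproduce this on their own hardware: power draw, temperature, and the actual SM clock can be sampled through NVML from a second process while the GEMM loop runs. If the reported SM clock stays pinned at the selected boost clock, no throttling is taking place. A rough sketch follows; device index 0 is my assumption, and error checking is again omitted.

```
// Sample power draw, temperature, and actual SM clock once per second
// via NVML while a GEMM workload runs in another process.
// Build (example invocation): g++ nvml_watch.cpp -lnvidia-ml
#include <cstdio>
#include <unistd.h>
#include <nvml.h>

int main() {
    nvmlDevice_t dev;
    nvmlInit();
    nvmlDeviceGetHandleByIndex(0, &dev);    // GPU 0 assumed

    for (int i = 0; i < 60; ++i) {          // watch for one minute
        unsigned int mw, tempC, smMHz;
        nvmlDeviceGetPowerUsage(dev, &mw);  // reported in milliwatts
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &tempC);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &smMHz);
        printf("power %6.1f W   temp %3u C   SM clock %4u MHz\n",
               mw / 1000.0, tempC, smMHz);
        sleep(1);
    }
    nvmlShutdown();
    return 0;
}
```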
There are many different flavors of GEMM under the hood, based not only on the data type but also on the sizes and aspect ratios of the matrices, and these flavors differ somewhat in power consumption. Further differences in power consumption arise from natural variation in the power characteristics of each individual card, and from differences in cooling and thus in operating temperature. Lastly, different applications exhibit different “duty cycles” when they call GEMM interspersed with other kernels.
I always encourage CUDA users to try running at the highest available boost clock on a Tesla; there is a high probability that the card will be able to sustain it, assuming proper cooling.
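On Tesla cards the boost clock is selected through application clocks, either with nvidia-smi (e.g. `nvidia-smi -ac 3004,875` selects the top boost clock on a K40) or programmatically via NVML, as sketched below. Setting application clocks typically requires administrator privileges; the assumption that the supported-clock lists come back highest-first holds on the cards I have used, but verify it on yours.

```
// Query the supported application-clock pairs and select the top one
// via NVML; equivalent to "nvidia-smi -ac <memMHz>,<smMHz>".
// Error checking omitted; the array sizes are assumptions.
#include <cstdio>
#include <nvml.h>

int main() {
    nvmlDevice_t dev;
    nvmlInit();
    nvmlDeviceGetHandleByIndex(0, &dev);    // GPU 0 assumed

    unsigned int nMem = 16, memMHz[16];
    nvmlDeviceGetSupportedMemoryClocks(dev, &nMem, memMHz);

    // List the SM clocks supported at the first reported memory clock.
    // On the cards I have used the lists are ordered highest-first.
    unsigned int nSm = 32, smMHz[32];
    nvmlDeviceGetSupportedGraphicsClocks(dev, memMHz[0], &nSm, smMHz);
    printf("selecting %u MHz mem / %u MHz SM\n", memMHz[0], smMHz[0]);

    nvmlDeviceSetApplicationsClocks(dev, memMHz[0], smMHz[0]);

    nvmlShutdown();
    return 0;
}
```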