Increase Performance with GPU Boost and K80 Autoboost

Originally published at: https://developer.nvidia.com/blog/increase-performance-gpu-boost-k80-autoboost/

NVIDIA® GPU Boost™ is a feature available on NVIDIA® GeForce® and Tesla® GPUs that boosts application performance by increasing GPU core and memory clock rates when sufficient power and thermal headroom are available (see the earlier Parallel Forall post about GPU Boost by Mark Harris). In the case of Tesla GPUs, GPU Boost is customized for compute-intensive workloads running on…

Jiri, another great post. How does this work with bandwidth-bound applications? On the K40 you had to use 875 MHz to get full bandwidth. Given that bandwidth-bound applications usually don't come close to TDP, they can probably run at the highest clock, but if one of the 24 lower clocks already maxes out the achievable bandwidth, one could get away with a lower clock. What about a plot similar to Figure 2 for the STREAM triad memory bandwidth?

Hi Mathias, thanks for the feedback and the good question. The Tesla K80 doesn't need to run at the maximum application clocks to achieve its full memory bandwidth. In my experiments, a GPU clock of 705 MHz is sufficient for memory-bound applications.
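For anyone who wants to reproduce this, below is a minimal sketch of a STREAM-triad-style bandwidth test in CUDA. Run it under different application clock settings (set with nvidia-smi -ac) and watch where the reported bandwidth saturates; the array size, block size, and repetition count are illustrative choices, not the configuration used for the measurements above.

```cuda
#include <cstdio>

// STREAM triad: a[i] = b[i] + scalar * c[i]; bandwidth-bound by design.
__global__ void triad(double *a, const double *b, const double *c,
                      double scalar, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) a[i] = b[i] + scalar * c[i];
}

int main()
{
    const size_t n = 1 << 26;               // 64M doubles per array; ~1.5 GB total
    const size_t bytes = n * sizeof(double);

    double *a, *b, *c;
    cudaMalloc(&a, bytes); cudaMalloc(&b, bytes); cudaMalloc(&c, bytes);
    cudaMemset(b, 0, bytes); cudaMemset(c, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);

    const int reps = 50;                     // average over several launches
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        triad<<<(unsigned)((n + 255) / 256), 256>>>(a, b, c, 3.0, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Triad touches three arrays per launch: two reads plus one write.
    double gbps = 3.0 * bytes * reps / (ms / 1e3) / 1e9;
    printf("STREAM triad: %.1f GB/s\n", gbps);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```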

Two more questions out of curiosity:
1. Is AutoBoost smart enough to only increase the clock to 705 MHz (or whatever is sufficient to get the best performance)? If performance no longer profits from higher frequencies, stopping there would be nice in terms of energy savings, stability, and durability. But there is probably no way the CUDA driver can detect this. Maybe the programmer could specify a maximum boost frequency?

2. When I run a K40 at 875 MHz, it will throttle if power or temperature exceeds the limits. How is this different from Autoboost (apart from the finer levels)? I am sure it is - I am just curious.

1. As you say, autoboost can't know that a kernel is memory-bound, so it will increase the clock beyond 705 MHz if the power and thermal conditions allow it. To avoid this you need to disable autoboost and set application clocks to the right value; see the sketch after this list.
2. The difference between autoboost and setting application clocks to the highest possible value is that autoboost starts at the lowest clock and increases it, while with application clocks the clocks are reduced from the specified value until no clock-throttle reasons occur. In some cases application clocks might be faster than autoboost, e.g. for very short kernels it might take too long to spin up the clocks, while in other cases autoboost is faster, e.g. if the application clock setting is too aggressive. Energy-wise, autoboost is always better because it reduces the clocks when the GPU is less loaded.
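To make the first answer concrete, here is a minimal NVML sketch (a .cu file compiled with nvcc and linked against -lnvidia-ml) that disables autoboost and pins the application clocks; changing clocks typically requires root privileges. The 2505 MHz memory / 705 MHz graphics pair matches the K80 numbers discussed above; check the supported values for your own board first, e.g. with nvidia-smi -q -d SUPPORTED_CLOCKS.

```cuda
#include <stdio.h>
#include <nvml.h>   // link with -lnvidia-ml

// Abort with a readable message if any NVML call fails.
#define NVML_CHECK(call)                                              \
    do {                                                              \
        nvmlReturn_t r_ = (call);                                     \
        if (r_ != NVML_SUCCESS) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    nvmlErrorString(r_));                             \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main(void)
{
    NVML_CHECK(nvmlInit());

    nvmlDevice_t dev;
    NVML_CHECK(nvmlDeviceGetHandleByIndex(0, &dev));  // GPU 0; adjust as needed

    // Disable autoboost so the clocks stay where we pin them ...
    NVML_CHECK(nvmlDeviceSetAutoBoostedClocksEnabled(dev, NVML_FEATURE_DISABLED));

    // ... then set the application clocks: memory clock first, then graphics.
    // 2505/705 MHz are the Tesla K80 values from this thread; other boards
    // support different pairs.
    NVML_CHECK(nvmlDeviceSetApplicationsClocks(dev, 2505, 705));

    NVML_CHECK(nvmlShutdown());
    return 0;
}
```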

I just bought a 980 Ti, and both the driver and nvidia-smi are showing the "Max clocks" as 1595 MHz. The "graphics clock" gets up to 1430 MHz (probably because boost is turned off, luckily!). Is this a display issue, or is my card clocked wrongly? Something isn't right!

Hmm, is this a second-hand board? Those clocks seem high. I saw your post on devtalk.nvidia.com (which is a more appropriate forum for this question). As to why you aren't hitting the max, posting the full nvidia-smi -q output to the forums might help us understand why you can't reach it.

Cheers, I found this site before I found the correct forum; I was googling "linux nvidia boost" and this came up. Happy to discuss in the forum if it's more appropriate. It's a brand-new card (Inno3D Hybrid Black); in Windows it shows the correct clocks. I'll post the full output in the forum.

The auto-boost option cannot be enabled on the GeForce GTX TITAN Black.
I see the message "Changing auto boosted clocks permissions is not supported for GPU". Maybe changing this option is only allowed on NVIDIA Tesla cards?

Your GeForce GTX Titan Black will automatically increase its clocks if the power and thermal budgets allow; you do not need to enable that explicitly. However, auto boost as described in this blog post is different from the GPU Boost 2.0 features of your GeForce GTX Titan Black.
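For completeness, here is a small NVML sketch that probes whether a board exposes the Auto Boost switch at all. The expectation that it returns NVML_ERROR_NOT_SUPPORTED on a GeForce board like the Titan Black is an assumption based on the error message quoted above.

```cuda
#include <stdio.h>
#include <nvml.h>   // link with -lnvidia-ml

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

    nvmlEnableState_t isEnabled, defaultIsEnabled;
    nvmlReturn_t r = nvmlDeviceGetAutoBoostedClocksEnabled(dev, &isEnabled,
                                                           &defaultIsEnabled);
    if (r == NVML_ERROR_NOT_SUPPORTED) {
        // The board may still boost on its own, but the on/off switch
        // is not exposed through NVML.
        printf("Auto Boost control is not exposed on this GPU\n");
    } else if (r == NVML_SUCCESS) {
        printf("Auto Boost: %s (default: %s)\n",
               isEnabled == NVML_FEATURE_ENABLED ? "on" : "off",
               defaultIsEnabled == NVML_FEATURE_ENABLED ? "on" : "off");
    } else {
        fprintf(stderr, "query failed: %s\n", nvmlErrorString(r));
    }

    nvmlShutdown();
    return 0;
}
```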

Dear Jiri,

I see my card stays in P0 mode (max performance) when it is idle after we have loaded it with GROMACS jobs; then it switches to the P2 performance level. Do you know the reason?

I use the latest driver version, 352.41.

Best,
Dogan

Hi Dogan,

How long does your card stay at P0 mode when it is idle before switching to P2?

Jiri

P.S. As Mark says below, https://devtalk.nvidia.com/ is the more appropriate forum for this kind of question.

Immediately. Within one second after the process starts, even if GPU utilization is still at 0%.

Sorry, I am not sure I fully understand what you are seeing. Are you saying that the GPU goes into P0 mode when GROMACS is running but not executing a GPU kernel? That would be expected behavior, because GROMACS creates a compute context. Would you mind creating a topic at devtalk.nvidia.com to continue this discussion?
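One way to observe this is to poll the performance state with NVML while starting and stopping a GPU process in another terminal; a minimal sketch (the one-second interval and 30-sample duration are arbitrary, and sleep() makes it Linux-specific):

```cuda
#include <stdio.h>
#include <unistd.h>   // sleep(); Linux-specific
#include <nvml.h>     // link with -lnvidia-ml

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

    // Start a CUDA process (or just create a context) in another
    // terminal and watch the performance state jump to P0.
    for (int i = 0; i < 30; ++i) {
        nvmlPstates_t pstate;
        unsigned int sm = 0;
        nvmlDeviceGetPerformanceState(dev, &pstate);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &sm);
        printf("P%d  SM clock %u MHz\n", (int)pstate, sm);
        sleep(1);
    }

    nvmlShutdown();
    return 0;
}
```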

Dear Jiri, Thank you, I posted my question here:
https://devtalk.nvidia.com/...

Is this card good with string handling and natural language processing as well?

Figure 3 implies that boost clocks can EXCEED the maximum flat-rate application clocks. Is this true? I.e., if my Tesla is well cooled and still under the wattage limit, might the GPU autoboost to a clock rate FASTER than the maximum rate selectable with nvidia-smi -ac?
Or is that application clock setting ALSO still applicable, setting the maximum clocks regardless of power and temperature? If so, Figure 3 is very misleading.

Hi Gerry, no, boost clocks cannot exceed the maximum configurable clocks. Figure 3 visualizes how the GPU clocks of a hypothetical application could behave on a Tesla K80 with maximum clocks of 875 MHz and application clocks set to ~750 MHz. When Auto Boost is disabled (left part), the GPU runs constantly at 750 MHz, not hitting power or thermal limits. With Auto Boost enabled, the Tesla K80 can raise its clocks up to the maximum of 875 MHz during the phases of the application where the thermal and power budget allows it to go above 750 MHz. However, the clocks cannot exceed the maximum of 875 MHz. Hope this clarifies the situation for you. Thanks, Jiri
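If you want a Figure-3-style trace for your own workload, something along these lines should work: sample the SM clock together with the current throttle reasons while the application runs. The sampling interval and duration are arbitrary choices here.

```cuda
#include <stdio.h>
#include <unistd.h>   // sleep(); Linux-specific
#include <nvml.h>     // link with -lnvidia-ml

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

    for (int s = 0; s < 60; ++s) {
        unsigned int sm = 0;
        unsigned long long why = 0;
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &sm);
        nvmlDeviceGetCurrentClocksThrottleReasons(dev, &why);

        // Flag the throttle reasons relevant to this discussion.
        printf("%3ds  SM %u MHz%s%s%s\n", s, sm,
               (why & nvmlClocksThrottleReasonSwPowerCap) ? "  [power cap]" : "",
               (why & nvmlClocksThrottleReasonHwSlowdown) ? "  [hw slowdown]" : "",
               (why & nvmlClocksThrottleReasonApplicationsClocksSetting)
                   ? "  [app clocks]" : "");
        sleep(1);
    }

    nvmlShutdown();
    return 0;
}
```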

Ah, I see. So setting maximum application clocks will give performance limited by maximum temperature and maximum wattage.
Using boost clocks will give performance limited by some default clock limit, by maximum temperature, and by a (lower) boost wattage setting.
But then it seems that there are NO situations where dynamic boost will ever outperform simply setting maximum clocks. Is that correct? If so, it seems like boost clocks are a tool useful when deliberately running at reduced wattage, but not as a performance optimization.

When application clocks are set to the maximum, the achievable clocks are still limited by the power and thermal budget, i.e. all application clock settings are safe to use. It is still not recommended to set application clocks too aggressively, because thermal or power violations can cause significant GPU clock drops. Autoboost raises the GPU clocks incrementally, avoiding these significant drops.

There is also a change in how application clocks and Autoboost work together with Pascal. On pre-Pascal GPUs, setting application clocks ensures GPU clocks between the application clock value and the max clock value, unless throttled for power or thermal reasons. On Pascal and later, setting application clocks disables Autoboost and locks the GPU to the application clock setting, unless throttled for power or thermal reasons.
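As a practical aid for picking a setting that is not too aggressive, the sketch below lists every supported (memory, graphics) application-clock pair; the fixed-size buffers are an assumption sized generously for typical boards.

```cuda
#include <stdio.h>
#include <nvml.h>   // link with -lnvidia-ml

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

    unsigned int memClocks[16];
    unsigned int memCount = 16;       // assumed upper bound on memory clocks
    if (nvmlDeviceGetSupportedMemoryClocks(dev, &memCount, memClocks)
            == NVML_SUCCESS) {
        for (unsigned int m = 0; m < memCount; ++m) {
            unsigned int grClocks[128];
            unsigned int grCount = 128;   // assumed upper bound
            if (nvmlDeviceGetSupportedGraphicsClocks(dev, memClocks[m],
                                                     &grCount, grClocks)
                    != NVML_SUCCESS)
                continue;
            printf("memory %u MHz:", memClocks[m]);
            for (unsigned int g = 0; g < grCount; ++g)
                printf(" %u", grClocks[g]);
            printf(" (MHz)\n");
        }
    }

    nvmlShutdown();
    return 0;
}
```

This is the programmatic equivalent of nvidia-smi -q -d SUPPORTED_CLOCKS.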