Overclocking/tuning to make CUDA execute faster

What would one change in the parameters (e.g. clock speed, shader clock, etc.) to make CUDA run as fast as possible? What could one simultaneously reduce to offset the thermal impact of these performance-enhancing changes?

I am specifically interested in improving the CUDA performance of the GTX480 and Quadro FX4800 cards.

Most CUDA programs are limited by one (or more) of the following:

  1. shader clock
  2. memory clock
  3. PCI-Express bandwidth or latency

Changing #1 and #2 is possible, but there is not much you can do about #3. Unfortunately, CUDA programs are split between #1 and #2 as the bottleneck, so there isn’t a universal overclocking solution. Even worse, I get the impression that increasing the memory clock is far more difficult, and there are quite a few CUDA programs that are memory bandwidth bound.
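
One rough way to tell which side a particular kernel falls on (a sketch with a placeholder kernel and sizes, not something from this thread): time it with CUDA events, compute the effective bandwidth, and compare against the board’s theoretical peak. A kernel already near peak bandwidth will scale with the memory clock rather than the shader clock.

```cuda
// boundcheck.cu - rough test of which clock a kernel is sensitive to:
// time it with CUDA events, compute effective bandwidth, and compare
// against the board's theoretical peak (~177 GB/s for a GTX 480).
// The kernel and sizes here are placeholders for illustration.
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder streaming kernel: one read and one write per element.
__global__ void streamKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

int main() {
    const int n = 1 << 24;                  // 16M floats, 64 MB per buffer
    const size_t bytes = n * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc((void**)&d_in, bytes);
    cudaMalloc((void**)&d_out, bytes);

    dim3 block(512), grid((n + 511) / 512); // stays under Fermi's 65535-block grid limit
    streamKernel<<<grid, block>>>(d_in, d_out, n);   // warm-up launch

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    streamKernel<<<grid, block>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Two floats of traffic (one read, one write) per element.
    double gbps = (2.0 * bytes / 1e9) / (ms / 1e3);
    printf("kernel time %.3f ms, effective bandwidth %.1f GB/s\n", ms, gbps);
    // Near the theoretical peak: the kernel scales with memory clock.
    // Well below it (and not latency bound): the shader clock matters more.

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```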

As for mitigating the heat effects, I’ve never heard of anyone deliberately increasing the shader clock and decreasing the memory clock, or vice versa. Careful power measurements while adjusting both values will reveal whether this is a viable way to compensate for the extra power draw. If that turns out to be true, then I could imagine scaling shader and memory clocks in opposite directions until a given CUDA kernel is more evenly balanced between compute and memory bound. This would be kernel-by-kernel tuning, though.
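
If you do try scaling the two clocks independently, it helps to confirm what the runtime actually reports before each timing or power measurement, since overclocks don’t always stick. A minimal sketch using the CUDA runtime’s device properties (assumption: your toolkit is recent enough to expose the memoryClockRate field):

```cuda
// clockquery.cu - print the clocks the CUDA runtime reports for each GPU,
// handy for checking that a shader or memory clock change actually took
// effect before running timing or power measurements.
// Note: cudaDeviceProp::memoryClockRate only exists in newer toolkits.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // Both fields are reported in kHz.
        printf("GPU %d (%s): shader clock %.0f MHz, memory clock %.0f MHz\n",
               dev, prop.name,
               prop.clockRate / 1000.0,
               prop.memoryClockRate / 1000.0);
    }
    return 0;
}
```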

And there’s always the usual water-cooling option.

Thanks for the info.

What about voltage adjustments?

Presumably changing voltages might be required to push the envelope. I recall a guy on eBay who was actually selling GTX 200 series cards where he had hand-modified the voltage regulators to allow for extreme overclocking. Anyone who can successfully hack the hardware on their GPU gets my respect. :)

I have no idea if this is adjustable in software. It certainly isn’t exposed by the “CoolBits”-enabled controls in the Linux X.org driver.

It’s possible to adjust GPU voltage in Windows via software (look for “RivaTuner” or other tools, like MSI Afterburner, which use its library). These tools also allow shader and memory overclocks.

One interesting conclusion I’ve reached is that shader-clock overclocking is easier with CUDA, likely because it’s only using a subset of the hardware. I’ve run my raytrace kernels on my GTX 480s (default 1400 MHz) with the shader clock at 1670 MHz for 24 hours straight with no errors (rendering and rerendering the same block and checking for even a one-pixel difference). Running the same card at the same shader clock with a graphics tool like FurMark gives instant artifacts. Memory overclocks seem to hit CUDA and graphics apps at roughly the same speeds (which is logical).
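
For what it’s worth, the bit-exact rerun test described above is easy to script: run a kernel once to produce a reference buffer, then rerun it in a loop and flag any output that differs. A minimal sketch, with an arbitrary placeholder kernel standing in for the real raytracer:

```cuda
// stabilitycheck.cu - rerun a kernel many times and flag any output that
// differs from a reference run, in the spirit of the "check for even a
// one-pixel difference" test above. The kernel here is a placeholder.
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

__global__ void testKernel(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sinf(i * 0.001f) * cosf(i * 0.002f);  // arbitrary work
}

int main() {
    const int n = 1 << 22;
    const size_t bytes = n * sizeof(float);
    float* d_out;
    cudaMalloc((void**)&d_out, bytes);
    std::vector<float> reference(n), current(n);

    dim3 block(256), grid((n + 255) / 256);

    // Reference run at the clocks under test.
    testKernel<<<grid, block>>>(d_out, n);
    cudaMemcpy(reference.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    // Stress loop: any bit-level mismatch means the overclock is not stable.
    for (int iter = 0; iter < 10000; ++iter) {
        testKernel<<<grid, block>>>(d_out, n);
        cudaMemcpy(current.data(), d_out, bytes, cudaMemcpyDeviceToHost);
        if (std::memcmp(current.data(), reference.data(), bytes) != 0) {
            printf("mismatch on iteration %d: clocks are too high\n", iter);
            return 1;
        }
    }
    printf("no mismatches detected\n");
    cudaFree(d_out);
    return 0;
}
```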

These overclocks only really work in Windows. In Linux, you have very little overclocking control… the “CoolBits” option lets you change shader and memory clocks only on display GPUs (those with the kernel watchdog timeout enabled). CUDA compute-only GPUs cannot be overclocked. There are also no voltage options in Linux.

There’s a final overclocking method which gets risky, and that is to dump the card’s BIOS firmware image, edit its built-in frequencies and voltages, and reflash the card’s firmware. This is OS-independent but obviously not for the faint of heart.
