Fix for GTX480 DP performance

Has anyone discovered how to “repair” the way Nvidia crippled the GTX480’s DP performance?

Yes… buy a C2050 ;)

No, it’s probably disabled in hardware or in the device’s firmware. In any case, I don’t think nVidia would be OK with people publicly posting a “fix” for it here on their own forum.

:rolleyes: Probably true …

OH well …

Adding a question mark to the topic’s title would be nice…
Imagine a kid on Christmas Eve when you take his new toy away from him! Imagine those eyes and the big tears in them :)

What is “DP”? Double precision?
How severe is the 480’s degradation?

Thanks

Yes. The Tesla C2050 computes in double precision at 1/2 the rate of single precision, whereas the GTX 465/470/480 series of cards compute in double precision at 1/8 the rate of single precision (like both the Tesla and GeForce cards using the previous generation GT200 chip).

Hummm… According to the latest dgemm posted on the MAGMA site, the C2050 reaches 300 GFlop/s (58% of theoretical peak). My GTX480 on that same benchmark reaches 165 GFlop/s. This is a little better than half the speed of the C2050, and quite a bit better than the numbers above suggest: in double precision the GTX480 runs at about 1/2 the rate of the C2050, rather than the 1/4 you would expect from 1/8 versus 1/2 of single precision.

In single precision the C2050 reaches 639 GFlop/s, whereas the GTX480 reaches 835 GFlop/s on the MAGMA sgemm benchmark. Double precision on the C2050 is about 1/2 the rate of single precision, as you say, but on the GTX480 it is about 1/5 (165/835) the rate of single precision. But the GTX480 is faster than the C2050 in single precision, so in the end its double precision is about half the rate of the C2050.
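
To make the arithmetic in this post explicit, here is a small Python sketch that recomputes the ratios from the four benchmark figures quoted above. The dictionary layout and the helper names (`dp_to_sp_ratio`, `cross_card_ratio`) are made up for illustration; nothing here is a new measurement.

```python
# Benchmark figures quoted in the thread (GFlop/s, MAGMA sgemm/dgemm).
measured = {
    "C2050":  {"sgemm": 639.0, "dgemm": 300.0},
    "GTX480": {"sgemm": 835.0, "dgemm": 165.0},
}

def dp_to_sp_ratio(card):
    """Measured double-precision rate as a fraction of single precision."""
    return measured[card]["dgemm"] / measured[card]["sgemm"]

def cross_card_ratio(kernel):
    """GTX480 throughput relative to the C2050 for one kernel."""
    return measured["GTX480"][kernel] / measured["C2050"][kernel]

for card in measured:
    print(f"{card}: DP/SP = {dp_to_sp_ratio(card):.2f}")
print(f"GTX480/C2050 dgemm = {cross_card_ratio('dgemm'):.2f}")
print(f"GTX480/C2050 sgemm = {cross_card_ratio('sgemm'):.2f}")
```

With these inputs the DP/SP ratio comes out at roughly 0.47 for the C2050 and 0.20 for the GTX480, and the GTX480 lands at about 0.55 of the C2050 on dgemm while being about 1.31 times faster on sgemm.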

What is happening is that the C2050 and GTX 480 are being held back by memory bandwidth and the like, not by double precision flops.

I.e. the GTX 480’s 165 GFlop/s is probably about 95% of its peak.

On other problems the C2050 would probably get a lot more than 300 GFlop/s.

Double precision implies twice the data must be moved about. So more problems will be memory bandwidth limited than with single precision.
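
A rough way to see the “twice the data” point is a bandwidth-only bound on a simple streaming kernel. The sketch below uses `y = a*x + y` (axpy) as the example kernel and a placeholder bandwidth of 100 GB/s; both are assumptions picked for illustration, not figures for either card, and the function names are hypothetical.

```python
def axpy_intensity(bytes_per_element):
    """Flops per byte for y = a*x + y: 2 flops and 3 memory accesses per element."""
    flops = 2.0
    bytes_moved = 3 * bytes_per_element  # read x, read y, write y
    return flops / bytes_moved

def bandwidth_bound_gflops(bandwidth_gb_s, bytes_per_element):
    """Upper bound on GFlop/s if the kernel is limited purely by memory traffic."""
    return bandwidth_gb_s * axpy_intensity(bytes_per_element)

# Hypothetical bandwidth, chosen only for illustration (not a measured figure).
BANDWIDTH_GB_S = 100.0

sp_bound = bandwidth_bound_gflops(BANDWIDTH_GB_S, 4)  # float: 4 bytes
dp_bound = bandwidth_bound_gflops(BANDWIDTH_GB_S, 8)  # double: 8 bytes
print(f"SP bound: {sp_bound:.1f} GFlop/s, DP bound: {dp_bound:.1f} GFlop/s")
```

For a kernel that really is bandwidth bound, the double precision ceiling is half the single precision one no matter what the chip’s DP/SP flop ratio is. Whether a particular benchmark sits in that regime depends on how many flops it does per byte moved.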
