GTX TITAN (Double Precision) FLOPS, way off specs

Hi All,

For GPGPU computing purposes we were deciding between GTX TITANs and Tesla C2075s (both in roughly the same price range).

The specs, however, favored the GTX TITAN, with ECC memory as the sole exception:
4500 GFLOPS (single), 1300-1500 GFLOPS (double), compute capability 3.5 (dynamic parallelism) for the TITAN
1030 GFLOPS (single), 515 GFLOPS (double), compute capability 2.0 (no dynamic parallelism) for the Tesla C2075
for the specs see
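As a sanity check, those spec-sheet numbers follow from cores × 2 FLOPs/cycle (one fused multiply-add) × clock. A minimal sketch, assuming the published core counts and clocks for both cards:

```python
# Theoretical peak GFLOPS = execution units * 2 FLOPs/cycle (FMA) * clock (GHz).
# Unit counts and clocks below are the published specs (assumptions, not measured).

def peak_gflops(units, clock_ghz):
    """Peak throughput in GFLOPS for `units` FMA-capable units at `clock_ghz`."""
    return units * 2 * clock_ghz

# GTX TITAN (GK110): 2688 SP cores, 896 DP units (1/3 of SP), 837 MHz base clock
titan_sp = peak_gflops(2688, 0.837)   # ~4500 GFLOPS single
titan_dp = peak_gflops(896, 0.837)    # ~1500 GFLOPS double
# Tesla C2075 (Fermi): 448 cores at 1.15 GHz shader clock, DP at 1/2 the SP rate
c2075_sp = peak_gflops(448, 1.15)     # ~1030 GFLOPS single
c2075_dp = c2075_sp / 2               # ~515 GFLOPS double

print(titan_sp, titan_dp, c2075_sp, c2075_dp)
```

The arithmetic reproduces all four quoted spec numbers to within rounding.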

But after receiving our first TITAN, we noticed that its double-precision performance is lower than the C2075's!

Single-precision GFLOPS in AIDA64: 4543 (OK)
Double-precision GFLOPS in AIDA64: 222.3 (instead of ~1300!)

The same happens with MATLAB's gpuBench(), which tests matrix multiplication (MTimes), linear-system solves (Backslash), and FFTs:

                   MTimes_D  Backslash_D  FFT_D   MTimes_S  Backslash_S   FFT_S
Tesla C2075          333.84       246.11  73.36     696.37       435.56  163.04
GeForce GTX TITAN    223.68        82.34  77.05    3635.97       179.13  252.21
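For context on what these columns mean: gpuBench converts measured times into GFLOPS using an operation count, and for MTimes the usual accounting is 2n^3 flops for an n-by-n multiply. A minimal CPU-side sketch of that measurement (NumPy here, not the GPU path gpuBench actually uses):

```python
import time
import numpy as np

def gemm_gflops(n=1024, dtype=np.float64):
    """Time an n-by-n matrix multiply and report GFLOP/s (2*n^3 flop count)."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    a @ b                              # warm-up so timing excludes one-off costs
    t0 = time.perf_counter()
    a @ b
    dt = time.perf_counter() - t0
    return 2.0 * n**3 / dt / 1e9

print(f"double-precision GEMM: {gemm_gflops():.1f} GFLOP/s")
```

Running it with `dtype=np.float32` versus `np.float64` gives the same single/double split the table's `_S`/`_D` columns show.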

I spoke with the developers of the MATLAB gpuBench benchmark, and they made the same observation, as have many people who contacted them. (So it doesn't seem to be system-dependent.)

Is there anyone who can explain these double-precision performance differences?

Are you aware that the GTX TITAN needs to be switched into double-precision mode first,
before it can unfold its potential?

In this mode it doesn’t clock up as aggressively (this limits the card’s boost clocks)
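For anyone looking for the switch: on Windows it is the "CUDA - Double precision" option under "Manage 3D settings" in the NVIDIA Control Panel. On Linux, an nvidia-settings attribute along the following lines has been reported for Kepler TITANs; the exact attribute name is an assumption here, so verify it against `nvidia-settings -q all` on your driver version:

```shell
# Enable full-rate double precision on a Kepler GTX TITAN (Linux).
# Attribute name is an assumption; confirm with `nvidia-settings -q all`.
nvidia-settings -a [gpu:0]/GPUDoublePrecisionBoostImmediate=1
```

Note that, as described above, enabling this mode limits the card's boost clocks, so single-precision results may drop slightly.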


Thank you, Christian! No, I didn't know that.

Now things look different:

                   MTimes_D  Backslash_D   FFT_D   MTimes_S  Backslash_S   FFT_S
GeForce GTX TITAN   1285.83       128.35  146.92    3423.22       182.58  227.61
Tesla C2075          333.84       246.11   73.36     696.37       435.56  163.04

and 1530 double-precision GFLOPS in the AIDA64 GPGPU benchmark.

Still, I was a bit disappointed by the Backslash performance (solving linear systems).
But that may be a more hybrid workload in which the CPU also plays a role, and our setup is probably inferior to the reference setup.

Or is there another reason why the C2075 is better at that?

I will try the card in another PC in the coming days to see how sensitive the Backslash results are to the rest of the system.

My program, which relies heavily on the cuFFT library, shows about a 20-30% speed-up on the TITAN compared to the K20. The K20 is roughly 2x faster than the C2075 for the same problem.
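For anyone comparing their own FFT timings against the gpuBench FFT columns above: those GFLOPS figures conventionally come from counting 5·n·log2(n) flops per length-n complex transform. A minimal CPU-side sketch of that conversion (NumPy here; the 5·n·log2(n) count is the usual convention, assumed rather than taken from gpuBench's source):

```python
import math
import time
import numpy as np

def fft_gflops(n=2**20, dtype=np.complex128):
    """Time a length-n FFT and report GFLOP/s using the 5*n*log2(n) flop count."""
    x = (np.random.rand(n) + 1j * np.random.rand(n)).astype(dtype)
    np.fft.fft(x)                      # warm-up
    t0 = time.perf_counter()
    np.fft.fft(x)
    dt = time.perf_counter() - t0
    return 5.0 * n * math.log2(n) / dt / 1e9

print(f"double-precision FFT: {fft_gflops():.2f} GFLOP/s")
```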