GTX TITAN (Double Precision) FLOPS, way off specs

Hi All,

For GPGPU computing purposes we were deciding between GTX TITANs and Tesla C2075s (both in roughly the same price range).

The specs, however, favored the GTX TITAN, with ECC memory as the sole exception:
4500 GFLOPS (single), 1300-1500 GFLOPS (double), compute capability 3.5 (dynamic parallelism) for the TITAN
1030 GFLOPS (single), 515 GFLOPS (double), compute capability 2.0 (no dynamic parallelism) for the Tesla C2075
for the specs see
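As a sanity check, those spec-sheet numbers follow from cores × 2 FLOPs/cycle (one fused multiply-add) × clock. A minimal sketch, assuming the published core counts and clocks for both cards:

```python
# Theoretical peak GFLOPS = execution units * 2 FLOPs/cycle (FMA) * clock (GHz).
# Unit counts and clocks below are the published specs (assumptions, not measured).

def peak_gflops(units, clock_ghz):
    """Peak throughput in GFLOPS for `units` FMA-capable units at `clock_ghz`."""
    return units * 2 * clock_ghz

# GTX TITAN (GK110): 2688 SP cores, 896 DP units (1/3 of SP), 837 MHz base clock
titan_sp = peak_gflops(2688, 0.837)   # ~4500 GFLOPS single
titan_dp = peak_gflops(896, 0.837)    # ~1500 GFLOPS double
# Tesla C2075 (Fermi): 448 cores at 1.15 GHz shader clock, DP at 1/2 the SP rate
c2075_sp = peak_gflops(448, 1.15)     # ~1030 GFLOPS single
c2075_dp = c2075_sp / 2               # ~515 GFLOPS double

print(titan_sp, titan_dp, c2075_sp, c2075_dp)
```

The arithmetic reproduces all four quoted spec numbers to within rounding.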

But after receiving our first TITAN, we noticed that its double-precision performance is lower than the C2075's!

Single-precision GFLOPS in AIDA64: 4543 (OK)
Double-precision GFLOPS in AIDA64: 222.3 (instead of ~1300!)

The same happens with MATLAB's gpuBench(), which tests matrix multiplication (MTimes), linear-system solves (Backslash), and FFTs:

                   MTimes_D  Backslash_D  FFT_D   MTimes_S  Backslash_S   FFT_S
Tesla C2075          333.84       246.11  73.36     696.37       435.56  163.04
GeForce GTX TITAN    223.68        82.34  77.05    3635.97       179.13  252.21
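For context on what these columns mean: gpuBench converts measured times into GFLOPS using an operation count, and for MTimes the usual accounting is 2n^3 flops for an n-by-n multiply. A minimal CPU-side sketch of that measurement (NumPy here, not the GPU path gpuBench actually uses):

```python
import time
import numpy as np

def gemm_gflops(n=1024, dtype=np.float64):
    """Time an n-by-n matrix multiply and report GFLOP/s (2*n^3 flop count)."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    a @ b                              # warm-up so timing excludes one-off costs
    t0 = time.perf_counter()
    a @ b
    dt = time.perf_counter() - t0
    return 2.0 * n**3 / dt / 1e9

print(f"double-precision GEMM: {gemm_gflops():.1f} GFLOP/s")
```

Running it with `dtype=np.float32` versus `np.float64` gives the same single/double split the table's `_S`/`_D` columns show.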

I spoke with the developers of the MATLAB gpuBench benchmark, and they made the same observation, as have many people who contacted them. (So it doesn't seem to be system-dependent.)

Is there anyone who can explain these double-precision performance differences?

Are you aware that the GTX TITAN needs to be switched into double-precision mode first,
before it can unfold its potential?

In this mode it doesn’t clock up as aggressively (this limits the card’s boost clocks)
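For anyone looking for the switch: on Windows it is the "CUDA - Double precision" option under "Manage 3D settings" in the NVIDIA Control Panel. On Linux, an nvidia-settings attribute along the following lines has been reported for Kepler TITANs; the exact attribute name is an assumption here, so verify it against `nvidia-settings -q all` on your driver version:

```shell
# Enable full-rate double precision on a Kepler GTX TITAN (Linux).
# Attribute name is an assumption; confirm with `nvidia-settings -q all`.
nvidia-settings -a [gpu:0]/GPUDoublePrecisionBoostImmediate=1
```

Note that, as described above, enabling this mode limits the card's boost clocks, so single-precision results may drop slightly.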


Thank you, Christian! No, I didn't know that.

Now things look different:

                   MTimes_D  Backslash_D   FFT_D   MTimes_S  Backslash_S   FFT_S
GeForce GTX TITAN   1285.83       128.35  146.92    3423.22       182.58  227.61
Tesla C2075          333.84       246.11   73.36     696.37       435.56  163.04

and 1530 double-precision GFLOPS in the AIDA64 GPGPU benchmark.

Still, I was a bit disappointed by the Backslash performance (solving linear systems).
But that may be a more hybrid workload in which the CPU also plays a role, and our setup is probably inferior to the reference setup.

Or is there another reason why the C2075 is better at that?

I will try the card in another PC in the coming days to see how sensitive the Backslash results are to the rest of the system.

My program, which relies heavily on the cuFFT library, shows about a 20-30% speed-up on the TITAN compared to the K20. The K20 is roughly 2x faster than the C2075 for the same problem.
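For anyone comparing their own FFT timings against the gpuBench FFT columns above: those GFLOPS figures conventionally come from counting 5·n·log2(n) flops per length-n complex transform. A minimal CPU-side sketch of that conversion (NumPy here; the 5·n·log2(n) count is the usual convention, assumed rather than taken from gpuBench's source):

```python
import math
import time
import numpy as np

def fft_gflops(n=2**20, dtype=np.complex128):
    """Time a length-n FFT and report GFLOP/s using the 5*n*log2(n) flop count."""
    x = (np.random.rand(n) + 1j * np.random.rand(n)).astype(dtype)
    np.fft.fft(x)                      # warm-up
    t0 = time.perf_counter()
    np.fft.fft(x)
    dt = time.perf_counter() - t0
    return 5.0 * n * math.log2(n) / dt / 1e9

print(f"double-precision FFT: {fft_gflops():.2f} GFLOP/s")
```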