For GPGPU computing purposes we were deciding between GTX Titans and Tesla C2075s (both in roughly the same price range).
The specs were clearly in favor of the GTX Titan, with ECC memory as the sole exception:
GTX Titan: 4500 GFLOPS (single), 1300-1500 GFLOPS (double), CUDA compute capability 3.5 (with dynamic parallelism)
Tesla C2075: 1030 GFLOPS (single), 515 GFLOPS (double), CUDA compute capability 2.0 (no dynamic parallelism)
For the specs see:
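As a sanity check, the quoted peaks can be reproduced from core count and clock. The core counts and clocks below come from the public spec sheets; the 1/24 figure is, as far as I know, the default double-precision throttle on GeForce GK110 cards, which can reportedly be lifted with the "CUDA - Double precision" option in the NVIDIA Control Panel (at the cost of lower clocks):

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=2):
    # 2 FLOPs per core per cycle via fused multiply-add (FMA)
    return cores * clock_ghz * flops_per_cycle

# GTX Titan: 2688 CUDA cores at ~0.837 GHz base clock
titan_sp = peak_gflops(2688, 0.837)   # ~4500 GFLOPS single
titan_dp_full = titan_sp / 3          # GK110 full-rate DP is 1/3 of SP -> ~1500
titan_dp_default = titan_sp / 24      # default GeForce driver mode -> ~187

# Tesla C2075: 448 CUDA cores at 1.15 GHz shader clock
c2075_sp = peak_gflops(448, 1.15)     # ~1030 GFLOPS single
c2075_dp = c2075_sp / 2               # Fermi Tesla DP is 1/2 of SP -> ~515

print(round(titan_sp), round(titan_dp_full), round(titan_dp_default))
print(round(c2075_sp), round(c2075_dp))
```

The ~187 GFLOPS default-mode figure is at least in the same ballpark as the AIDA64 measurement below, whereas the C2075 really does run DP at half its SP rate.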
But after receiving our first Titan, we noticed that its double-precision performance is lower than the C2075's!
Single-precision GFLOPS in AIDA64: 4543 (OK)
Double-precision GFLOPS in AIDA64: 222.3 (instead of ~1300!)
The same happens with MATLAB's gpuBench(), which tests matrix multiplication, linear-system solves, and FFTs.
I spoke with the developers of the MATLAB gpuBench benchmark, and they had made the same observation, as had many people who contacted them. (So it does not appear to be system-dependent.)
Can anyone explain these double-precision performance differences?
Still, I was a bit disappointed by the performance on backslash (solving linear systems).
That might be a more hybrid workload where the CPU also matters, and our setup is probably inferior to the reference setup.
Or is there another reason why the C2075 is better at that?
I will try to put the card in another PC in the coming days to see how sensitive the backslash results are to the rest of the system.
My program, which relies heavily on the cuFFT library, shows about a 20-30% speed-up on the Titan compared to the K20. The K20 is roughly 2x faster than the C2075 for the same problem.
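Chaining those two ratios gives a rough Titan-vs-C2075 estimate for this cuFFT-heavy workload (the 1.2-1.3x and 2x factors are just the figures quoted above, not new measurements):

```python
k20_vs_c2075 = 2.0                    # K20 roughly 2x faster than the C2075
titan_vs_k20_low, titan_vs_k20_high = 1.2, 1.3  # 20-30% speed-up over the K20

# implied Titan-vs-C2075 speed-up for the same cuFFT-heavy problem
low = k20_vs_c2075 * titan_vs_k20_low    # ~2.4x
high = k20_vs_c2075 * titan_vs_k20_high  # ~2.6x
print(f"Titan roughly {low:.1f}x-{high:.1f}x faster than the C2075")
```

So for an FFT-bound (largely single-precision, bandwidth-sensitive) workload the Titan comes out well ahead, in contrast to the raw double-precision numbers above.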