We observe a strange discrepancy between the basic benchmark results for the Tesla C2070 on the one hand and the GeForce GTX 285 on the other. To be specific, the Tesla C2070 gives worse performance than the GeForce GTX 285 for matrixMul (from the SDK).
GTX 285: 226 Gflop/s
C2070: 183 Gflop/s !!
The bandwidth test also gives worse results on the C2070.
Has anyone seen similar results? Any idea what could be done to improve the performance? It appears that the $250 card is way better than the $4000 card. Are we missing something?
The C2070 is a Padova system with the following specs:
Also, for benchmarking sgemm or dgemm I would use the actual BLAS routines (CUBLAS), not the SDK example. The SDK example isn't nearly as highly optimized and won't necessarily give a true picture of sgemm/dgemm performance between the two cards; see the sketch at the end of this post.
I am positive that dgemm performance on the C2070 will trounce that of the GTX 285.
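Something like this is the kind of timing loop I mean: a rough sketch using the CUBLAS v1 API (cublasInit/cublasSgemm) from the CUDA 3.x/4.x toolkits. The matrix size and iteration count are placeholder choices, not numbers from anyone's setup.

```
/* Rough SGEMM throughput check with CUBLAS (legacy v1 API).
   Build with: nvcc -o sgemm_bench sgemm_bench.cu -lcublas */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas.h>

int main(void)
{
    const int N = 4096;          /* matrix dimension (arbitrary choice) */
    const int iters = 10;        /* timing iterations (arbitrary choice) */
    const size_t bytes = (size_t)N * N * sizeof(float);

    float *h_A = (float*)malloc(bytes);
    float *h_B = (float*)malloc(bytes);
    for (int i = 0; i < N * N; ++i) { h_A[i] = 1.0f; h_B[i] = 1.0f; }

    cublasInit();

    float *d_A, *d_B, *d_C;
    cublasAlloc(N * N, sizeof(float), (void**)&d_A);
    cublasAlloc(N * N, sizeof(float), (void**)&d_B);
    cublasAlloc(N * N, sizeof(float), (void**)&d_C);
    cublasSetMatrix(N, N, sizeof(float), h_A, N, d_A, N);
    cublasSetMatrix(N, N, sizeof(float), h_B, N, d_B, N);

    /* Warm-up call so the timed loop excludes one-time overhead. */
    cublasSgemm('N', 'N', N, N, N, 1.0f, d_A, N, d_B, N, 0.0f, d_C, N);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        cublasSgemm('N', 'N', N, N, N, 1.0f, d_A, N, d_B, N, 0.0f, d_C, N);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    /* 2*N^3 floating-point operations per SGEMM call. */
    double gflops = 2.0 * N * N * N * iters / (ms / 1e3) / 1e9;
    printf("SGEMM N=%d: %.1f Gflop/s\n", N, gflops);

    cublasFree(d_A); cublasFree(d_B); cublasFree(d_C);
    free(h_A); free(h_B);
    cublasShutdown();
    return 0;
}
```

Swapping cublasSgemm for cublasDgemm (and float for double) gives the double precision comparison, which is where the C2070 should pull ahead.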
I think the issues here boil down to a few common points of confusion:
The SDK examples usually make for terrible benchmarks. They are written with an eye toward demonstrating a particular technique in isolation rather than efficiently solving a real problem.
The Tesla cards are not faster overall than top-of-the-line GeForce cards. If your kernel is limited by single precision, integer, or memory bandwidth performance, you will find the Tesla to be slower than a GTX 480 or 580. If you are limited by double precision performance (and be sure it is not just memory bandwidth), then Tesla will be faster.
You should not buy Tesla for raw computational performance (except double precision), but rather for the other features: better QA testing for 24/7 use, more memory, ECC, bidirectional DMA transfers over the PCI-Express bus, the Windows TCC driver that lets you bypass the overhead of the WDDM, and better technical support.
ECC really seems to be a memory bandwidth performance killer, and many kernels are memory bandwidth limited, not computationally limited.
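To make that last point concrete, here is a rough sketch that times a plain device-to-device copy kernel and reports effective bandwidth. A kernel like this does no real arithmetic, so its throughput is set almost entirely by memory bandwidth; running it with ECC on and then off (ECC can be toggled with nvidia-smi on Tesla boards, followed by a reboot) shows the cost directly. The buffer size and launch configuration are arbitrary choices on my part.

```
/* Effective-bandwidth check via a device-to-device copy kernel.
   Build with: nvcc -o bwtest bwtest.cu */
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void copyKernel(float *dst, const float *src, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[i];
}

int main(void)
{
    const int n = 1 << 24;                  /* 16M floats = 64 MB per buffer */
    const size_t bytes = n * sizeof(float);
    const int iters = 20;

    float *d_src, *d_dst;
    cudaMalloc((void**)&d_src, bytes);
    cudaMalloc((void**)&d_dst, bytes);
    cudaMemset(d_src, 0, bytes);

    dim3 block(512);
    dim3 grid((n + block.x - 1) / block.x); /* 32768 blocks, within grid limits */

    copyKernel<<<grid, block>>>(d_dst, d_src, n);   /* warm-up */
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        copyKernel<<<grid, block>>>(d_dst, d_src, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    /* Each copy reads and writes the buffer once. */
    double gbps = 2.0 * bytes * iters / (ms / 1e3) / 1e9;
    printf("Effective bandwidth: %.1f GB/s\n", gbps);

    cudaFree(d_src);
    cudaFree(d_dst);
    return 0;
}
```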
Thank you all for the help! I never received an e-mail about responses, so I didn't check the replies earlier. We also found that if you take advantage of the registers on the Fermi architecture rather than just using shared memory for matrix multiplication, you get much better performance. This is described in the following paper:
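In case it helps anyone else, here is a minimal sketch of the register-blocking idea (my own illustration, not code from the paper): each thread keeps a 2x2 sub-block of C in registers while tiles of A and B are staged through shared memory, so every value read from shared memory is reused for two multiply-adds. The tile sizes, names, and the assumption that N is a multiple of 32 are simplifications for illustration.

```
#define TILE 32          /* C tile computed per thread block            */
#define THREADS 16       /* 16x16 threads per block, 2x2 outputs each   */

__global__ void matmulRegTile(const float *A, const float *B, float *C, int N)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    const int tx = threadIdx.x, ty = threadIdx.y;
    const int tid = ty * THREADS + tx;
    const int rowBase = blockIdx.y * TILE;
    const int colBase = blockIdx.x * TILE;

    /* Per-thread accumulators kept in registers. */
    float c00 = 0.f, c01 = 0.f, c10 = 0.f, c11 = 0.f;

    for (int kTile = 0; kTile < N; kTile += TILE) {
        /* Stage one TILE x TILE block of A and B into shared memory.
           256 threads load 1024 elements, i.e. 4 each, coalesced. */
        for (int i = 0; i < 4; ++i) {
            int idx = tid + i * THREADS * THREADS;
            int r = idx / TILE, c = idx % TILE;
            As[r][c] = A[(rowBase + r) * N + kTile + c];
            Bs[r][c] = B[(kTile + r) * N + colBase + c];
        }
        __syncthreads();

        /* Each thread accumulates a 2x2 sub-block of C in registers,
           reusing each shared-memory value for two multiply-adds. */
        for (int k = 0; k < TILE; ++k) {
            float a0 = As[ty][k];
            float a1 = As[ty + THREADS][k];
            float b0 = Bs[k][tx];
            float b1 = Bs[k][tx + THREADS];
            c00 += a0 * b0;  c01 += a0 * b1;
            c10 += a1 * b0;  c11 += a1 * b1;
        }
        __syncthreads();
    }

    C[(rowBase + ty) * N + colBase + tx]                     = c00;
    C[(rowBase + ty) * N + colBase + tx + THREADS]           = c01;
    C[(rowBase + ty + THREADS) * N + colBase + tx]           = c10;
    C[(rowBase + ty + THREADS) * N + colBase + tx + THREADS] = c11;
}

/* Launch (N must be a multiple of TILE in this sketch):
   dim3 block(THREADS, THREADS);
   dim3 grid(N / TILE, N / TILE);
   matmulRegTile<<<grid, block>>>(d_A, d_B, d_C, N);
*/
```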