Comparing C1060, GTX470, GTX480 and C2050: benchmark results for the Fermi cards and the Tesla generation

Hi

I’ve been testing a number of different GPUs recently with LAMMPScuda, an MD code I am developing (available here: http://code.google.com/p/gpulammps/ and www.tu-ilmenau.de/lammpscuda).

The code makes relatively heavy use of the texture cache and uses the CPU almost not at all. While one of the main focuses of the code is to scale well on GPU clusters, I have used only a single GPU here, since I wanted to compare the performance of the GPUs themselves.
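
To illustrate what "relatively heavy use of the texture cache" means here, below is a minimal sketch of the kind of gather an MD force kernel does. The names (pos_tex, gather_positions) and the float4 position layout are illustrative assumptions, not the actual LAMMPScuda code.

    // Sketch only: positions are read through the texture cache via tex1Dfetch,
    // which helps with the semi-random neighbor-list accesses typical of MD.
    texture<float4, 1, cudaReadModeElementType> pos_tex;

    __global__ void gather_positions(float4* out, const int* neighbors, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = tex1Dfetch(pos_tex, neighbors[i]);   // cached gather
    }

    // Host side, once per timestep (or after reneighboring):
    // cudaBindTexture(0, pos_tex, d_pos, n_atoms * sizeof(float4));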

As a comparison I ran the CPU version of LAMMPS (which uses the exact same algorithms) on a conventional node with two quad-core Nehalems (X5550 @ 2.66 GHz).

I have tested three different systems:

lj-melt: lowest amount of computation per memory access

(for anyone familiar with MD: it's a plain LJ system with a 2.5 cutoff, 0.84 density, 850k atoms)

silicate/long: half of the time is spent on a 3D FFT (using cuFFT; see the sketch after this list), and the rest is much more compute-intensive than lj-melt

(lithium silicate glass, Buckingham potential + long-range Coulomb via PPPM, ~12k atoms)

silicate/cut: also more compute-intensive than lj-melt, but no FFT

(lithium silicate glass, Buckingham potential + cutoff Coulomb (10 Å), ~100k atoms)
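
Since half of the silicate/long time goes into that 3D FFT, here is roughly the cuFFT call involved. This is only a sketch: the mesh dimensions and the function name are illustrative, and the real PPPM code creates its plans once and reuses them every timestep instead of rebuilding them per call as shown here.

    #include <cufft.h>

    // Forward 3D FFT of the charge mesh, in place.
    void forward_fft(cufftComplex* d_mesh, int nx, int ny, int nz)
    {
        cufftHandle plan;
        cufftPlan3d(&plan, nx, ny, nz, CUFFT_C2C);           // complex-to-complex 3D plan
        cufftExecC2C(plan, d_mesh, d_mesh, CUFFT_FORWARD);   // transform the mesh in place
        cufftDestroy(plan);
    }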

Single Precision (all numbers: lower is better)

System          8xCPU    C1060   GTX470   GTX480    C2050    C2050+ECC
lj-melt           293      114      143      116      131          155
silicate/long     212     63.6     37.2     31.7     38.4         41.4
silicate/cut      580      123     84.2     69.8     88.9         91.5

Double Precision

System          8xCPU    C1060   GTX470   GTX480    C2050    C2050+ECC
lj-melt           293      237      183      152      167          206
silicate/long     212      200     80.9     67.4     80.5         94.0
silicate/cut      580      536      285      221      260          353

As you can see, in the first example the Fermi cards are held back by insufficient texture throughput, so the C1060 can actually beat the Fermi GPUs in single precision. In the other examples, which are much less dominated by texture reads, the Fermi GPUs are significantly faster than the C1060. More or less as expected, the GTX470 is about as fast as the C2050, since both have the same number of cores (in the texture-heavy case the C2050 is better; am I remembering correctly that it has one more texture unit than the GTX470?). The GTX480 is roughly 20% faster than a GTX470.

It is interesting to see that, while the Fermi cards are generally much better in double precision than the C1060 (more so than in single precision), the C2050 cannot show off its much higher double-precision throughput compared to the GeForce GPUs.

But anyway, it's nice to see that the Fermi GPUs beat a full node of modern Intel CPUs by a factor of 2-3 even in double precision.

I thought that these numbers might be interesting for you.

Cheers

Ceearem

Why someone would pay over 4x the price for something with relatively negligible performance gains is beyond me. But as long as NVIDIA can sell it, I suppose it's what keeps the desktop cards cheap.

Would you leave a GTX480 number crunching 7 days a week?

Yes, but I can tolerate the downtime from the very occasional hardware failure. I've lost 2 GeForce cards out of nearly a dozen CUDA workhorses over the past 3 years, and they weren't even the most heavily used ones. That failure fraction would have to be reversed for a Tesla to become a cost-effective option in my case.

If CUDA were field-analyzing the telemetry data from a $1M/day test drill operation, then I would be happy to pay for the extra quality assurance. Of course, I might be further ahead spending the cash on a redundant system tolerant of device failure, regardless of how much quality assurance the device got. Everyone is going to do that [risk * cost of failure] calculation differently depending on their situation.

(That said, ECC is a very nice feature of the Tesla, and I think it sells the card more than any burn-in testing NVIDIA does. A smart architecture can cope gracefully with visibly failed devices, but silent corruption is very hard to catch unless you are comparing results from duplicate jobs or doing other consistency checks.)
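
To make the duplicate-job idea concrete, here is a minimal sketch of such a consistency check, assuming a deterministic code whose results are bit-reproducible between runs; the function name and the plain array-of-doubles layout are just illustrative, not any particular code's API.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // FNV-1a hash over the raw bytes of a result array.
    uint64_t checksum(const std::vector<double>& results)
    {
        uint64_t h = 1469598103934665603ULL;
        const uint8_t* bytes = reinterpret_cast<const uint8_t*>(results.data());
        for (size_t i = 0; i < results.size() * sizeof(double); ++i) {
            h ^= bytes[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    // After two identical runs of the same job:
    // if (checksum(run_a) != checksum(run_b))
    //     fprintf(stderr, "results differ: possible silent corruption\n");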

Actually, it’s the teens and 20-somethings playing Call of Duty, Bad Company, and whatever other FPSs are hot this quarter that keep the desktop cards cheap (and drive the market that CUDA cards are a

small offshoot of.)

Regards,

Martin

Actually, it’s the teens and 20-somethings playing Call of Duty, Bad Company, and whatever other FPSs are hot this quarter that keep the desktop cards cheap (and drive the market that CUDA cards are a

small offshoot of.)

Regards,

Martin