Tesla 20-Series Features and Advantages

sumitg · April 2, 2010, 4:06am

Updated: Aug 11: There is now a web page with details:
http://www.nvidia.com/object/why-choose-tesla.html

A common question we get is: “Why should I buy Tesla instead of GeForce?”
Here are some things to consider, written with Tesla 20-series / Fermi products in mind:

Tesla 20-series (Fermi-based) products are designed for high performance
scientific and technical GPU computing.

They therefore offer features, testing, and support over and above our consumer
GeForce GTX 470 and 480 (Fermi-based) products, such as:

- Double precision is 1/2 of single precision for Tesla 20-series, whereas double precision is 1/8th of single precision for GeForce GTX 470/480
- ECC is available only on Tesla
- Tesla 20-series has 2 DMA Engines (copy engines). GeForce has 1 DMA Engine. This means that CUDA applications can overlap computation and communication on Tesla using bi-directional communication over PCI-e.
- Tesla products have larger memory on board (3GB and 6GB)
- Cluster management software is only supported on Tesla products
- The TCC (Tesla Compute Cluster) driver for Windows is only supported on Tesla
- OEMs offer integrated workstations and servers with Tesla products only
- HPC ISV software is tested, certified, and supported only on Tesla products
- Tesla products are built for reliable long running computing applications and undergo intense stress testing and burn-in. In fact, we create a margin in memory and core clocks (by using lower clocks) to increase reliability and long life.
- Tesla products are manufactured by NVIDIA and come with a 3-year warranty
- Tesla customers receive enterprise support and have higher priority for CUDA bugs and requests for enhancements
- Tesla products have long availability cycles ranging from 18 to 24 months and NVIDIA gives its customers a 6 month EOL notice before discontinuing a Tesla product.
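To make the copy-engine point concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a fully pipelined loop that uploads a chunk, runs a kernel on it, and downloads the result, and it models only the steady-state bottleneck. The function name and the timings are hypothetical, chosen purely for illustration; they are not measurements of any real card.

```python
def steady_state_chunk_time(h2d_ms, kernel_ms, d2h_ms, copy_engines):
    """Approximate per-chunk time (ms) of a fully pipelined upload/compute/download loop."""
    if copy_engines >= 2:
        # Upload and download each get their own engine and can run concurrently.
        copy_stage = max(h2d_ms, d2h_ms)
    else:
        # A single engine has to serialize transfers in both directions.
        copy_stage = h2d_ms + d2h_ms
    # In steady state, the slowest stage sets the pace of the pipeline.
    return max(kernel_ms, copy_stage)


# Hypothetical timings per chunk, for illustration only.
h2d_ms, kernel_ms, d2h_ms = 4.0, 5.0, 4.0

one_engine = steady_state_chunk_time(h2d_ms, kernel_ms, d2h_ms, copy_engines=1)
two_engines = steady_state_chunk_time(h2d_ms, kernel_ms, d2h_ms, copy_engines=2)
print(one_engine, two_engines)  # 8.0 5.0
```

With these made-up numbers the single-engine case is limited by transfers, while the dual-engine case is limited by the kernel. If the kernel dominates both transfers combined, the second engine makes no difference in this model.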

Learn more at http://www.nvidia.com/tesla
CUDA software development tools are linked from: http://www.nvidia.com/object/tesla_software.html

Knowledgebase entry that will be kept up to date:
http://nvidia.custhelp.com/cgi-bin/nvidia....hp?p_faqid=2595

SPWorley · April 2, 2010, 4:28am

And, perhaps most critically for many applications, Tesla comes with much more memory than GeForce cards: 3 or 6 GB, versus 1.5 GB on the GeForce GTX 480.

sumitg · April 2, 2010, 4:34am

Thanks for reminding me of this very important feature! I updated the main post.

Sarnath · April 2, 2010, 5:25am

Sumit,

Can you provide links for “TCC” and “Cluster Manager Software”?

Best regards,
Sarnath

sumitg · April 2, 2010, 5:47am

All Tesla product drivers are at http://www.nvidia.com/drivers

Select Tesla 1U System → S1070 → Windows 2008 (R2) (x64) and you will come to TCC

We will release Tesla C1060 TCC drivers soon for Windows Vista and Windows 7

Cluster software links are on the SW Tools page (link in original post)

Sarnath · April 2, 2010, 9:11am

Thank you Sumit. It was useful.

Nice to see auto-parallelizing software like “Goose”, “HMPP” and the like.

sagrailo · April 2, 2010, 9:37am

What do you mean by this? I regularly use the Sabalcore on-demand cluster, which has GTX 285 cards attached to a number of nodes, and I am able to use them through the TORQUE resource manager without any issues…

Uncle_Joe · April 2, 2010, 6:16pm

“Double precision is 1/2 of single precision for Tesla 20-series”

Is this an artificial cap to get people to buy Tesla? Not that there’s anything wrong with it, but I question its long-term effect on GPGPU adoption. If capped, a GTX 480 will have 168 double-precision Gflops at about $3 / Gflop, compared to maybe $8 / Gflop for CPUs. If uncapped, and hence $0.74 / Gflop, that would be a major attraction.
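The figures in this post can be reproduced with a short calculation. The sketch below is not from the thread: it assumes the published GTX 480 specifications (480 CUDA cores, 1.401 GHz shader clock), counts one fused multiply-add as 2 flops per core per clock, and assumes a $499 launch price. The variable and function names are made up for the example.

```python
# Assumptions (not stated in the thread): GTX 480 with 480 CUDA cores,
# a 1.401 GHz shader clock, 2 flops per core per clock (one FMA),
# and a $499 launch price.
CORES = 480
SHADER_CLOCK_GHZ = 1.401
FLOPS_PER_CORE_PER_CLOCK = 2
PRICE_USD = 499.0

# Peak single-precision throughput in Gflops.
sp_gflops = CORES * SHADER_CLOCK_GHZ * FLOPS_PER_CORE_PER_CLOCK


def dollars_per_dp_gflop(dp_to_sp_ratio):
    """Price per peak double-precision Gflop for a given DP:SP throughput ratio."""
    return PRICE_USD / (sp_gflops * dp_to_sp_ratio)


capped_dp_gflops = sp_gflops / 8        # GeForce ratio quoted in the first post
capped = dollars_per_dp_gflop(1 / 8)    # roughly $3 per Gflop
uncapped = dollars_per_dp_gflop(1 / 2)  # roughly $0.74 per Gflop

print(round(capped_dp_gflops), round(capped, 2), round(uncapped, 2))
```

These are peak theoretical rates, so they say nothing about what a real application would sustain.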

_Big_Mac · April 2, 2010, 8:39pm

We have no official word on that from NVIDIA and I doubt we’d ever hear “yes, we capped GeForces to drive Tesla sales” anyway, but there’s a good possibility this isn’t an artificial cap. It might be a legitimate way of increasing yields (which are pretty poor I hear). If some DP FPUs fail to work, don’t throw away the die, disable the duds and make it a GeForce. Gamers don’t need DP so why make them pay for hand-picked all-working chips? That would be the same motivation that made Cell a 7-core processor in PlayStation 3 while “premium” blade servers had all 8 cores active.

On the other hand, the capping theory may be true as well. One way of confirming it would be if someone made some sort of flash hack showing that you can activate the missing ALUs. I don’t expect NVIDIA to admit it if this were true.

Uncle_Joe · April 2, 2010, 9:01pm

I doubt they would disable ALUs because I think single precision and double precision are done in the same unit, according to

“A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design” from 2007. The abstract says

Sharing the ALUs for float and double definitely makes sense, since it gives the ultimate flexibility of allowing all units to be used at once instead of leaving half of them idle. I believe Int24 operations use the single precision FPU, since:

Their throughputs are the same as float.
Float has a 24-bit significand.

_Big_Mac · April 2, 2010, 9:59pm

If DP and SP are done by the same physical ALUs, the yield increase theory doesn’t seem to make sense anymore…

avidday · April 2, 2010, 10:10pm

It didn’t make much sense before, to be honest. It would be a pretty fortuitous defect distribution that would leave exactly the right number of working double precision ALUs in each MP and leave the rest of the die otherwise intact…

Working on the theory that there is currently only one Fermi die (ie. all GF100s are born equal), it seems much more likely that the strategy used with OpenGL acceleration on Quadro boards forever has now been extended to the compute APIs on Fermi, ie. if you want the full feature set, buy the professional board.

Tom_Milledge · April 2, 2010, 10:31pm

Because we learned our lesson from the GT200 and have crippled the GF100 to MAKE YOU buy the Tesla.
See answer number 1.

:rolleyes:

allanmac · April 4, 2010, 12:10am

I would think that seeing one 480 core device (Fermi) vs. two 240 core devices (Tesla–/GTX295) is a big win for many developers that aren’t investing in distributing their application across devices.

eyalhir74 · April 6, 2010, 7:01am

We currently have ~20 S1070 Teslas, with 2 Teslas per server machine. Another advantage of Fermi, in addition to what you say, is that I can cut the server count in half to achieve roughly the same computational power.

It also means more computational power per PCI slot, a limiting factor today.

All in all, if Fermi delivers ~2x the performance, I think it’s an exciting change…

eyal

aeronaut · April 13, 2010, 2:44am

Sumit,

OK. Here’s the $64,000 question.

Are the DP FPUs on the Fermi chip deliberately turned off or destroyed for the consumer GTX 470/480 chips, or is this a yield issue, where otherwise good Fermi chips with some faulty DP FPU units are then salvaged by putting them in the consumer gaming cards?

Or, in other words, are you all deliberately making your Fermi chips less powerful than they are, or is this a question of availability of fully functioning chips? If the former, how should someone on a computational budget most effectively spend their dollars? If the latter, can we expect improvements in the process to eliminate this issue in the future?

Regards,

Martin

aeronaut · April 14, 2010, 5:29pm

It’s possible (albeit a little farfetched) to imagine that the ALU is the fabrication weak link on the chip, and that many of the otherwise good chips have ALU fault levels varying from, say, 30-60%, so by shutting down 75% of them, all of those chips are made usable. However, I thought that each ALU is specific to a CUDA core or a group of cores, and not free floating.

But I suspect there’s more truth in your latter point, and that nVidia’s approach is “We’ve put X $$$ into CUDA development, and must sell computational cards at $Y to make that division profitable. And we’ve spent $A on development for gaming, so the gaming cards must sell at $B for profitability, given our projected sales.” (quotes mine.) I wish they would realize that HPC is a smaller market, with more cost conscious consumers, and try to make that division merely breakeven.

It’s tempting to go out and buy one share of nVidia stock, simply to get their annual reports mailed to me. (I know people do this with Berkshire Hathaway, to get the inside info, even though one share runs about $10,000. People also do this with Sam’s Club, since you can shop there if you’re a stockholder.) Perhaps then I could get more inside info. Couple it with one share of AMD to monitor the whole market.

Regards,

Martin

E.D_Riedijk · April 14, 2010, 5:56pm

As far as I understand, a DP unit is actually 2 SP units with some extra logic, so there are no separate DP units anymore as there were on GT200. That is why each Fermi multiprocessor runs only 1 DP warp in 2 clock cycles, versus 2 SP warps in 2 clock cycles.
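As a quick check, the warp counts in this post line up with the 1/2 DP-to-SP ratio quoted for Tesla in the first post. The sketch below simply takes the post's issue rates at face value and assumes 32 threads per warp; the helper name is made up for the example.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def results_per_clock(warps_issued, clocks):
    """Results per clock for one multiprocessor, given warps completed in `clocks` cycles."""
    return warps_issued * WARP_SIZE / clocks


# Issue rates as described in the post above, per multiprocessor.
sp_rate = results_per_clock(warps_issued=2, clocks=2)
dp_rate = results_per_clock(warps_issued=1, clocks=2)
dp_to_sp = dp_rate / sp_rate

print(sp_rate, dp_rate, dp_to_sp)  # 32.0 16.0 0.5
```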

SPWorley · April 14, 2010, 6:29pm

Since Fermi’s DP is Real Deal DP, full IEEE 754-2008, not an approximation, it’s likely more accurate to say that it’s fundamentally a DP unit with extra logic to alternatively let it do 2 SP results in parallel. I think the extra computational bits are also cleverly used to implement FMA.

aeronaut · April 15, 2010, 1:48pm

So with all this nice new tech rolled in, it’s all the more disappointing to have 3/4 of the performance capped on the consumer cards.

Martin
