Why is Tesla expensive?

I was told the Tesla C1060 board I use at work cost ~$2000. As far as I can tell it has the exact same chip as the GTX 280 ($400), minus the display hardware but with 4x more RAM.

So, why the price difference? To what extent is it because Tesla chips are taken from the highest quality bin, and to what extent is it price discrimination? NVIDIA employees, don’t
be afraid to answer - I don’t have anything against price discrimination. If people want to use the GPU for high-stakes applications where errors are unacceptable, then they can afford to pay more.

Does the GTX 280 have double precision (64-bit) support?

Yes.
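For reference, both the GTX 280 and the Tesla C1060 are built on the GT200 chip, which reports compute capability 1.3 - the first generation with native double-precision units. If you want to verify this on any box, here is a minimal device-query sketch using the standard runtime API (nothing in it is Tesla-specific):

```
// List each CUDA device and whether it supports native double precision.
// Compute capability 1.3 (the GT200 generation) is the first to do so.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        bool fp64 = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
        printf("Device %d: %s, compute capability %d.%d, double precision: %s\n",
               d, prop.name, prop.major, prop.minor, fp64 ? "yes" : "no");
    }
    return 0;
}
```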

To answer the question, I think you got most of it. The GPU silicon is the same, but you get the top bin, underclocked, with more memory, on a more expensive board with top-shelf DC-DC components, and more QA on the final product. Apart from the extra memory, what you are paying for is the “peace of mind” that the additional QA brings.

Well, until I see the reliability #s, I’m not confident about the claim.

Can’t you achieve the same results by underclocking a GeForce? Earlier I thought that without ECC RAM, a more reliable processor wouldn’t help. But I guess GPU data is quite transient (upload data, compute, download results), so bit flips shouldn’t be a problem.
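One way to sanity-check the bit-flip assumption on a non-ECC card is a crude pattern soak test: write a known pattern, let it sit, and count the words that changed. The following is only a rough sketch (the 256 MB buffer and 60-second soak are arbitrary choices of mine); dedicated tools such as cuda_memtest exercise the memory far more aggressively:

```
// Crude soak test for non-ECC GPU memory: fill a buffer with a pattern,
// wait, then count the words that no longer match.
#include <cstdio>
#include <unistd.h>        // sleep() - POSIX only
#include <cuda_runtime.h>

__global__ void fill(unsigned int *buf, size_t n, unsigned int pattern) {
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
         i < n; i += (size_t)gridDim.x * blockDim.x)
        buf[i] = pattern;
}

__global__ void check(const unsigned int *buf, size_t n, unsigned int pattern,
                      unsigned int *errors) {
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
         i < n; i += (size_t)gridDim.x * blockDim.x)
        if (buf[i] != pattern) atomicAdd(errors, 1u);
}

int main() {
    const size_t n = 256u * 1024u * 1024u / sizeof(unsigned int);  // 256 MB of words
    const unsigned int pattern = 0xA5A5A5A5u;

    unsigned int *buf, *d_errors, h_errors = 0;
    cudaMalloc(&buf, n * sizeof(unsigned int));
    cudaMalloc(&d_errors, sizeof(unsigned int));
    cudaMemset(d_errors, 0, sizeof(unsigned int));

    fill<<<1024, 256>>>(buf, n, pattern);
    cudaDeviceSynchronize();

    sleep(60);  // let the data sit; a real test would loop for hours

    check<<<1024, 256>>>(buf, n, pattern, d_errors);
    cudaMemcpy(&h_errors, d_errors, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("Mismatched words after soak: %u\n", h_errors);

    cudaFree(buf);
    cudaFree(d_errors);
    return 0;
}
```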

You could probably approximate it, if you can get the memory clock into the right range (which is what is most heavily underclocked in the Tesla 10-series cards). I am guessing much of the top-bin/underclocking stuff is to keep the silicon in very tight voltage/TDP ranges, which is pretty critical for cluster and machine room applications, and especially for the S1070 cards, which are passively cooled.

Having said that, we run stock-clocked 2 GB GTX 285s in our whitebox cluster without problems (at least thus far).
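For anyone who wants to see how far the clocks actually differ between a GeForce and the corresponding Tesla (or to verify that an underclock took effect), the runtime API exposes them. A minimal sketch - note that memoryClockRate and memoryBusWidth are only populated on reasonably recent CUDA toolkits:

```
// Print core/memory clocks and memory size for every CUDA device,
// e.g. to compare a stock GeForce against the equivalent Tesla part.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s\n", d, prop.name);
        printf("  core clock  : %.0f MHz\n", prop.clockRate / 1000.0);        // reported in kHz
        printf("  memory clock: %.0f MHz\n", prop.memoryClockRate / 1000.0);  // reported in kHz
        printf("  bus width   : %d-bit\n", prop.memoryBusWidth);
        printf("  global mem  : %.0f MB\n", prop.totalGlobalMem / (1024.0 * 1024.0));
    }
    return 0;
}
```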

“Cash cow?”

And to stick with a dairy farm metaphor: “Milking the customers with the biggest udders?” ;)

You might as well ask the same question referring to Quadro cards. Here it’s much the same hardware, and the Quadro drivers specifically check the graphics card’s device ID - with additional checks making sure that simply reflashing the card’s BIOS won’t enable the professional driver features.

A more optimistic saying would be a rephrase of what I said earlier,

“If you can’t afford errors, then you can afford error-free hardware”

along the lines of “If you’re old enough to die for your country, you’re old enough to vote” (the slogan behind the US 26th Amendment).

Anyone have any reliability #s (e.g. MTBF, memory errors/bit/hour)?

Yeah, many have asked for this, but nothing has been published, leading me to believe that either:

a] These numbers don’t exist. (hard to believe, but possible if you don’t have a large deployment in the field to monitor)

b] These numbers require an NDA to get.

I’m leaning toward [b], but would welcome some clarification.

Me too; I am getting more and more requests for these numbers from within our company.

Having said that: we cannot sell anything other than Tesla to our customers; they would not accept a gamer card.

++a

I know of a few “gamer” clusters, and the codes I use run just as well there as on “professional” cards. Sometimes a job dies, but in 99.9% of all cases it’s not the GPU (if a job goes rogue, I can almost always pin it down to DRAM). The point is that these days, no one really sets up both a “gaming” cluster and a “professional” cluster and compares. But why bother? The S1070s are a must for anything large-scale. I’ve been happy with my “gaming” cluster for three years now; the GeForce 8800s just run fine.

The story might be different for SMPs, where you really have a choice. The FASTRA-2 folks maxed it out: http://fastra2.ua.ac.be/