GTX 480 / 470 Double Precision Reduced?

Hi, I know this question has popped up before, but I hope that if we ask often enough, someone official from NVIDIA will tell us how it is ;) .

Is the double-precision performance of the consumer Fermi cards reduced (by 75%) compared to that of the Tesla line?

Best regards
Ceearem

And if so, could it be re-enabled by flipping a bit in the driver? ;)

It seems that gaming cards such as the GTX 480 do not have increased DP performance. Are you sure that NVIDIA advertised such a feature for the GTX 480?

The point is that I found no official statement that the gaming cards have reduced DP performance. They use the same Fermi chip that the Tesla and Quadro cards will use, and the architecture of the Fermi chip allows double-precision performance at half the speed of single precision. So it was never a question of whether the Tesla cards have increased DP performance, but whether the consumer cards have reduced DP performance. If they have less DP performance than the Tesla cards, it will be because it is disabled, either through the drivers or through some kind of hardware jumper. So the question remains whether NVIDIA decided to “cripple” DP performance on the consumer cards in order to give the Tesla cards one more advantage besides much more memory, higher reliability [I assume here that NVIDIA “hand picks” the chips for its professional cards and tests them more thoroughly than the chips for consumer cards], and ECC support.

Soooo … tmurray any comment?? :stud:

Best regards

Ceearem

I’d like to know it too.

I received the following answer a few months ago from James Wang, a technical marketing analyst at NVIDIA:

Q: In the GeForce family, double-precision throughput has been reduced to 25% of the full design. Was this decision made to discourage the use of these products for professional work (where Quadro and Tesla are targeted)? Considering the fused support of single- and double-precision calculations in the CUDA cores, how was this change even applied?

A: Yes, full-speed double precision performance is a feature we reserve for our professional customers. Consumer applications have little use for double precision, so this does not really affect GeForce users. Having differentiated features and pricing is actually fairer for all. Given the option of enabling all professional features on GeForce and having gamers pay for them, or disabling them on GeForce and offering a more compelling price, we feel the latter is the better choice.

Regarding the second part of the question, the architecture is designed to support this kind of configuration.
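For what it’s worth, the arithmetic behind that 25% figure can be sketched as follows (the core counts and shader clocks below are the commonly quoted specs, not official statements, so treat them as assumptions):

```python
# Back-of-envelope peak FLOPS. Assumed specs: GTX 480 = 480 CUDA cores at a
# 1401 MHz shader clock; Tesla C2050 = 448 cores at 1150 MHz.
def peak_sp_gflops(cores, shader_mhz):
    # Each CUDA core can retire one fused multiply-add (2 FLOPs) per clock.
    return cores * 2 * shader_mhz / 1000.0

gtx480_sp = peak_sp_gflops(480, 1401)  # ~1345 GFLOPS single precision
c2050_sp  = peak_sp_gflops(448, 1150)  # ~1030 GFLOPS single precision

# The Fermi architecture does DP at half the SP rate; on GeForce that DP
# rate is then capped at 1/4 (i.e. 1/8 of the SP rate).
c2050_dp  = c2050_sp / 2       # ~515 GFLOPS double precision
gtx480_dp = gtx480_sp / 2 / 4  # ~168 GFLOPS double precision
```

So a GTX 480 ends up at roughly 168 DP GFLOPS against the C2050’s roughly 515, even though the raw chip could in principle do about 672.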

Argh, too bad. At least now there is a significant feature to drive individual Tesla sales aside from memory size (and ECC). I never saw any compelling reason to put a C1060 into a developer workstation unless you needed 4 GB of memory.

My code continues to avoid double precision (mostly because development started on compute 1.0 devices), and it looks like it will be profitable to continue that trend when possible, if only to target GeForce cards.

Oh shit, I missed that part of their ‘strategy’. I hate them for that :-(, and especially for lying about the supposed additional costs of NOT disabling DP, which gamers would supposedly have had to pay for.

Still, for the list price of a C2060 you can buy ~5 GTX 480s (each with 1/4 the DP rate and 1/2 the memory), so even for pure DP performance it doesn’t necessarily make sense to buy the overpriced C & S products.
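A quick sanity check of that claim in $/GFLOP terms (the list prices below are rough assumptions, as are the theoretical DP peaks):

```python
# Price per theoretical DP GFLOP, using assumed list prices of ~$2500 for a
# Tesla and ~$500 for a GTX 480, and commonly quoted theoretical DP peaks.
tesla_price, tesla_dp = 2500.0, 515.0  # USD, GFLOPS
gtx_price, gtx_dp     = 500.0, 168.0   # USD, GFLOPS

tesla_per_gflop = tesla_price / tesla_dp  # ~$4.85 per DP GFLOP
gtx_per_gflop   = gtx_price / gtx_dp      # ~$2.98 per DP GFLOP
```

Even with the 1/4 cap, the consumer card still comes out ahead per theoretical DP GFLOP under these assumptions, and five of them together would offer ~840 DP GFLOPS against one Tesla’s ~515.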

The only serious reason would be if the air-cooled GTX cards fail [more often than Teslas]. Do they???

I’m counting on a clever driver hack by someone, some day.

On a separate note, if someone needs support for a summer job dealing with optimization of GPU drivers… :-)

True, but the card price is not the only cost, since you also need the PC to put it in.

That’s a point I would guess is true. First of all, I would definitely think that the professional cards (Tesla and Quadro) have “hand-picked” chips, and I guess they are tested more thoroughly.

I wouldn’t count on that, since this could probably be implemented by a hardware jumper. For the newer Quadro cards this is how they made sure that you cannot use the Quadro drivers (with much better performance in CAD applications etc.) with the consumer products.

I think the box with the CPU etc. is ~$2k, so the multiple cards inside are by far the bigger expense (5+ times more for a 4-card node).

Well, I don’t begrudge them for trying to make CUDA sustainable with non-gamer income. The gamer market is running out of steam to fund the R&D for better GPU Computing features, and there are not enough other consumer-aimed compute heavy tasks to pick up the slack. (I would argue the reception of the GTX 480/470 by the review sites is lukewarm for this reason.) The HPC community is much smaller, so you have to extract more $$$ per card to maintain the same income. If this is what it takes to keep CUDA alive, so be it. (Of course, I’d love for double precision to become a must-have feature for GeForce customers. Whoever can release those applications does all of us a favor.)

Tesla cards tend to run at lower clock rates than the top-of-the-line GeForce, probably in part for this reason. However, even if GeForce is less reliable, you would need the failure rate to be 5x that of the Tesla for the Tesla to be cost-effective in a workstation where you don’t need extremely high uptime.
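That break-even argument can be sketched with a trivial replacement-cost model (the prices and failure rates here are purely illustrative assumptions, not measured data):

```python
# Expected hardware cost over one year if every failure means buying a
# replacement card. All numbers are illustrative assumptions.
def expected_annual_cost(price, annual_failure_rate):
    return price * (1 + annual_failure_rate)

tesla_cost = expected_annual_cost(2500.0, 0.02)  # 2% failure rate  -> $2550
gtx_cost   = expected_annual_cost(500.0, 0.10)   # 5x failure rate  -> $550
```

Under these assumptions the GeForce still comes out far cheaper; its failure rate would have to approach the 5x price gap before the Tesla wins on hardware cost alone, uptime concerns aside.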

Still getting good mileage with float here. And I must admit, I’m getting a bit agitated by numbers like these: folding on GTX 480. Those last two graphs, ray tracing and folding, for real? :omg:

Such is the magic of an L2 cache when your working set of data (or some part of it) can fit inside the cache.

Not to forget the random-access problem in main memory, and/or the atomic functions.

So on highly streamed, non-branching double-precision code, which is faster: Fermi or the 5xxx series?

Will we see benchmark results showing Tesla > 5xxx > GTX 480 for double-precision GFLOPS?

It’s a pity AnandTech didn’t include some double-precision compute benchmarks, both of raw performance and on more complex problems.

I read in one of the reviews that the HD 5870 has about 2700 GFLOPS of single-precision computational performance, while the GTX 480 has about 1300. If that is true, wouldn’t the 5870 beat the crap out of the GTX 480 in every game?
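Those headline numbers fall straight out of the shader counts and clocks (specs assumed from public reviews, counting one fused multiply-add as 2 FLOPs):

```python
# Theoretical single-precision peaks. Assumed specs: HD 5870 = 1600 stream
# processors at 850 MHz; GTX 480 = 480 CUDA cores at a 1401 MHz shader clock.
hd5870_sp = 1600 * 2 * 0.850  # ~2720 GFLOPS
gtx480_sp = 480 * 2 * 1.401   # ~1345 GFLOPS
```

The catch is that the 5870’s 1600 ALUs are grouped into VLIW bundles, so hitting that peak requires keeping every slot of every bundle busy, which games and most compute kernels rarely manage.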

Yes, except either:

  1. ATI’s drivers are poor

  2. It is very difficult to get anywhere near peak performance from the Evergreen/Cypress architecture

I’d bet on #2. Looking at the architecture, it seems that Cypress is designed explicitly for graphics (hence the 4-way VLIW execution units). And yet, even in games, Cypress is only about even with Fermi, although it does seem more efficient (in $ and Watts) at graphics than Fermi.

Hard to tell, though, because ATI/AMD has no decent documentation.

  • Matt

Where is this in writing, other than a forum post?! After pushing how great Fermi would be for CUDA, NVidia needs to be honest about the capabilities of the consumer cards. I’m not overly upset by the decision (nor surprised), but this needs to be made clear.

I think you can achieve peak FLOPS on ATI; it’s the NVIDIA cards where you can’t reach peak.

I found this on a modified SGEMM for ATI:
http://forums.amd.com/forum/messageview.cf…p;enterthread=y

On the 4000 series they get 1 TFLOP, while NVIDIA gets ~375 GFLOPS.

But if I read the relevant threads correctly, this is a very specific example, and you aren’t even comparing apples to apples, since in the ATI example the matrices are in a special order.

Also, peak performance in some examples is actually not that important from my point of view. A very important question is how easy it is to program, and how much effort you need to get close to peak.

Here is a list for Folding@home:

http://www.pcgameshardware.de/aid,667155/F…74&vollbild

And here is a recent OpenCL benchmark from the GTX 480/470 tests at AnandTech:

http://www.anandtech.com/video/showdoc.aspx?i=3783&p=6

So while the ATI cards are in theory much faster than the NVIDIA cards, I think it’s harder to write efficient code for them than for NVIDIA’s GPUs.

Best regards

Ceearem