GTX 480 / 470 Double Precision Reduced?

The real question is: “If I want to do heavy double precision computation on the cheap, using a standard graphics card, is a GTX 480/470 my best choice?”

How much did they reduce the double precision performance?

Does this fall below that of competing products?

Roughly 1/4 of a Tesla:

[attachment=16631:cuda_z.png]

I think NVIDIA could be generous after all the delays and suffering and enable full DP throughput on GeForce.

Okay, that means that applications targeted at consumer hardware, such as physics APIs in games, will have to find a reasonable balance between single and double precision computations: using DP only where it is really unavoidable, and single precision otherwise. Not really any different from code targeted at the GTX 260 and better.

Actually, I can’t think of any scenario where you would really want DP in game physics. Who cares whether the numerical errors of an already very crude physics model are a bit smaller in a game, where physics is only a gimmick? You would use relatively simple physics models anyway. I doubt, for example, that anyone would use a real TIP4P molecular dynamics simulation for water in a game (considering that you usually only simulate cubes of a few nm ;-).

In fact, this reduced DP performance might have one good side effect: it gives you one more reason to support single precision in your code, which makes it easier to write software that still runs well on GT200 hardware, which many people, companies, and universities still own.
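As a rough illustration of that “DP only where unavoidable” approach, here is a minimal sketch of a precision-agnostic kernel, templated on the floating-point type. The kernel and names are made up for illustration, not taken from any real physics API:

```cuda
// Minimal sketch (illustrative only): the same kernel compiles for float and
// double, so the precision decision becomes a per-call-site choice instead of
// a rewrite.
#include <cuda_runtime.h>

template <typename Real>
__global__ void integrate_positions(Real *pos, const Real *vel, Real dt, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        pos[i] += vel[i] * dt;   // simple Euler step; single precision is fine here
}

// Usage (hypothetical): float for game-style physics on consumer cards,
// double only for the few spots where round-off actually matters.
//   integrate_positions<float ><<<grid, block>>>(pos_f, vel_f, 0.016f, n);
//   integrate_positions<double><<<grid, block>>>(pos_d, vel_d, 0.016,  n);
```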

It’s kind of a Catch-22 situation. The gaming industry won’t make use of double precision until there’s a compelling reason to do so, and keeping the DP performance crippled is a good way of stopping them from finding a compelling reason. At the same time, NVIDIA probably won’t make high performance DP available on consumer cards until there’s a strong push from the gaming industry for them to do so.

I would not be surprised if they made full DP available in a refresh of this hardware in two possible scenarios:

  1. More consumer software starts to use GPUs at all (even without double precision), since that would give more reason to have all compute features available.
  2. If AMD/ATI gets its act together and finally produces a software/developer infrastructure that is reasonably good compared to the already rather mature CUDA ecosystem, they would have a very distinct advantage in DP performance, which might compel NVIDIA to unleash all of Fermi’s capabilities on consumer products as well.

So it’s as I thought: Tesla (~672 GFLOPS double) > HD 5870 (554 GFLOPS double) > GTX 480 (168 GFLOPS double).

(The 554 GFLOPS double figure comes from http://techreport.com/articles.x/17618/5)

I do realise that these are peak rates and that the Fermi architecture will get closer to its peak on more complex problems.
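For reference, the GTX 480 figure follows from the published specs: 480 CUDA cores × 1401 MHz × 2 FLOPs per clock (FMA) ≈ 1345 GFLOPS single precision, and 1/8 of that is ≈ 168 GFLOPS double. By the same arithmetic, a shipping Tesla C2050 (448 cores @ 1.15 GHz) works out to ≈ 1030 GFLOPS single and ≈ 515 GFLOPS double at the 1/2 rate, so the ~672 number quoted above presumably comes from the earlier, higher-spec Fermi figures.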

OK, but peak performance on something nontrivial? If you’re doing SGEMM, then sure.

BTW, here are some SiSoft GPGPU results that show GTX 480 is roughly comparable to Radeon 5970 in single precision.

http://hothardware.com/Articles/NVIDIA-GeF…Landed/?page=15

This is backed up by games, too: NVIDIA is slightly faster (or about even) in games, even though ATI has roughly 3x the peak throughput.

Of course, it all depends on the exact application…

  • Matt

This reduction is sad, especially because it will be possible to buy a GTX 480 in the very near future while Tesla is still on the way.

A very interesting question is how exactly they reduce it. I can think of two possible ways:
1. Double precision is done on the SFUs.
2. The number of CUDA cores used for double precision is limited.

In addition, it is unclear: 1/4th in comparison with what?
With the Tesla C2050, which now has 448 CUDA cores @ 1.15 GHz,
or with the maximal capacity of a multiprocessor?

Any ideas?

Please see the post:

http://forums.nvidia.com/index.php?showtopic=165055

Double precision on GeForce GTX 4x0 is 1/8th of single precision.

On Tesla x20x0, double precision is 1/2 of single precision.
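If anyone wants to verify those ratios on their own card rather than trust the spec sheet, a crude microbenchmark along these lines should show the gap. This is just my own sketch (arbitrary grid and iteration counts, build with nvcc -arch=sm_20), not an official tool:

```cuda
// Rough sketch: time a chain of dependent multiply-adds in float and in
// double and compare. If DP runs at 1/8 the SP rate (GTX 4x0), the double run
// should take roughly 8x as long as the float run; at 1/2 (Tesla x20x0),
// roughly 2x.
#include <cstdio>
#include <cuda_runtime.h>

template <typename T>
__global__ void fma_chain(T *out, T seed, int iters)
{
    T x = seed + static_cast<T>(threadIdx.x);
    T a = static_cast<T>(1.000001);
    T b = static_cast<T>(0.000001);
    for (int i = 0; i < iters; ++i)
        x = x * a + b;                                   // one multiply-add per iteration
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;      // keep the result live
}

template <typename T>
static float time_kernel(int iters)
{
    const int blocks = 120, threads = 256;
    T *out;
    cudaMalloc(&out, blocks * threads * sizeof(T));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    fma_chain<T><<<blocks, threads>>>(out, static_cast<T>(0.5), iters);  // warm-up
    cudaEventRecord(start);
    fma_chain<T><<<blocks, threads>>>(out, static_cast<T>(0.5), iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaFree(out);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int iters = 1 << 20;
    float ms_sp = time_kernel<float>(iters);
    float ms_dp = time_kernel<double>(iters);
    printf("float: %.1f ms, double: %.1f ms, double/float time ratio: %.2f\n",
           ms_sp, ms_dp, ms_dp / ms_sp);
    return 0;
}
```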

It has been brought to my attention that the GTX 400 series may have had its double precision floating point capability capped at 1/8th of the single precision rate instead of the 1/2 the hardware is capable of. Can you verify whether this is true for us? I’m a number cruncher involved with BOINC and a member of the SETI.USA team. Nearly 2 million people worldwide crunch for BOINC, and many others for other projects (such as EVGA’s own Folding team). The answer will have a large impact on our decision making as far as upgrading GPUs, and on the rate at which answers are found for the various scientific projects we work on. Regardless of whether or not we have the money, we’re not going to be purchasing Tesla cards to get this performance; we do not need $2000 worth of tech support. If this issue isn’t fixed, we’ll have no choice but to focus on ATI’s offerings, which outperform Nvidia’s crippled GTX series. Please don’t force our hands like this. Thank you.

-John P. Myers

Can you refer us to any GPU accelerated BOINC projects that make use of double precision floating point (current or announced projects)? I am not aware of any.

Milkyway@home requires double precision. In fact, it will not work on cards below compute capability 1.3.
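For anyone unsure whether a given card clears that bar, a minimal check through the CUDA runtime API looks something like this (my own sketch, not Milkyway@home’s actual code):

```cuda
// List each CUDA device and whether it supports double precision
// (compute capability 1.3 or higher).
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        bool has_fp64 = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
        printf("Device %d: %s, compute capability %d.%d, double precision: %s\n",
               dev, prop.name, prop.major, prop.minor, has_fp64 ? "yes" : "no");
    }
    return 0;
}
```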

Other projects would exist, but have slowed their production down by writing FP32 apps to potentially attract more people. With both ATI and NVIDIA offering strong FP64 support, that would change in short order.

To further elaborate, SETI.USA is the #1 crunching team. Not just #1 in the country, but #1 in the world, with 6.25 billion cobblestones of computation completed (1 cobblestone = 864,000,000,000 floating point operations). The major French team (technically the French speakers of the world) is 2nd, with the major German team 3rd. We take advancing the scientific knowledge of the world very seriously. Therefore, we take Nvidia crippling our possible production very seriously as well. We simply cannot use Nvidia 400 series GPUs if FP64 is going to be intentionally crippled while ATI’s GPUs are left wide open, and we will not use Tesla. We have our own tech support. Personally, I currently use and always have used Nvidia GPUs. After this, though… it’s just unacceptable.

Worldwide team rankings: Home | BOINCstats/BAM!

We are adamant about upgrading our computers to the best possible hardware we can get our hands on. The competition between us and the other nations of the world is fierce and often stressful. We have to be on top of things in order to remain #1 in the world. If using the 400 series will hinder our ability to maintain the #1 spot, then we simply cannot use them. Others around the world will follow suit. Everyone wants to be #1.

We had been anticipating that the GTX 480 would outperform the ATI HD5870 in FP64 computations. If you’re going to cripple the 400 series, obviously this will not be the case and will be a big disappointment to everyone. Seems Nvidia is happy to be in 2nd place in this regard. A very distant 2nd.

We need confirmation one way or the other about the 400 series. If it is currently crippled but you’ve decided that would be a horrible business move and you change your mind, that’s fine. If you’re going to leave it crippled, say so. If it’s all just a rumor, say so. If you say nothing, I (we) will have no choice but to assume your products will remain inferior. We will not lose the #1 spot to the French because they went with ATI and we didn’t.

Official confirmation, please.

-John P. Myers

I think no one here has a problem if you prefer ATI over NVIDIA for your particular use case. As for official confirmation, we’re all waiting.
I am actually rooting for the French and German teams, though ;)

This post was written by Sumit Gupta, who is a Senior Product Manager in the CUDA group at NVIDIA:

http://forums.nvidia.com/index.php?showtopic=165055

I think that’s pretty official.

Just to check: You’re sure that you are compute bound, and not memory bandwidth bound on these applications? Is your work unit throughput proportional to the shader/core clock on the card if you scale it up and down (while leaving the memory clock fixed) with an overclocking tool?

You should definitely use whatever hardware runs your code the fastest, but there can be other limiting factors on performance besides pure FLOPS. It’s worth being certain before you pull out the wallet. :)

Bandwidth is not a problem with these apps. A PCIe x16 2.0 GPU put in a PCIe x1 1.1 slot causes no performance decrease when crunching numbers, and overclocking only the shaders increases performance directly. Bandwidth isn’t even close to being a limiting factor; our performance is directly related to pure FLOPS. This has been a known fact for us for several years.

Memory bandwidth isn’t the same as PCIe bandwidth, and the former was the subject of the question, not the latter.
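To make that distinction concrete, here is a rough sketch (mine, with arbitrary sizes) that measures the two separately: the host-to-device copy goes over the PCIe bus, while the in-kernel copy exercises the card’s own memory bus:

```cuda
// Measure PCIe (host-to-device) bandwidth vs. device memory bandwidth.
// Uses pageable host memory, so the PCIe number will be on the low side.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void copy_kernel(const float *in, float *out, int n)
{
    // Grid-stride loop keeps the launch within the 65535-block grid limit.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        out[i] = in[i];                     // one global load + one global store
}

int main()
{
    const int n = 1 << 24;                  // 16M floats = 64 MB
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms = 0.0f;

    // PCIe bandwidth: host -> device copy
    cudaEventRecord(start);
    cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("PCIe host->device: %.1f GB/s\n", bytes / ms / 1e6);

    // Device memory bandwidth: read + write entirely on the card
    copy_kernel<<<1024, 256>>>(d_in, d_out, n);   // warm-up
    cudaEventRecord(start);
    copy_kernel<<<1024, 256>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("Device memory:     %.1f GB/s\n", 2.0 * bytes / ms / 1e6);

    cudaFree(d_in); cudaFree(d_out); free(h);
    return 0;
}
```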