GTX 480 / 470 Double Precision Reduced?

“Additionally as NVIDIA is clearly outgunned in the SP department, and more power hungry than the equivalents, a switch to AMD may occur.”

Why do you think so?

Btw, could you please clarify your SP and DP needs? You started by saying that you need more DP performance.

@iamloki: I feel your pain. An ATI HD 5870 has about 3.2x the DP performance of a GTX 480. It’s very sad news. CAL (ATI’s ‘version’ of CUDA) has gained a lot of ground lately, and I fear it is about to gain a lot more.
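
For those asking where the 3.2x figure comes from, here is the back-of-envelope version, using the commonly quoted peak specs (so treat it as a paper ratio, not a benchmark):

```latex
% Peak DP = ALUs x shader clock x 2 flops (FMA), times each
% architecture's DP fraction (1/5 for Cypress; the GTX 480's
% consumer DP is capped at 1/8 of its SP rate).
\begin{align*}
  \text{HD 5870:} &\quad \tfrac{1}{5}\,(1600 \times 0.850\,\text{GHz} \times 2) \approx 544\ \text{GFLOPS} \\
  \text{GTX 480:} &\quad \tfrac{1}{8}\,(480 \times 1.401\,\text{GHz} \times 2) \approx 168\ \text{GFLOPS} \\
  \text{ratio:}   &\quad 544 / 168 \approx 3.2
\end{align*}
```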

@moozoo: Doubtful. The Tesla C2070 still runs under 300 W on PCIe with full DP (from projected specs, at least).

Sure, we BOINC people (all 1,938,518 of us, plus a couple million more crunching for non-BOINC projects) may be a different breed, but the fact remains: we buy parts. We buy the best parts we can, based on performance and price. You may call us crazy or weird or whatever, but in my business, I’ll accept a crazy man’s money just as fast as anyone else’s :)

In fact, yesterday a world record was set on PrimeGrid on the AP26 (arithmetic progression of 26) subproject (http://www.primegrid.com). It was found using a PS3 (for which Nvidia made a simply wonderful GPU). How many PS3s do your stats show were slated for HPC use? Truth is, you just don’t really know.

No, PG doesn’t require DP (yet). The point is, you simply cannot accurately track just how many GTX 470/480s would be put to HPC use. I can promise you, your numbers are way off.

“An ATI HD5870 has about 3.2x more DP performance than a GTX 480.”

What performance are you talking about? Any Fermi tests?

You make good points! In both cases, whether it is a yield issue or a marketing strategy, nVidia would be reluctant to say so.

But let’s say it is the marketing strategy. I would say it is an understandable but conservative choice. I, for example, am doing my master’s thesis. Students doing research for their master’s are an easy way for professors and other researchers to try out new but risky (i.e., possibly wrong) theories. If a student is lucky, he can get some money from the faculty, but we are not talking about big numbers. If that small amount of money buys a GTX 480 with full DP speed, and the professor’s idea works, it will almost always lead to that research being expanded. Expanding with the same hardware vendor is the logical choice, which would lead to nVidia selling more of their Tesla computers. A win-win for both parties.

Anyway, let’s just hope it is a yield problem that will be solved in the future. In any case, I must say I am not pleased that I did not hear about the downgraded DP performance until I happened to find this thread. The marketing completely fails to point out these important design decisions, and nVidia has to realize this leaves a huge group of people very cautious about buying their products – wondering what else NV has up its sleeve. It is bad brand marketing!

Well, to reply in reverse order: we know that the Tesla cards have more memory, 3 GB and 6 GB for the C2050 and C2070. But I priced them yesterday, and found only one place with a listing, no stock, offering the C2050 at $2500 and the C2070 at $3999. When you contrast this with a GTX 480 selling at something above MSRP, at $650 or so now, which will hopefully fall as supply picks up, that means you can get 6 GB by buying four GTX 480s, and have an additional ~1470 CUDA cores thrown in for free (1920 vs. the C2070’s 448). It’s still more cost effective to go the GTX route, even with the cost of added computers.
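
In round numbers, taking the prices above at face value (one pre-release listing, so treat them as rough) and the announced core counts, the comparison works out as:

```latex
% Four GTX 480s (1.5 GB, 480 cores each) vs. one Tesla C2070
% (6 GB, 448 cores), at the prices quoted above.
\begin{align*}
  \text{cost:}   &\quad 4 \times \$650 = \$2600 \;<\; \$3999 \\
  \text{memory:} &\quad 4 \times 1.5\ \text{GB} = 6\ \text{GB} \\
  \text{cores:}  &\quad 4 \times 480 - 448 = 1472\ \text{extra}
\end{align*}
```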

It’s certainly possible that this is a power, heat, or clock issue. But it should be easy for nVidia to put out a reference computational card for just a little more money than the consumer card. Presumably they are making money selling GTX 470s at $349 and GTX 480s at $499 (or at least their internal market studies show that they are); otherwise they were foolish to design to those price points at a loss.

And if their argument is that gamers don’t use DP math, so crippling it isn’t an issue, they could just uncripple it and it wouldn’t add to the power/heat budget because it isn’t getting used.

Bottom line. There are added features that the HPC crowd can use. But we’re not monolithic. Here’s a partial list:

Double precision floating point arithmetic - engineers often don’t need this, but research scientists usually do. (See the toy example after this list.)

Error correcting code (ECC) - important for very large installations, not so much for single machines.

Increased memory - totally dependent on application. I’m sure there are some simulations that would be best served by a GT 240 with 12 GB. Others would be happiest with a GTX 480 and 256 MB. (Neither of these cards exists, of course.) But there’s a spectrum here.
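
On the double precision item above: a toy illustration (my own; compiles with nvcc or any C++ compiler) of the kind of accumulation error that makes research scientists insist on DP:

```cuda
// Summing ten million increments of 0.1 in float drifts badly, because
// float carries only ~7 significant digits; double (~16 digits) stays
// essentially exact at this scale.
#include <cstdio>

int main()
{
    float  sf = 0.0f;
    double sd = 0.0;
    for (int i = 0; i < 10000000; ++i) {
        sf += 0.1f;  // rounding error accumulates every iteration
        sd += 0.1;
    }
    printf("float  sum: %f\n", sf);  // noticeably off from 1000000
    printf("double sum: %f\n", sd);  // ~1000000.0
    return 0;
}
```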

It seems to me that the logical solution is for nVidia to allow other manufacturers (XFX, Asus, eVGA, Palit, Sparkle, PNY, Gigabyte, etc.) to make the consumer cards, and for nVidia themselves to make a line for research (or perhaps to let a few manufacturers make specific computational cards). Kind of like what they are doing now, but with more variety. But they should consider how they can hit good price points for each market, give the HPC crowd more options than just a gaming card or a Tesla, and make the pricing fair.

Even after all that, the fellow from SETI@home still has a legitimate beef in that 3/4 of the consumer DP capacity his group could tap into is being trashed by nVidia’s marketing “strategy”.

Unfortunately, if nVidia listened to everyone in the HPC community, they would put out 200-300 different models of computing cards. They should make smart decisions and try to satisfy as many people as they can. One HUGE step in that direction would be to release a GTX 480 special edition computation card. More memory, slightly slower clock speeds, more limited video output, but no ECC, and full DP math. I’d pay a little more for that card, especially if it solved my problems better than a consumer card, and I expect that a lot of other people would do the same. The production cost shouldn’t be that much more than for a standard consumer card.

I’m planning on getting a Linux box specifically to do calculations sometime in the next 2-4 months. My budget is $800-$1300, and I need DP math, but not ECC, and could use a fair amount of memory, but I’m not sure just how much. At the current pricing levels, the Tesla cards are simply out of the question. My best hope, now, is that over time the GTX 470 and 480 drive the price of the GTX 285 down enough so that I can get a couple of them with 2 GB of memory each. That would be a much better solution for me than a GTX 470 with reduced DP math speeds.

I’m hopeful that this is much ado about very little, and that full DP math for the GTX 470 and 480 cards is just around the corner. If they do it for the GTX 285 and GTX 295, I don’t see why they can’t do it for the GTX 400 cards. “Won’t” seems more likely than “can’t” to me right now.

Regards,

Martin

If it’s a yield issue, hopefully nVidia would come out and say, “Because of manufacturing issues, we have reduced the DP performance on these early 400-series cards. We hope to resolve this going forward, and to have full DP available soon on all cards.” That would make almost everyone here feel much better, and the gamers who just had to be early adopters wouldn’t care that much about reduced DP performance.

But if it’s a marketing strategy, they would put out a statement almost exactly like the one they put out, in this forum, which you can read here:

http://forums.nvidia.com/index.php?showtopic=165055

The rub is that nVidia is targeting two audiences now: gamers (camp 1), and people with really deep pockets building superclusters (camp 2). I suspect most of the HPC community falls in camp 3, consisting of individuals and small research groups with limited budgets and high computational needs. The way to appeal to camp 3 is to be the value leader in the field. Computations can be done on PCs, clusters, dedicated supercomputers, dedicated superclusters, GPUs, GPU clusters, and so forth. We in camp 3 are simply looking for the best bang for the buck. If there were some super abacus run by trained teenagers in a 3rd-world country (with no exploitation or abuse), we’d all use that if it crunched the most numbers for the buck. And while our pockets aren’t as deep as the supercluster crowd’s, there are more of us. Not sure which group has more money, but I don’t think we should be ignored.

Regards,

Martin

aeronaut

Did you notice that Fermi was rescheduled? Most likely because of production problems. The process at that fab is not mature yet. The DP reduction is probably a manufacturing necessity. Anyway, “professional” hardware sits in another price range. DP is surely for professional use only. Let NV make some money at last. They invest most of their profit in development.

Maybe the people who are thinking of not buying a GTX 470 because its peak double precision throughput is 1/4 of the hardware’s true potential should first look at their algorithm to see whether it is bandwidth bound (in double precision). Testing of a beta product on a C1060 showed a 7x performance gain over the CPU for a certain computation in single precision. The same computation in double precision also showed a 7x gain.
So, on current hardware, which has the same SP-to-DP FLOPS ratio as a GTX 470, the actual performance gain was no different, because GB/s throughput is the limiting factor (the achieved GFLOP/s differed by exactly a factor of 2).
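
To make that concrete, here is a minimal sketch of my own (not the beta product mentioned above; all names are illustrative) of the kind of kernel where the DP cap is invisible. An AXPY does 2 flops per element but moves 12 bytes in SP and 24 bytes in DP, so its run time is set by GB/s, not by FLOPS:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

template <typename T>
__global__ void axpy(int n, T a, const T* x, T* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];  // 2 flops, 3 memory accesses
}

template <typename T>
float timeAxpy(int n)
{
    T *x, *y;
    cudaMalloc((void**)&x, n * sizeof(T));
    cudaMalloc((void**)&y, n * sizeof(T));
    cudaMemset(x, 0, n * sizeof(T));
    cudaMemset(y, 0, n * sizeof(T));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventRecord(t0);
    axpy<<<(n + 255) / 256, 256>>>(n, (T)2, x, y);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms;
    cudaEventElapsedTime(&ms, t0, t1);
    cudaFree(x);
    cudaFree(y);
    return ms;
}

int main()
{
    const int n = 1 << 24;
    printf("SP axpy: %.3f ms\n", timeAxpy<float>(n));
    // Expect roughly 2x the SP time: twice the bytes at the same GB/s.
    // The DP FLOP cap never enters the picture for a kernel like this.
    printf("DP axpy: %.3f ms\n", timeAxpy<double>(n));
    return 0;
}
```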

Well, as far as I know, ATI cards doing “trivial” operations are genuinely much faster in both SP and DP.
In astrophysics cluster-dynamics simulations, the majority of operations are “trivial”: the n-body problem really does not require advanced features (see the kernel sketch below). Benchmarks will have to be done before a judgement can be made, as the effect of cache is an unknown.
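
For illustration, here is the textbook pairwise interaction (my own sketch, not anyone’s production code), showing just how “trivial” the inner loop is:

```cuda
// Per body pair it is all multiplies and adds (FMA-friendly) plus one
// reciprocal square root -- no exotic DP functions needed, which is
// why plain mul/add hardware suffices.
#define SOFTENING 1e-9  // avoids the r = 0 singularity

__device__ double3 bodyBodyInteraction(double4 bi, double4 bj, double3 ai)
{
    double dx = bj.x - bi.x;                        // adds
    double dy = bj.y - bi.y;
    double dz = bj.z - bi.z;
    double r2 = dx*dx + dy*dy + dz*dz + SOFTENING;  // muls and adds
    double invR = rsqrt(r2);                        // the one special op
    double s = bj.w * invR * invR * invR;           // m_j / r^3
    ai.x += dx * s;                                 // FMAs
    ai.y += dy * s;
    ai.z += dz * s;
    return ai;
}
```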

Also, as researchers, it is common practice for us to profile code. This has been done: we are DP compute bound. In either case, I want to see Fermi/GTX 480 benchmarks on FFT (SP and DP, real and complex; a minimal timing harness is sketched after this post).

P.S. I’m writing this on my phone. I would refer people to benchmarks backing my claim, but since they are easy to find, and my phone sucks at multitasking, I leave it to those inclined to look them up.
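
For anyone who wants to run the FFT comparison mentioned above themselves, here is a minimal cuFFT timing harness (a sketch; the size is arbitrary, and it assumes the cuFFT library that ships with the CUDA toolkit, built with nvcc -lcufft):

```cuda
// First calls can carry initialization overhead, so run each
// transform twice for honest numbers.
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

int main()
{
    const int n = 1 << 20;
    cufftComplex*       sp;  // float2
    cufftDoubleComplex* dp;  // double2
    cudaMalloc((void**)&sp, n * sizeof(cufftComplex));
    cudaMalloc((void**)&dp, n * sizeof(cufftDoubleComplex));

    cufftHandle planC, planZ;
    cufftPlan1d(&planC, n, CUFFT_C2C, 1);  // single precision plan
    cufftPlan1d(&planZ, n, CUFFT_Z2Z, 1);  // double precision plan

    cudaEvent_t t0, t1;
    float ms;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    cudaEventRecord(t0);
    cufftExecC2C(planC, sp, sp, CUFFT_FORWARD);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);
    printf("SP C2C FFT: %.3f ms\n", ms);

    cudaEventRecord(t0);
    cufftExecZ2Z(planZ, dp, dp, CUFFT_FORWARD);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);
    printf("DP Z2Z FFT: %.3f ms\n", ms);

    cufftDestroy(planC);
    cufftDestroy(planZ);
    cudaFree(sp);
    cudaFree(dp);
    return 0;
}
```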

Are you aware that there are still no DP functions on ATI? Just mul and add. You need to program even division yourself, and find a square root somewhere. And it can officially use only 256 MB of memory in OpenCL. And it is hard to use more than one GPU in a system, because their driver does not properly support multi-GPU and multithreading. And so on.

I am aware that OpenCL has limitations with AMD, but I am also not afraid of low-level code.
Astrophysics can mostly work with mul and add; due to its simplicity, near-theoretical-peak performance could be attained fairly easily.

Anyway, clarifying our needs: the majority of work in astro requires DP. There are, however, places where SP can be used in place of DP as an optimization.
DP is nonetheless unavoidable in significant chunks of the computation. Of course, we would prefer to have the option of doing everything in DP.

Another point: I am not an AMD fanboy, nor an NVIDIA one. My arguments are objective. NVIDIA still holds the advantage of a strong code base with a good community.
Also, elements of the Fermi architecture are outstanding and extremely interesting. For die-hard performance-per-$$$ people, however, interesting might not cut it. Some of us who are not afraid of low-level code might build another community around AMD, if it can be demonstrated that the potential gains outweigh the development cost.

I still think NVIDIA is being silly by cutting DP on consumer cards. As to which cards we will buy in the end, it’s a tough call. Several months ago, it was a foregone conclusion that the Fermi GTX 480 would be the one.
It all depends on hard, objective analysis from here. Both platforms will be benchmarked and compared for various uses.

Yes, I’m aware of that. I also don’t know what the MSRP is supposed to be - I’d guess that’s a better figure to use than some online vendor’s pre-sale price on vapor hardware. Anyone know what the C2050 is supposed to run for?

I have no problem paying a little more for a computational card. But I do have a problem paying for stuff I don’t need, like ECC, cluster management software for Windows (when I’m running a single Linux box), and all the other “value” goodies that are mentioned in:

http://forums.nvidia.com/index.php?showtop…p;#entry1040192

DP is essential, more memory is nice, the rest is added cost that I can’t use.

My suspicion is that if DP is really reduced on the GTX 400 cards, HPC types who want best performance per $$$ will start buying up all the higher end GTX 200 series cards they can instead of buying the new cards. With the high end gamers selling their GTX 285 and 295 cards to get the 400 series, plenty should be available. And nVidia will miss out on some new sales to people who could advocate for them really well.

I’d happily pay something extra for a non-overclocked or slightly underclocked card with full DP math and extra memory. But the markup on Teslas puts them well outside my budget.

Regards,

Martin

A $350 GTX 285 has a DP throughput of 88 GFLOPS, and the $350 GTX 470 has a DP throughput of 138 GFLOPS.
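
For anyone checking those figures, they follow from the usual peak-rate arithmetic (assuming stock shader clocks; the quoted 138 presumably assumes a slightly different clock):

```latex
% Peak DP = DP units x 2 flops (FMA) x shader clock; the GTX 470's
% consumer DP is additionally capped at 1/8 of its SP rate.
\begin{align*}
  \text{GTX 285:} &\quad 30 \times 2 \times 1.476\,\text{GHz} \approx 88.6\ \text{GFLOPS} \\
  \text{GTX 470:} &\quad \tfrac{1}{8}\,(448 \times 2 \times 1.215\,\text{GHz}) \approx 136\ \text{GFLOPS}
\end{align*}
```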

So either you just get the consumer card and accept the capped performance (still improved over the G200 generation), or you buy the ATi card instead. I don’t see the issue here.

It is all very annoying, but the fact is (as pointed out above) that all companies do this routinely. Complaints and threats on this forum to leave the platform are very unlikely to cut much ice at the level where this decision was made. A massive defection from nVidia to ATi most certainly would, so prove you can migrate away, and nVidia will be more interested in keeping your business.

OK. It’s a little better. But the DP performance ought to be around 500 GFLOPS. It’s like buying a Ferrari and taking out 75% of the spark plugs. Nice car, but crippled. Still faster than a Chevy Malibu.

Further, I doubt that the GTX 285 will stay at $350 after there’s a solid supply chain of GTX 470s available.

Regards,

Martin