Double Precision on all new Fermis: getting to the bottom of DP performance, esp. mobile

MacFan · October 5, 2010, 1:17pm

I am thinking of buying a high-end laptop for CUDA work and really need to know the level of DP support. This has always been unnecessarily vague, IMHO. What I understand for desktop GPUs is that both the Tesla 20xx and the Quadro 4000, 5000 and 6000 have about half as many DP units as SP, whereas the Quadro 2000 and 600 and all the GTX cards have considerably less DP capability. I reckon the GTX 480 has about 1/4 of the DP capability of the Fermi Teslas, for example, but you also have to factor in clock speed. I saw one site suggesting that for the Quadro 2000 the factor is down to 1/6 (is this right??)

This has been pieced together from various sources. What would be really nice is if Nvidia could include the DP:SP levels on the web pages for ALL the new GPUs. Given that with Fermi there is massively more differentiation in DP support between the Pro and Games kit, those of us who have figured out that we can no longer get away with the games cards for proper Sci Comp are at least owed a bit more information.

What I need right now is the corresponding information for the mobile devices, in particular the Quadro 5000M and the GTX 480M and 460M. I guess the main question is whether it is worth shelling out for a 5000M unit from maybe HP, but I would only do that if it has the same 2:1 DP support as the desktop Quadro/Tesla.

So who can tell me what the DP:SP ratio is for all the mobile GF10x cards, assuming the same clock speed?
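
For anyone wanting to do the arithmetic, here is a rough sketch of how a DP:SP ratio and the shader clock combine into a theoretical peak. It assumes one fused multiply-add (2 flops) per CUDA core per shader clock in SP. The function name `peak_gflops` is made up for illustration, and the example inputs (448 cores at 1.15 GHz, recalled from the public Tesla C2050 spec sheet) should be treated as assumptions. The ratio for the mobile parts is exactly the unknown being asked about, so it is left as a parameter.

```python
from fractions import Fraction

def peak_gflops(cuda_cores, shader_clock_ghz, dp_per_sp=Fraction(1, 1)):
    """Theoretical peak in GFLOPS.

    Assumes each core retires one FMA (2 flops) per shader clock in SP;
    dp_per_sp scales that figure down for double precision.
    """
    sp = cuda_cores * 2 * shader_clock_ghz
    return float(sp * dp_per_sp)

# Illustrative inputs only (assumed, not taken from this thread).
sp = peak_gflops(448, 1.15)
dp = peak_gflops(448, 1.15, Fraction(1, 2))
print(f"SP {sp:.1f} GFLOPS, DP {dp:.1f} GFLOPS")
```

Swapping in a different core count, clock or ratio shows how much of a DP gap between two cards comes from the ratio and how much from the clock.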

Lev · October 5, 2010, 1:25pm

Do not confuse maximum theoretical speed and practical performance.
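
One way to make that concrete is a roofline-style estimate, where the attainable rate is the smaller of the compute peak and memory bandwidth times arithmetic intensity. This model is an illustration of my own choosing, not something stated in the thread, and every number below is a made-up placeholder rather than a measurement of any real card.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: limited either by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Hypothetical card: 1000 GFLOPS SP peak, 100 GB/s memory bandwidth.
SP_PEAK, BANDWIDTH = 1000.0, 100.0

# A streaming kernel doing 1 flop per value loaded: 1/4 flop per byte with
# 4-byte floats, 1/8 flop per byte with 8-byte doubles.
sp_rate = attainable_gflops(SP_PEAK, BANDWIDTH, 1 / 4)
for label, dp_peak in [("DP at 1/2 of SP", SP_PEAK / 2),
                       ("DP at 1/8 of SP", SP_PEAK / 8)]:
    dp_rate = attainable_gflops(dp_peak, BANDWIDTH, 1 / 8)
    print(f"{label}: SP {sp_rate} vs DP {dp_rate} GFLOPS")
```

With these placeholder numbers the memory-bound kernel lands at the same DP rate whether the DP peak is 1/2 or 1/8 of SP, which is the kind of gap between paper figures and practice that the warning points at. A compute-bound kernel would show the full difference.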

Smokey · October 5, 2010, 11:25pm

All Fermi cards have roughly a 2:1 SP:DP performance ratio. DP is implemented in Fermi by using two SP units together, instead of using completely separate units for DP, to share die space and presumably reduce gate contention when using SP and DP at the same time.

The biggest thing to keep in mind is the limit imposed on consumer (GeForce) cards vs. Quadro/Tesla: nVidia intentionally limits the DP throughput on consumer cards to roughly 1/4 of that of Quadro/Tesla cards (correct me if I’m wrong, this is purely from memory), for financial reasons. No one would buy Tesla/Quadro if you could get identical performance from a GeForce card for 1/10th the price.

Not sure if this is a software limit imposed by the driver, or whether they wire up the chip a specific way for GeForce cards to limit its capabilities. Either way, expect consumer cards to have 1/4 of the DP performance of professional cards, and the ‘maximum theoretical’ DP speed to be 1/2 that of SP for professional cards.

Again, this is all from memory and I didn’t reference anything, so feel free to correct any errors here.
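
Taken at face value, those two from-memory figures compose as follows. This is only a small sketch that assumes the 1/2 and 1/4 factors above are right; neither has been checked against a reference here, and the constant names are invented for the example.

```python
from fractions import Fraction

# Both factors are the from-memory figures in the post, not verified values.
PRO_DP_PER_SP = Fraction(1, 2)       # professional cards: DP peak is half of SP
CONSUMER_VS_PRO_DP = Fraction(1, 4)  # GeForce DP relative to Quadro/Tesla DP

consumer_dp_per_sp = PRO_DP_PER_SP * CONSUMER_VS_PRO_DP
print(consumer_dp_per_sp)  # 1/8
```

So if both figures hold, a GeForce at the same clock would come out at about one eighth of its own SP peak in DP.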

MacFan · October 7, 2010, 12:43pm

That more or less coincides with what I know, but my question was more about the mobile chips, and in particular whether the 5000M is in fact NOT crippled like the GTX cards, and so gets the 2:1 maximum DP factor.

In fact, on some non-laptop DP work I found the GTX 480 to be about twice as fast as the 285, so even with the crippling it was pretty good. I have not tried that same code on Tesla or Quadro yet and have high hopes.

Smokey · October 11, 2010, 1:43am

I’m not sure anyone’s really tested mobile chips… all the info I’ve gathered comes from people testing their own Fermi GPUs. As far as I’m aware, nVidia hasn’t officially said anything on the topic, so there’s no official reference here.

I wouldn’t assume mobile chips were intentionally given DP performance similar to Tesla/Quadro cards. GeForce is GeForce, mobile or not, and I think nVidia would want to limit the DP, else there’s no appeal to their mobile Quadros. But it’s certainly possible they’ve made a manufacturing mistake somewhere in the mobile segment that doesn’t limit the GeForce cards like they probably intended.

Sarnath · October 11, 2010, 3:27am

Laptops with CUDA cards can burn your lap… and maybe your wallet as well…

Better check projects like “rCUDA” and similar ones (like virtgpus…), which can make a remote GPU (possibly over VPN) look like a “local” one. You would probably need a fast internet connection to prototype on GPUs that way. (More or less, it sounds like a cloud.)

Conceptually it is simple: if you can allocate memory, copy data in and out, and launch kernels, it hardly matters whether the GPU is across PCIe or Ethernet.
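
As a toy illustration of that idea (not how rCUDA or any real project is implemented), the sketch below drives the same client code through two transports: one that calls the device directly, and one that serialises every request to bytes first, as a network hop would. `FakeGPU`, `local_transport`, `wire_transport` and the request format are all hypothetical names invented for this example, and no real GPU or network is involved.

```python
import json

class FakeGPU:
    """Stand-in for a device: holds named buffers and runs named 'kernels'."""
    def __init__(self):
        self.buffers = {}
        self.kernels = {"scale": lambda data, factor: [x * factor for x in data]}

    def handle(self, request):
        op, name = request["op"], request["name"]
        if op == "alloc":
            self.buffers[name] = [0.0] * request["n"]
        elif op == "copy_in":
            self.buffers[name] = list(request["data"])
        elif op == "launch":
            kernel = self.kernels[request["kernel"]]
            self.buffers[name] = kernel(self.buffers[name], *request["args"])
        elif op == "copy_out":
            return list(self.buffers[name])
        else:
            raise ValueError(f"unknown op: {op}")
        return None

def local_transport(gpu):
    """Requests go straight to the device, like a local bus."""
    return gpu.handle

def wire_transport(gpu):
    """Requests and replies are serialised to bytes, as a network link would."""
    def send(request):
        payload = json.dumps(request).encode("utf-8")          # client side
        reply = gpu.handle(json.loads(payload.decode("utf-8")))  # server side
        return json.loads(json.dumps(reply))
    return send

def run(send):
    """Client code: identical whichever transport is plugged in."""
    send({"op": "alloc", "name": "a", "n": 3})
    send({"op": "copy_in", "name": "a", "data": [1.0, 2.0, 3.0]})
    send({"op": "launch", "name": "a", "kernel": "scale", "args": [2.0]})
    return send({"op": "copy_out", "name": "a"})

print(run(local_transport(FakeGPU())))
print(run(wire_transport(FakeGPU())))
```

Both calls return the same result, which is the functional point being made. What the toy leaves out is latency and bandwidth, so it says nothing about how fast the remote path would be.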

MacFan · October 11, 2010, 7:53am

I am happy to use a remote GPU where I can reliably access it, but my problem is that I am increasingly going around doing demos where I have no control over the environment or networking, so a standalone portable capability would be much better. I have been doing fine with SP work on a 15in laptop made by UK specialist Kobalt, with a 285M in it, but am looking for DP support next time around, with the best DP perf I can find. I can get a 460M from them (or a 480M in a 17in), but would really like to know how the 5000M compares to the 460M and 480M on the DP side, and am a bit frustrated that the documentation is so obscure on this point. It would indeed be nice to fall upon a mobile chip in the games pile where Nvidia forgot to disable the 2:1 aspects, but I suspect that, after the many meetings with umpteen presentations based on non-Tesla-class cards from the 200 series, that lesson has been learnt!
