I am thinking of buying a high-end laptop for CUDA work and really need to know the level of DP support. This has always been unnecessarily vague, IMHO. What I understand for desktop GPUs is that both the Tesla 20xx and the Quadro 4000, 5000 and 6000 have about half as many DP units as SP, whereas the Q2000, the 600 and all the GTX cards have considerably less DP capability. I reckon the GTX 480 has about 1/4 of the DP capability of the Fermi Tesla, for example, but you also have to factor in clock speed. I saw one site suggesting that for the Q2000 the factor is down to 1/6 (is this right?)
This has been pieced together from various sources. What would be really nice is if Nvidia could include the DP:SP ratios on the web pages for ALL the new GPUs. Given that with Fermi there is massively more differentiation in DP support between the Pro and Games kit, those of us who have figured out that we can no longer get away with the games cards for proper Sci Comp are at least owed a bit more information.
What I need right now is the corresponding information for the mobile devices, in particular the Quadro 5000M and the GTX 480M and 460M. I guess the main question is whether it is worth shelling out for a 5000M unit, maybe from HP, but I would only do that if it has the same 2:1 DP support as the desktop Quadro/Tesla.
So who can tell me what the DP:SP ratio is for all the mobile GF10x cards, assuming the same clock speed?
All Fermi cards have a roughly 2:1 SP:DP performance ratio. DP is implemented in Fermi by using two SP units together, instead of completely separate DP units, to share die space and presumably to reduce gate contention when SP and DP are used at the same time.
The biggest thing to keep in mind is the limit imposed on consumer (GeForce) cards vs. Quadro/Tesla: nVidia intentionally limits the DP throughput on consumer cards to roughly 1/4 of that of Quadro/Tesla cards (correct me if I’m wrong, this is purely from memory), for financial reasons (no one would buy Tesla/Quadro if you could get identical performance from a GeForce card for 1/10th the price).
I’m not sure whether this is a software limit imposed by the driver, or whether they wire up the chip a specific way for GeForce cards to limit the capabilities. Either way, expect consumer cards to deliver about 1/4 of the DP performance of professional cards, and the ‘maximum theoretical’ DP speed to be 1/2 that of SP for professional cards.
Again, all from memory - I didn’t reference anything - so feel free to correct any errors here.
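To make the arithmetic in the post above concrete, here is a rough sketch in Python (just for the sums). It assumes the from-memory figures quoted above, i.e. a 1/2 DP:SP ratio on professional cards and a 1/4 cap on consumer cards, which nobody in this thread has verified. The unit counts and clocks are made-up placeholders, not real specs for any card.

```python
from fractions import Fraction

# Assumed figures, quoted from memory in this thread (NOT verified specs).
HARDWARE_DP_SP = Fraction(1, 2)   # DP:SP ratio claimed for professional cards
CONSUMER_CAP = Fraction(1, 4)     # claimed consumer DP relative to professional


def effective_dp_sp(hardware_ratio, cap=Fraction(1)):
    """DP:SP ratio left after a product-level cap is applied."""
    return hardware_ratio * cap


def relative_dp_throughput(sp_units, clock_mhz, dp_sp_ratio):
    """DP throughput in arbitrary units: SP units x clock x DP:SP ratio."""
    return sp_units * clock_mhz * dp_sp_ratio


pro_ratio = effective_dp_sp(HARDWARE_DP_SP)
consumer_ratio = effective_dp_sp(HARDWARE_DP_SP, CONSUMER_CAP)

# Hypothetical cards: same number of SP units, consumer part clocked 20% higher.
pro = relative_dp_throughput(400, 1000, pro_ratio)
consumer = relative_dp_throughput(400, 1200, consumer_ratio)

print("pro DP:SP      =", pro_ratio)
print("consumer DP:SP =", consumer_ratio)
print("consumer / pro =", consumer / pro)
```

With these assumed inputs the consumer card ends up at 1/8 DP:SP, and the higher clock only claws a little of that back: 3/10 of the professional card's DP throughput. Swap in measured or published numbers for a real comparison.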
That more or less coincides with what I know, but my question was more about the mobile chips, and in particular whether the 5000M was in fact NOT crippled like the GTX cards, thereby getting the 2:1 maximum DP factor.
In fact on some non-laptop DP work I had found the GTX480 to be about twice as fast as the 285, so that even with the crippling it was pretty good - I have not tried that same code on Tesla or Quadro yet and have high hopes.
I’m not sure anyone has really tested the mobile chips… all the info I’ve gathered comes from people testing their own Fermi GPUs. As far as I’m aware, nVidia hasn’t officially said anything on the topic, so there’s no official reference here.
I wouldn’t assume mobile chips are intentionally given DP performance similar to Tesla/Quadro cards (GeForce is GeForce; mobile or not, I think nVidia would want to limit the DP, else there’s no appeal to their mobile Quadros). Still, it’s certainly possible they’ve made a manufacturing mistake somewhere in the mobile segment, so that the GeForce cards are not limited the way they probably intended.
Laptops with CUDA cards can burn your lap… and maybe your wallet as well…
Better to check out projects like “rCUDA” and similar ones (like virtgpus…), which can make a remote GPU (possibly over VPN) look like a “local” one. You would probably need a fast internet connection to prototype on GPUs that way (more or less, it sounds like a cloud).
Conceptually it is simple: if you can allocate memory, copy data in and out, and launch kernels, it hardly matters whether the GPU is across PCIe or Ethernet.
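A toy sketch of that idea in Python, for illustration only. The `Device` interface and both classes are hypothetical stand-ins invented here; this is not the API of rCUDA or of any real library, and the "kernel" is just a Python function applied element by element.

```python
from abc import ABC, abstractmethod


class Device(ABC):
    """Hypothetical minimal GPU interface: allocate, copy in/out, launch."""

    @abstractmethod
    def alloc(self, n): ...

    @abstractmethod
    def copy_in(self, handle, data): ...

    @abstractmethod
    def launch(self, kernel, handle): ...

    @abstractmethod
    def copy_out(self, handle): ...


class InProcessDevice(Device):
    """Stand-in for a local GPU; buffers are plain Python lists."""

    def __init__(self):
        self._buffers = {}

    def alloc(self, n):
        handle = len(self._buffers)
        self._buffers[handle] = [0.0] * n
        return handle

    def copy_in(self, handle, data):
        self._buffers[handle][:len(data)] = data

    def launch(self, kernel, handle):
        buf = self._buffers[handle]
        for i in range(len(buf)):
            buf[i] = kernel(buf[i])

    def copy_out(self, handle):
        return list(self._buffers[handle])


class ForwardingDevice(Device):
    """Stand-in for a remote GPU: forwards every call and counts round trips."""

    def __init__(self, backend):
        self._backend = backend
        self.round_trips = 0

    def _call(self, method, *args):
        self.round_trips += 1  # a real remote setup would cross the network here
        return getattr(self._backend, method)(*args)

    def alloc(self, n):
        return self._call("alloc", n)

    def copy_in(self, handle, data):
        return self._call("copy_in", handle, data)

    def launch(self, kernel, handle):
        return self._call("launch", kernel, handle)

    def copy_out(self, handle):
        return self._call("copy_out", handle)


def scale_on(device, data, factor):
    """Client code: identical whether the device is local or forwarded."""
    handle = device.alloc(len(data))
    device.copy_in(handle, data)
    device.launch(lambda x: x * factor, handle)
    return device.copy_out(handle)


local_result = scale_on(InProcessDevice(), [1.0, 2.0, 3.0], 2.0)
remote = ForwardingDevice(InProcessDevice())
remote_result = scale_on(remote, [1.0, 2.0, 3.0], 2.0)
print(local_result, remote_result, remote.round_trips)
```

The client function never needs to know which kind of device it was handed. Every call on the forwarding device is one round trip, though, which is where the fast connection mentioned above comes in.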
I am happy to use a remote GPU where I can reliably access it, but I am increasingly going around doing demos where I have no control over the environment or networking, so a standalone portable capability would be much better. I have been doing fine for SP work with a 15in laptop made by UK specialist Kobalt, with a 285M in it, but next time around I am looking for DP support, with the best DP performance I can find. I can get a 460M from them (or a 480M in a 17in), but I would really like to know how the 5000M compares to the 460M and 480M on the DP side, and I am a bit frustrated that the documentation is so obscure on this point. It would indeed be nice to stumble upon a mobile chip in the games pile where Nvidia forgot to disable the 2:1 capability, but I suspect that, given how many meetings featured umpteen presentations based on non-Tesla-class 200-series cards, that lesson has been learnt!