CUDA Laptop: a discussion on benefit-cost ratio.

There are presently no compute 1.3 mobile parts that I am aware of. There should be some compute 1.2 parts coming later in the year, but if you want double precision, then the big PCI-e X16 desktop/server boards are the only way to get it for the moment.

Do you think that CUDA features and GPU size are linked?
What I mean is: could an embedded GPU still support CUDA even if it has fewer MPs, less memory, etc.?

Sure. The midrange mobile parts are certainly perfectly functional and work well for development and for small and mid-sized problems. A 16- or 32-core 9xxxx part will probably comfortably outperform the host dual- or quad-core mobile CPU it is paired with. But there is no getting around the limitations of laptops for performance computing generally, and for CUDA it is no different.

Yes, of course you're right! What I mean is: do you think a feature like atomic functions or double precision could be disabled because of GPU size?

Related questions: what area of the GPU handles these features? Does it need a lot of die space? Do they just add some connections or modules?

Double precision is certainly not present in mobile GPUs because of the chip area required for a double precision floating point unit in each multiprocessor. The compute capability 1.2 mobile GPUs coming later this year (which have everything the 1.3 GPUs have except DP) are possible because of NVIDIA transitioning to the smaller 40 nm process.

That settles it for me. Thank you very much for your help, avidday and seibert!

I hope that NVIDIA will quickly improve its mobile GPUs for developers who want to use CUDA on a laptop without losing too many features.

The best strategy, I suppose, is to wait for a better laptop; buying now is expensive, a bad investment, and consequently not very smart, right?

Umm, I don’t regret buying that laptop with a 9600M GT GPU. The laptop was quite inexpensive.

I was talking about a laptop for CUDA development…

I need to buy a laptop, and if I do it now I won't have all the CUDA features (current laptops are only CCC 1.1, and I need atomic functions). Moreover, in approximately two months a new range of laptops with CCC 1.2 or 1.3 will appear on the market. Buying two laptops in less than a quarter is not very smart, don't you think?

?? Atomic operations on global memory are in compute 1.1. They are the main addition that compute capability 1.1 brings.
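For readers unsure what a global-memory atomic buys you: it makes a read-modify-write indivisible, so concurrent increments are never lost. In CUDA C that is a one-line `atomicAdd` in a kernel; purely as an illustration of the semantics (not NVIDIA's API), here is a Python sketch where a lock plays the role the hardware plays:

```python
import threading

counter = [0]            # shared "global memory" cell
lock = threading.Lock()  # stands in for the hardware atomicity of atomicAdd

def thread_body(n_increments):
    for _ in range(n_increments):
        with lock:           # read-modify-write as one indivisible step
            counter[0] += 1

threads = [threading.Thread(target=thread_body, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter[0])  # 8000: no increments lost, which is what atomicAdd guarantees
```

Without the atomic step, two threads could both read the old value and one increment would vanish.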

No arguments there.

I see it like this: the current generation hardware is quite cheap (and will get even cheaper) with the next generation on the horizon.

If shared memory atomics and other Compute 1.2 features are a need for your application, then wait of course.

In my case I've always been happy with Compute 1.1 and global atomics (allowing for inter-block synchronization primitives and such).
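The inter-block trick alluded to here is usually the "last block" pattern: each block atomically increments a global counter when it finishes, and whichever block observes the final count knows every other block is done and can run the final step. A hedged Python sketch of the idea, with threads standing in for blocks (names are illustrative, not CUDA API):

```python
import threading

NUM_BLOCKS = 4
finished = [0]            # global counter, bumped atomically
lock = threading.Lock()   # plays the role of atomicInc on global memory
results = [None] * NUM_BLOCKS
final = []

def block(idx):
    results[idx] = idx * idx     # each "block" computes a partial result
    with lock:                   # atomic increment + check, indivisibly
        finished[0] += 1
        i_am_last = finished[0] == NUM_BLOCKS
    if i_am_last:                # only the last block to finish reduces
        final.append(sum(results))

threads = [threading.Thread(target=block, args=(i,)) for i in range(NUM_BLOCKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(final[0])  # 0 + 1 + 4 + 9 = 14
```

On real hardware the same pattern needs a memory fence before the increment so the last block sees the other blocks' partial results.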


Exactly. It's not really expensive as long as CCC 1.1 is enough and the number of multiprocessors isn't critical.

Another question: how are the number of multiprocessors and GPU computation time linked? Does it depend heavily on the algorithm, or not?

In fact, the number of MPs varies a lot from one graphics card to another (on laptops I see anywhere from 1 MP to 14 MPs), so how can you estimate your computation time on another card with more or fewer MPs?

Computation speed scales with GPU (shader) clock and number of MPs.
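As a first-order rule of thumb you can therefore scale a measured runtime by the ratio of MPs times shader clock. This ignores memory bandwidth, occupancy, and launch overhead, so treat it as a rough sketch only; the example numbers below are made up:

```python
def estimated_runtime(measured_s, mps_old, clock_old_mhz, mps_new, clock_new_mhz):
    """First-order estimate: runtime scales inversely with MPs * shader clock.
    Ignores memory bandwidth, occupancy, and launch overhead."""
    return measured_s * (mps_old * clock_old_mhz) / (mps_new * clock_new_mhz)

# e.g. 10 s measured on 4 MPs @ 1250 MHz, projected onto 14 MPs @ 1100 MHz
print(estimated_runtime(10.0, 4, 1250, 14, 1100))  # ~3.25 s
```

For a compute-bound kernel this is often in the right ballpark; for a bandwidth-bound one it can be badly optimistic.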

Contributing factors for memory bandwidth:

  - dedicated RAM (++) or shared RAM (--)
  - GDDR2 (-) or GDDR3 dedicated RAM (+)
  - memory bus width: 64-bit (--) vs 128-bit or 256-bit (++)
  - better coalescing in compute model 1.2/1.3 (+) vs compute model 1.1 (-)
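The bus-width and RAM-type factors combine into a back-of-envelope theoretical peak: bus width in bytes times memory clock times transfers per clock (2 for DDR/GDDR-type RAM). A quick sketch with illustrative clock numbers:

```python
def peak_bandwidth_gb_s(bus_bits, mem_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s.
    transfers_per_clock is 2 for DDR/GDDR-type RAM."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * transfers_per_clock / 1000.0

# 128-bit bus at 800 MHz vs a 64-bit bus at the same clock
print(peak_bandwidth_gb_s(128, 800))  # 25.6 GB/s
print(peak_bandwidth_gb_s(64, 800))   # 12.8 GB/s
```

Halving the bus width halves the peak, which is why the 64-bit parts get a (--) above.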

That's very interesting! So the best way to improve computation speed (hardware-wise) is to have, in order of priority:

  1. The maximum number of MPs: improves computation speed?
  2. The maximum GPU shader clock: also improves computation speed?
  3. GDDR3 dedicated RAM or better: improves access to global/texture memory? And the other memories? …
  4. A 256-bit memory bus or wider: data transfers from CPU to GPU and GPU to CPU?
  5. The latest version of CCC: improves code efficiency?

Is that right ?

Guess that depends on whether most of your applications are compute-limited or bandwidth-limited…


Yes, of course. I was thinking about my own case; just ignore the order!

But are the associations correct or not? Complete/incomplete, …

I believe that the width of the memory bus relates more to “GPU global memory <-> GPU global memory” and “GPU global memory <-> on-die memory/registers” efficiency…


With the specific intent of CUDA development (and gaming… why not?), I bought an Acer AS5930G for €550 a couple of weeks ago. I think in the States you will find it (or something equivalent) for less than $500.

It has a 9600M GT on board with 512 MB - really cool and fast, for a laptop - 32 MPs at 120 MHz. At this time this is the best option I have found. The latest 130M laptops are still too expensive - €900-1000 here in Italy - and do not add that much from a CUDA-development point of view: in fact they usually fit a more game-oriented machine (larger monitor, TV tuner, …).

I found this page useful for comparing all the NVIDIA mobile cards:

Admittedly it has CCC 1.1, OK, but if you are looking for real speed you NEED a desktop - with €1000 you can buy a performance monster. Even the fastest laptop card will be a toy by comparison (it cannot burn 200 W at around 2 GHz without making the battery explode).

I agree on all points except one: the 9600M GT has only 4 MPs (NVIDIA_CUDA_Programming_Guide_2.2.1.pdf, page 101).

My fear is a big performance gap, and a laptop on which your code cannot be optimized the way it would be on a regular desktop. That problem exists because each new compute capability brings new, more efficient features, so your CCC 1.1 code becomes, in a way, obsolete (even if the differences are not really important).

These 4 MPs are equivalent to 32 “Cuda processors”, as marketing material will tell you. So the ALUs inside the MPs are now called CUDA Processors. Yay.
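For the record, the equivalence is just arithmetic: on compute 1.x parts each MP contains 8 scalar ALUs, so the marketing core count is MPs times 8. A trivial sketch:

```python
CORES_PER_MP = 8  # compute capability 1.x: 8 scalar ALUs per multiprocessor

def cuda_cores(num_mps):
    """Marketing 'CUDA processor' count for a compute 1.x GPU."""
    return num_mps * CORES_PER_MP

print(cuda_cores(4))   # 9600M GT: 4 MPs -> 32 "CUDA processors"
print(cuda_cores(30))  # GTX 280: 30 MPs -> 240
```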

Yes, exactly: 4 MPs <=> 4 x 8 = 32 cores, or “Cuda processors” as you said. I had forgotten the equivalence, which is why I was so impressed by “32 MPs” in a laptop ^_^