CUDA Laptops: A Discussion on Benefit-Cost Ratio

Good morning,

So, the best way to find the right graphics card with the right CUDA Compute Capability is to check the list of graphics cards sorted by category at

http://www.nvidia.com/object/cuda_learn_products.html

Then you take the list of Cuda Compute Capability for each card that you will find in

http://developer.download.nvidia.com/compu…Guide_2.2.1.pdf

(it’s exactly the same data table that jph4599 found in the 2.3 beta)

And finally, you check http://www.laptopspirit.fr/ to see whether your graphics card exists in any laptop. I discovered that Alienware has an edge, because they are already selling laptops with the GTX 260M, which I found nowhere else (I want the cheapest laptop with CCC 1.3 :rolleyes: ).

You are aware that the GTX 260M isn’t a compute 1.3 part? It is a mobile derivative of the G92b (9800 GT), and is only compute 1.1.

So does a laptop with CUDA Compute Capability 1.3 exist?

It seems that the best graphics card in a laptop is the GTX 280M, with 14 multiprocessors and CCC 1.1, and it’s not exactly cheap :(

There are presently no compute 1.3 mobile parts that I am aware of. There should be some compute 1.2 parts coming later in the year, but if you want double precision, then the big PCI-e X16 desktop/server boards are the only way to get it for the moment.

Do you think that CUDA features and GPU size are linked?
What I mean is: do you think that an embedded GPU could still support CUDA even if it has fewer MPs, less memory, and so on?

Sure. The midrange mobile parts are certainly perfectly functional and work well for development and for small and mid-sized problems. A 16 or 32 core 9xxxM part will probably comfortably outperform the dual- or quad-core mobile host CPU it is paired with. But there is no getting around the limitations of laptops for performance computing generally, and for CUDA it is no different.

Yes, of course you’re right! What I mean is: do you think a feature like atomic functions or double precision could be disabled because of GPU size?

Other related questions: what area of the GPU manages these features? Does it need a lot of die space? Do they just add some connections or modules?

Double precision is certainly not present in mobile GPUs because of the chip area required for a double precision floating point unit in each multiprocessor. The compute capability 1.2 mobile GPUs coming later this year (which have everything the 1.3 GPUs have except DP) are possible because of NVIDIA transitioning to the smaller 40 nm process.

For me it’s OK. Thank you very much for your help, avidday and seibert!

I hope that NVIDIA will quickly improve its mobile GPUs for developers who want to use CUDA on a laptop without losing too many features.

The best strategy, I suppose, is to wait for a better laptop, because buying now is expensive, a bad investment, and consequently not very smart, right?

Umm, I don’t regret buying that laptop with a 9600M GT GPU. The laptop was quite inexpensive.

I was talking about laptops for CUDA development…

I need to buy a laptop, and if I do it now I won’t have all the CUDA features (only CCC 1.1 is available, and I need atomic functions). Moreover, in approximately two months a new range of laptops with CCC 1.2 or 1.3 will appear on the market. Buying two laptops in less than a quarter isn’t very smart, don’t you think?

?? Atomic operations on global memory are in compute 1.1. They are the main addition that compute capability 1.1 brings.

No arguments there.

I see it like this: The current generation hardware is quite cheap (and will even get cheaper) with next generation on the horizon.

If shared memory atomics and other compute 1.2 features are a necessity for your application, then of course wait.

In my case I’ve always been happy with compute 1.1 and global atomics (which allow for inter-block synchronization primitives and such).

Christian

Exactly. It’s not really expensive as long as CCC 1.1 is enough and the number of multiprocessors isn’t very important.

Another question: how are the number of multiprocessors and GPU computing time linked? Are they strongly influenced by the algorithm, or not?

In fact, the number of MPs changes a lot from one graphics card to another (on laptops I see anywhere from 1 MP to 14 MPs), so how can you estimate your computation time on another graphics card with more or fewer MPs?

Computation speed scales with GPU (shader) clock and number of MPs.

Contributing factors for memory bandwidth:

- dedicated RAM (++) or shared RAM (--)
- GDDR2 (-) or GDDR3 dedicated RAM (+)
- memory bus width: 64 bit (--), 128 bit, or 256 bit (++)
- better coalescing in compute model 1.2/1.3 (+) vs. compute model 1.1 (-)

That is very interesting! So the best way to improve computation speed (hardware-wise) is to have, in order of priority:

  1. The maximum number of MPs: improves computation speed?
  2. The maximum GPU shader clock: also improves computation speed?
  3. GDDR3 dedicated RAM or better: improves accesses to global/texture memory? And the other memories? …
  4. A 256-bit memory bus or wider: improves data transfers from CPU to GPU and GPU to CPU?
  5. The latest version of CCC: improves code efficiency?

Is that right ?

I guess that depends on whether most of your applications are operation-limited or bandwidth-limited…

N.

Yes, of course; I was thinking about my own case. Just forget the order!

But are the associations correct or not? Complete/incomplete, …

I believe that the width of the memory bus relates more to the efficiency of “GPU global memory <-> GPU global memory” and “GPU global memory <-> on-die memory/registers” transfers…

N.