CUDA graphics card suggestion: low-end range, PCI Express 1x

Hi guys,

It seems that CUDA has matured a little bit and the supported graphics cards do not cost an arm and a leg anymore, so I decided to jump on the bandwagon. It is for personal interest only, so there is no academic or professional use planned yet (although I spend my day doing electronic structure calculations, duh!).

My machine is a Dell Poweredge T105 with PCI Express 1x and 8x slots only. I can’t find an 8x card, but for the same price I can get either a GeForce 8400 GS or an NVS 290. I am leaning towards the NVS because of the dual-head support, but I was wondering if anyone can suggest alternatives. Both are CUDA compatible. From your experience, are there limitations on coding apart from performance? Can I expect that code written in a generic way (with CUDA of course) will scale if I run it on a top-of-the-line GeForce?

Any suggestions are much appreciated.

Due to operating system and CUDA overhead, the most important thing is to make sure you have a card with at least 256 MB of memory. Cards with only 128 MB of memory will not have much space left for your CUDA applications, and some of the sample programs in the SDK won’t even run.

To get a rough idea of how things would scale from your card to the top of the line cards, look at both the ratio of memory bandwidths and the ratio of [stream processors] * [shader clock rate]. The first ratio will estimate the scaling of memory bandwidth bound kernels, and the second applies to compute bound kernels. (Many tasks turn out to be memory bandwidth bound, surprisingly.)
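As a rough illustration of that arithmetic, here is a small Python sketch. The spec numbers are made-up placeholders, not quoted from any data sheet, and the function and key names are just ones I picked for the example; plug in the real figures for the two cards you want to compare.

```python
def scaling_estimates(small, big):
    """Return (memory_bound_ratio, compute_bound_ratio) going from small to big.

    Each card is a dict with memory bandwidth, stream processor count and
    shader clock. These key names are hypothetical, chosen for this sketch.
    """
    # Ratio of memory bandwidths: rough speedup for bandwidth-bound kernels.
    mem_ratio = big["mem_bandwidth_gb_s"] / small["mem_bandwidth_gb_s"]
    # Ratio of [stream processors] * [shader clock]: rough speedup for
    # compute-bound kernels.
    compute_ratio = (big["stream_processors"] * big["shader_clock_mhz"]) / (
        small["stream_processors"] * small["shader_clock_mhz"]
    )
    return mem_ratio, compute_ratio


# Placeholder numbers for illustration only; replace with real specs.
low_end = {"mem_bandwidth_gb_s": 8.0, "stream_processors": 16, "shader_clock_mhz": 1000}
high_end = {"mem_bandwidth_gb_s": 128.0, "stream_processors": 240, "shader_clock_mhz": 1200}

mem_ratio, compute_ratio = scaling_estimates(low_end, high_end)
print(f"memory-bandwidth-bound kernels: ~{mem_ratio:.1f}x")
print(f"compute-bound kernels:          ~{compute_ratio:.1f}x")
```

Treat both numbers as back-of-the-envelope estimates; a real kernel will usually land somewhere between or below them.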

This assumes that you write your CUDA programs to use many blocks. At least 64 or 96 will ensure reasonable scaling when the program is run on faster hardware with more multiprocessors. (Of course, even more blocks than that is also good, if applicable to your problem.)
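To check whether a given problem size gives you enough blocks, the host-side arithmetic looks roughly like the sketch below. It assumes one thread per element and 256 threads per block, which are my assumptions for the example rather than requirements; the threshold of 64 is the lower figure mentioned above, and the function names are made up.

```python
def blocks_needed(n_elements, threads_per_block=256):
    """Number of blocks needed to cover n_elements with one thread each."""
    # Integer ceiling division, the same idiom commonly used in host code
    # when computing a grid size.
    return (n_elements + threads_per_block - 1) // threads_per_block


def has_enough_blocks(n_elements, threads_per_block=256, min_blocks=64):
    """True if the launch would use at least min_blocks blocks."""
    return blocks_needed(n_elements, threads_per_block) >= min_blocks


for n in (4096, 1_000_000):
    print(n, blocks_needed(n), has_enough_blocks(n))
```

With these assumptions, a 4096-element problem yields only 16 blocks, which would leave most of a big card idle, while a million elements yields several thousand blocks.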

The more expensive GTX 2xx cards also support double precision and more efficient uncoalesced memory transfers, which you can’t even test on the cheaper cards. But for normal, single precision work, the above scaling rules should apply.

Thanks for your reply. Both the 8400 GS and the NVS 290 have 256 MB of RAM. I would love a double precision card, but I am constrained by the 1x PCI Express slot, and most graphics cards are 16x. OK, I think I will pull the trigger on the NVS 290; for a learning platform it should be fine.


Is the 8x slot physically a 16x? If that’s the case, you could fit a 16x card there. You would suffer the 8x bandwidth of course.

I know, I know… Unfortunately, it is a pure 8x slot. The whole concept is that the Dell Poweredge T105 is a cheap server with pretty much everything messed up to avoid being used as a workstation. So no PCI Express 16x, either physically or electrically. I read you can cut the plastic and just stick a 16x card in with the rest of the pins hanging in the air, or use a riser, but these are rather non-canonical solutions and I don’t like them very much :)

Heck, if it’s for fun and not serious yet, why not go the free route? Just install the SDK and use the emulator. It won’t be much slower than those super-low-end GPUs.

Well, I only have an onboard GPU (an ATI ES1000 if I am not mistaken), so I actually need a graphics card. The NVS 290 would offer me other advantages besides CUDA, but CUDA is a plus compared to a solution by ATI or Matrox. The emulator is not a bad idea; I will have a look at how it works in Linux. I am using Fedora Core 9 x86_64. Thanks!

I’m not sure you can even install the nVidia drivers and the CUDA package without a suitable card, at least not without confusing the installer (anyone tried?), so you might want to get that GPU anyway. Besides, emulation isn’t exactly GPU execution; there are some quirks that can escape you if you stick to emulation alone.

Why not wait for CUDA 2.1 and use the --multicore option to NVCC ???

Theoretically you can emulate everywhere, and I can attest that you can emulate under Linux. In particular, if you were able to install MATLAB and the commands SURF and SYM work, then you shouldn’t have any problem.

It will be enough to learn the basics of CUDA, especially if your C is a bit rusty. Still, it’s true that you won’t learn that much about parallel programming with emulation. That is why my advice when choosing a budget card is to pick the one with the best cores-per-dollar ratio, so you can learn as much as possible about using a multi-core processing unit.