Best, bang-for-the-buck, CUDA platform? ... Which? 9800 GX2, Tesla C870, new 2xx ...

Which nVidia card will give me best bang-for-the-buck CUDA processing??

I really don’t care about display performance. I want CUDA for linear algebraic work. Double precision would be nice (in the new Tesla processors), but I probably don’t want to wait another 4 months and pay premium $$$.

Any ideas??

Thanks!

  • Aaron

It’s mostly a matter of plugging shader frequency, SP counts, and prices into a ratio. Try a table of cards like http://en.wikipedia.org/wiki/GeForce_9_Series and a list of prices like newegg.com.

By the simplest measure, Flops/$, it’d probably be the 9600GSO… only $120 on Newegg gives you 96 SPs at 1375 MHz.
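As a rough sketch of that ratio in Python: the SP count, shader clock, and price below are the 9600GSO figures quoted above. The factor of 2 flops per SP per clock (one multiply-add) is an assumption, and the function names are made up for illustration. This is a theoretical peak, not a measured number.

```python
# Theoretical single-precision peak per dollar, from the figures in this thread.
# ASSUMPTION: each SP retires one multiply-add (2 flops) per shader clock.
FLOPS_PER_SP_PER_CLOCK = 2


def peak_gflops(sp_count, shader_mhz, flops_per_clock=FLOPS_PER_SP_PER_CLOCK):
    """Theoretical peak in GFLOPS: SPs * clock (MHz) * flops per clock / 1000."""
    return sp_count * shader_mhz * flops_per_clock / 1000.0


def gflops_per_dollar(sp_count, shader_mhz, price_usd):
    """Peak GFLOPS divided by the card price in dollars."""
    return peak_gflops(sp_count, shader_mhz) / price_usd


# name: (SP count, shader clock in MHz, price in USD), as quoted above
cards = {
    "9600GSO": (96, 1375, 120.0),
}

for name, (sps, mhz, price) in cards.items():
    print(f"{name}: {peak_gflops(sps, mhz):.0f} GFLOPS peak, "
          f"{gflops_per_dollar(sps, mhz, price):.2f} GFLOPS/$")
# prints: 9600GSO: 264 GFLOPS peak, 2.20 GFLOPS/$
```

Adding more entries to `cards` from a spec table and a price list gives a quick ranking, though it says nothing about memory bandwidth or register pressure.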

For double support, you don’t have many options… GTX 260, GTX 280, or the new Teslas. The low-end GTX 260 will be the obvious bang-per-buck choice when they come out next week.

Bang is very algorithm-dependent. The new cards, with double the number of registers, are I think quite hard to beat when you have complex algorithms that need a lot of registers.

Where did you get the info on registers? I still can’t find anything about the CUDA-specific aspects of the new cards on the nvidia site… registers, amount of shared mem, number of threads/blocks, etc.

yes, the number of registers/MP has been doubled
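To see why doubling the registers per MP matters, here is a minimal Python sketch of how many blocks can be resident on one MP when registers are the limiting resource. The per-MP limits (8192 registers and 768 threads on the older parts, 16384 registers and 1024 threads on the new ones, at most 8 blocks per MP) are my understanding of the CUDA Programming Guide. The kernel figures (18 registers per thread, 128 threads per block) are purely hypothetical. The sketch ignores shared memory and register allocation granularity.

```python
# Register-limited blocks per multiprocessor (MP).
# Ignores shared memory and register allocation granularity.


def blocks_per_mp(regs_per_mp, max_threads_per_mp, regs_per_thread,
                  threads_per_block, max_blocks_per_mp=8):
    """Resident blocks per MP: the tightest of three limits."""
    by_registers = regs_per_mp // (regs_per_thread * threads_per_block)
    by_threads = max_threads_per_mp // threads_per_block
    return min(by_registers, by_threads, max_blocks_per_mp)


# HYPOTHETICAL kernel, chosen only for illustration
REGS_PER_THREAD = 18
THREADS_PER_BLOCK = 128

# ASSUMED hardware limits: older parts vs. the new cards
old = blocks_per_mp(8192, 768, REGS_PER_THREAD, THREADS_PER_BLOCK)
new = blocks_per_mp(16384, 1024, REGS_PER_THREAD, THREADS_PER_BLOCK)

print(old, new)  # prints: 3 7
```

Because of the integer division, doubling the register file can more than double the resident blocks for a register-hungry kernel.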

here you can find a good article, but in French only ^^

Arghh …

The information is at my fingertips, but I don’t speak French :-) Automatic translators make things hard to understand…

There is currently a paper from nvidia on their website which mentions that the number of registers is doubled (amongst a lot of other stuff). Sorry, where I am right now the flash(y) website of nvidia is not working, so I cannot give you a link; it was below the link to 3 movies (in which CUDA was also mentioned a few times).

I see a page with three videos but can’t see any link to an extra info page below them… I’d kindly ask you to post the link when you get back.

You’re in luck (I am free this afternoon ;)): GeForce_GTX_200_GPU_Technical_Brief.pdf

http://www.nvidia.com/object/io_1213610051114.html
-> http://www.nvidia.com/object/io_1213615494642.html

Many many thanks! :-)

Don’t know about recent matters, but -
My recommendation is to stay slightly above the “sweetest” spot on the price-performance curve. A couple of months ago, it was either the 8800 GT or new GTS. The GT had a better price-performance ratio as I saw it, but for just a little more, the GTS gave me very usable extra power.

(I’m basically echoing E.D. Riedijk’s comment.)

Another good buy is the 512MB 8800GTS, which provides 128 SPs at 1650 MHz. The ECS version at Newegg goes for $210. If you buy before the end of June there is a $50 rebate that eventually drops the price to $160.

Just an example comparing gtx280 to 8800GTS 512 (160 dollars) :)

The doubling of the register count means for one of my algorithms that I could go from max. 3 blocks to max. 7 blocks per MP. So 16*3 vs 30*7 = 48 vs 210 blocks running at the same time.

So that means a gtx280 could cost 160*210/48 = 700 dollars to get the same amount of blocks/buck for that algorithm (bang/buck when it is not memory-bound on either of the cards).
And then you have twice the memory and 141.7 GB/s memory bandwidth (where that 8800gts has 64 GB/s).

The trouble of course is that you need to have one to really see what performance your algorithm will get (my 4.375x as many blocks would get memory-bound more quickly, as the memory bandwidth got ‘only’ 2.2 times faster).
So for memory-bound algorithms it could only cost 160*2.2 ≈ 352 dollars.
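The break-even arithmetic above can be sketched in Python. I am reading the post's block counts as 16 MPs with 3 blocks each on the 8800GTS 512 versus 30 MPs with 7 blocks each on the gtx280; that reading, and the function name, are assumptions. The $160 price and the bandwidth figures are the ones quoted above.

```python
# Break-even price: what the new card may cost to match the reference card's
# capacity per dollar, for whichever resource bounds the algorithm.


def break_even_price(ref_price, ref_capacity, new_capacity):
    """Price at which new_capacity/price equals ref_capacity/ref_price."""
    return ref_price * new_capacity / ref_capacity


REF_PRICE = 160.0  # 8800GTS 512 after rebate, as quoted in the thread

# Compute-bound case: capacity = blocks running at the same time.
# ASSUMPTION: 16 MPs * 3 blocks vs. 30 MPs * 7 blocks.
ref_blocks = 16 * 3   # 48
new_blocks = 30 * 7   # 210
compute_bound = break_even_price(REF_PRICE, ref_blocks, new_blocks)

# Memory-bound case: capacity = memory bandwidth in GB/s.
memory_bound = break_even_price(REF_PRICE, 64.0, 141.7)

print(f"compute-bound break-even: ${compute_bound:.0f}")  # prints $700
print(f"memory-bound break-even:  ${memory_bound:.0f}")   # prints $354
```

With the unrounded bandwidth ratio the memory-bound figure comes out at roughly 354 dollars. A real algorithm probably lands somewhere between the two bounds.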

How is the 9600GT for $140 from newegg? nice thing is that the particular one is neither loud nor fat, which means that it will fit into a shuttle box computer.

More generally, I am not a gamer, so I have not kept up with nvidia generations. I find the nvidia naming schemes rather confusing. isn’t there a table of basic cost and performance numbers somewhere? I know there is algorithm dependence, but this applies to all CPUs.

ok, specifically: I would say simple blas benchmarks would be good enough for me.

I would also be interested in learning what kind of speedup I can expect relative to an Intel core CPU. I am trying to figure out whether it is worth buying a few of these cards (to use under linux). just ballpark figures would be great.

advice appreciated.

sincerely,

/iaw

Depends on the algorithm you wish to implement (how well does it parallelize and scale) and how well you optimize it. Usually people report 5x-20x speedup. I’ve seen botched implementations that worked slower than the CPU versions and some really nice ones that went 50x faster - it’s hard to tell unless you tell us what you want to do.

thank you. this is helpful. so, a program consisting of nothing but a collection of simple nvidia BLAS library calls should speed up around the 10x order of magnitude? for this, roughly what Intel CPU and nvidia GPU are we comparing?

/iaw

+1 for 9600GSO. Earlier called 8800GS :)

However memory bandwidth is a little low and can constrain some apps.

The memory bandwidth on my card overclocked well but YMMV.

My 9600GT is at 1625 MHz and with a little tweaking can do 1900-2000 MHz.

I guess that is the question: if you need a lot of memory bandwidth, you need the 9600GT; if not, the GSO.

as luck would have it, tom’s hardware just published a summary of CUDA at http://www.tomshardware.com/reviews/nvidia…a-gpu,1954.html . they find that they easily get speedups of factor 5 with 8600M GT GPUs and 20 with 8800 GTX GPUs relative to a core 2 duo with 4 threads, even without great optimization, on a problem that lends itself to division of labor.

/iaw

I bought a 9600GT “Zilent” because it’s incredibly quiet. It has the bonus of having 1GB of RAM too, which as far as I can see is very helpful for CUDA. I’ve been amazed at how fast it is for some problems: around 170x for my current one. I bought it for fun/cheap research but I can see I’m going to have to get a proper one at some point. I just hope they’re very quiet!