Fastest CUDA card on the market — choosing the best card for CUDA computation

hi everyone,

I know this question might be more appropriate for the hardware forum, but I’d like to emphasize “choosing the best CUDA card for CUDA computation purposes,” so I’m posting it here.

I’d like to find the fastest (or most powerful) CUDA card on the market for parallel CUDA programming (i.e., not for video gaming), to be installed in a desktop PC.

I looked into the CUDA Zone GPU table.

At first I assumed that a CUDA card with a higher “compute capability” would be more powerful, e.g., GeForce GTX 295 (compute capability 1.3) vs. GeForce GTX 560 Ti (compute capability 2.1). I know the number indicates CUDA feature compatibility, and I reasonably assumed it would also reflect performance.

After looking at their specs and prices, I found out I was probably wrong??

It’s even harder to make a cross-family comparison, e.g., Quadro vs. GeForce; each model has its pros and cons according to its spec??

Can someone kindly point me in the right direction: how do I find the most powerful card?? Or, to put it another way, how do I compare these cards for parallel CUDA computing purposes??



The two most important metrics are the memory bandwidth (GB/s) and single precision arithmetic throughput (GFLOP/s). To a lesser degree, double precision throughput might also be of interest, depending on your particular code.

Nvidia directly specifies memory bandwidth for all their cards. The single precision arithmetic throughput is, for all practical purposes, 2 × number_of_CUDA_cores × processor_clock (the peak for compute capability 1.x devices is 50% higher than that, but this is difficult or even impossible to achieve in practice). Both values are listed in Nvidia’s spec sheets. Double precision throughput for Tesla and Quadro cards is half their single precision throughput. For GeForce cards of compute capability 2.1 it is 1/12th of the single precision throughput, and for compute capability 1.3 and 2.0 it is 1/8th of the single precision throughput as given above.

As you figured out, higher compute capability does not necessarily relate to higher computational power. However, nowadays you would probably want to buy a compute capability 2.x device, as their new features make programming them a lot easier.

For single precision or integer code, the fastest single-GPU CUDA card from Nvidia currently is the GeForce GTX 580. On double precision problems it might (or might not) be beaten by the Tesla 20x0 cards. This however depends a lot on the particular code.

I’ll give you my answer:

  1. Look at the memory clock, especially if you are dealing with very large datasets, like hundreds of megabytes or even gigabytes — and ultimately that’s what parallel stuff is about: lots of data, with lots of work done on it in parallel ;)

  2. Then look at the memory bus width in bits. The more bits it has, the more bits it can push through at once. It would be nice if the specs mentioned whether this could also be organized as 2×128 bits, 4×64 bits, 8×32 bits, or 16×16 bits (ideally each with a different memory address, for extreme random memory access performance!)

Better sequential performance speaks for itself, I would think ;) though there could always be caveats ;)

  1. If memory size, memory bandwidth, and memory access performance are not your main concern, but rather doing many computations, then:

  2. Look at number of multi-processors.

  3. Look at the number of CUDA cores.

  4. Last, look at shared memory, but this is probably difficult to use and doesn’t do that much… it’s usually very small, around 48 KB or so… compared to 1 GB of RAM, it’s peanuts ;) Only very small problems, or ones with small inner loops, can use it ;) So far I have seen some algorithms that use it, but it makes the algorithms much more complex.

Something which is apparently new is:

  1. Caches like L1 and L2… I am not sure how big they are, or whether this information is available… I am also not sure whether they are necessary for “coalescing” or not…

I would definitely go for the highest compute capability so that you can program with ease and use the latest tips, tricks, techniques, and language features; this will make your software last longer.

You can always buy a new card in the future, in 5 years, 3 years, whatever… but to limit yourself to compute capability 1.3 or 1.1 while compute capability 2.2 or so is already out would be kind of foolish, I think… because these compute capabilities can come in handy and are probably needed for the somewhat more advanced algorithms.

So if you do not plan on buying a new graphics card every 3 months, go for the long term ;) :) and get the highest compute capability! ;) :)

thank you so much, both of you, for your help!!

Also one last important tip:

PCI Express 2.0 can be a bottleneck if you have to transfer a lot of data between the CPU/host and the GPU/device.

PCI Express 3.0 is going to be faster, so if you can get one of those cards, that would be better too ;)

Good luck finding a PCIe 3.0 card. I must have missed the part where Alex said he is planning a system for the distant future.

Lol news message today:

I have seen other news messages about this as well…

PCI Express 3.0 is coming real soon ! ;) :)

I’d also take a look at AMD’s APUs… AMD doesn’t seem to support CUDA (yet) on their graphics cards or the embedded graphics chips inside their processors,

But there is an open source project, mostly for Linux, called Ocelot; maybe it can recompile CUDA kernels to AMD’s IL (intermediate language).

So then there is some hope of running CUDA kernels on AMD hardware.

Also, replacing nvcuda.dll with something that calls, for example, the OpenCL APIs shouldn’t be too hard… the question is whether anybody is going to invest the time to make a CUDA API clone on top of OpenCL…

That would kinda be weird/silly.

cuda api on top of
opencl api on top of
cuda api

^ OpenCL is implemented on top of CUDA… at least for Nvidia, I think… so maybe on AMD it would look like:

cuda api on top of
opencl api

So then it’s not so silly ;)

The question remains what the OpenCL API is on top of on AMD… maybe “Close to Metal” or FireStream or Stream or something…

If Nvidia would implement CUDA on top of OpenCL, or on top of AMD/ATI hardware…

That would make me real happy…

Then I could simply/happily continue development in CUDA and optimize for Nvidia hardware… and hopefully still have somewhat reasonable performance on ATI/AMD ;)

Currently, performance on ATI/AMD = 0, because it won’t run, lol.

This is a nice chart:

Some of these motherboards have 2× PCIe 3.0, so if you are looking for an SLI board… ;)


If you need to use TCC (e.g., you plan on RDP’ing into your machine, etc.), or need ECC capability, or are planning to use the GPU in some production context with MS HPC — you need a Tesla.

Otherwise, the GTX 590 is the most powerful CUDA card on the market. It offers over 2× a Tesla’s performance and costs <$700, vs. >$2,000 for a Tesla. The architecture of the GTX 590 is also newer than, and superior to, that of the Tesla 2050/2070.

You will have to do funny things like tweaking the registry in Windows to disable the TDR delay and such, in order to fully use a GeForce for computation, but it’s well worth the extra effort.