GTX 590 for GPU computing. Any comments?

I’m planning to buy a GPU card for scientific computing.
I’m fairly new to GPUs, so I don’t really know which hardware is best. My budget is up to 1000 USD.
After browsing the internet, I found the GTX 590 to be the best candidate that fits my budget.
Does anyone have comments about the GTX 590? Any other suggestions?

FYI, I’m going to use it for n-body simulation and computational fluid dynamics (CFD).


The GTX 590 is composed of two GPUs on one card. Maybe someone can explain how computations are handled on a dual-GPU card. I wonder whether the memory is shared, or whether transfers are needed between threads running at the same time on each GPU?

One thing: the GTX 590 does not support CUDA compute capability 2.1, only 2.0.

The two devices on the GTX 590 are completely separate. You have to manually adapt the program code to split the work between them and transfer data from one to the other as needed.
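To make the "split the work" step concrete, here is a minimal sketch of the bookkeeping only, written in plain Python rather than CUDA. The function name `split_range` and the body count are made up for illustration. In a real CUDA program each chunk would be processed on a device selected with `cudaSetDevice`, and for an n-body code each device would still need the positions of all bodies, so updated positions have to be copied between the two devices every time step.

```python
def split_range(n_items, n_devices):
    """Return one contiguous (start, stop) index range per device.

    The ranges cover 0..n_items without gaps or overlap; when n_items is
    not divisible by n_devices, the first few devices get one extra item.
    """
    base, extra = divmod(n_items, n_devices)
    chunks = []
    start = 0
    for dev in range(n_devices):
        size = base + (1 if dev < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical example: 100001 bodies split across the two GPUs of one card.
    for dev, (lo, hi) in enumerate(split_range(100_001, 2)):
        print(f"device {dev}: bodies [{lo}, {hi})")
```

An even split like this only makes sense because both devices on the card are identical; with two different cards you would probably want to weight the chunk sizes.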

For CUDA, compute capability 2.0 is actually better than 2.1. The two are fully binary compatible, but (depending on the code) CC 2.0 devices are up to 50% faster per clock and core than CC 2.1 devices.

So do you think CC 2.0 is enough? Is it better to buy one GTX 590 with CC 2.0 than two GTX 560 Ti cards with CC 2.1?

Yes, you should prefer CC 2.0.
I’d also go with the GTX 590 as it only needs one PCIe slot, but that would depend a lot on your current system configuration and your future intentions.

How is it possible that CC 2.1 is slower than 2.0? Why did they change the architecture? Is it better for gaming?

CC 2.1 has 48 CUDA cores per multiprocessor, grouped into 3 sets of 16, much like CC 2.0 has 32 CUDA cores per MP grouped in 2 sets of 16. However, CC 2.1 can still only issue instructions from 2 warps at a time, so the third set of 16 CUDA cores will only be active if 2 instructions can be issued from the same warp. The co-issued instructions from the same warp have to be independent of each other (i.e. read and write different registers), but an independent instruction won’t always be available. As a result, the third set of 16 CUDA cores will sometimes go idle.

Given equal clock rates, CC 2.1 is the same as or faster than (depending on instruction sequence) CC 2.0 per multiprocessor, because the extra CUDA cores in CC 2.1 are sometimes helpful. However, CC 2.1 is the same as or slower than CC 2.0 per CUDA core, because sometimes the extra CUDA cores in CC 2.1 go idle. Since we usually compare CUDA devices based on clock rate and number of CUDA cores, people often say that CC 2.1 is slower than 2.0.

(Exception to the above: CC 2.1 has twice as many special function units (8) per multiprocessor as CC 2.0. For code that spends a lot of time computing special functions, CC 2.1 could be faster regardless of utilization of the third set of 16 CUDA cores.)
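The per-multiprocessor versus per-core comparison can be put into a toy model. This is only a sketch under a strong assumption: the two warp issuers always keep 32 CUDA cores busy, and the third set of 16 on CC 2.1 is busy with some probability `p_coissue`, a made-up parameter standing in for "how often an independent instruction is available". Memory traffic, latency, and the special function units are ignored.

```python
def busy_cores(cores_per_mp, p_coissue):
    """Average number of CUDA cores doing work per clock on one multiprocessor.

    Toy assumption: 32 cores are always fed by the two warp issuers, and any
    cores beyond those 32 are busy only with probability p_coissue.
    """
    always_fed = 32
    extra = cores_per_mp - always_fed
    return always_fed + extra * p_coissue


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        cc20 = busy_cores(32, p)  # CC 2.0: no third set, so p has no effect
        cc21 = busy_cores(48, p)  # CC 2.1: third set of 16 used with probability p
        per_mp = cc21 / cc20                  # equal clocks, per multiprocessor
        per_core = (cc21 / 48) / (cc20 / 32)  # equal clocks, per CUDA core
        print(f"p={p:.1f}  CC 2.1 vs CC 2.0 per MP: {per_mp:.2f}x  per core: {per_core:.2f}x")
```

In this model the per-MP ratio ranges from 1.0 to 1.5 and the per-core ratio from 2/3 to 1.0, which is the same "same or faster per MP, same or slower per core" picture, and the 2/3 end is where the "CC 2.0 is up to 50% faster per core" figure comes from.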

Wow, thanks. I wonder if there are FFT comparisons between different cards.

The other reason CC 2.1 is considered slower than CC 2.0 is just the particular GPU configurations NVIDIA has decided to market. The slowest CC 2.0 desktop GPU I can find has 11 multiprocessors (GTX 465) and the fastest CC 2.1 device (GTX 560 Ti, but not the OEM version) has 8 multiprocessors. Due to clock rate differences, the GTX 560 Ti almost certainly beats the GTX 465, and has the possibility to beat the GTX 470 for some floating point instruction sequences. Otherwise, the two populations have no performance overlap.

Certainly, restricting to just the GTX 500 series, all existing desktop CC 2.0 devices are faster than all CC 2.1 devices, regardless of the relative merits of the different multiprocessor capabilities. When extrapolating performance from CC 2.0 to CC 2.1, use the ratio of [clock rate] * [# of multiprocessors] as the scale factor. Then, if your code makes use of the extra special function units or CUDA cores per MP on CC 2.1, you will be pleasantly surprised.
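As a worked example of that scale factor, here is a small sketch. The multiprocessor counts (11 for the GTX 465, 8 for the GTX 560 Ti) are the ones quoted above; the shader clocks of 1215 MHz and 1644 MHz are assumed here from the published specs as I remember them, so check them against the spec sheets before relying on the result. The function name `scale_factor` is made up for this example.

```python
def scale_factor(clock_a_mhz, mps_a, clock_b_mhz, mps_b):
    """Estimated speed of device B relative to device A.

    Uses only [clock rate] * [# of multiprocessors]; any gain from extra
    CUDA cores or special function units per multiprocessor is ignored.
    """
    return (clock_b_mhz * mps_b) / (clock_a_mhz * mps_a)


if __name__ == "__main__":
    # Assumed shader clocks in MHz; verify against the spec sheets.
    gtx465_clock, gtx465_mps = 1215, 11      # CC 2.0
    gtx560ti_clock, gtx560ti_mps = 1644, 8   # CC 2.1
    factor = scale_factor(gtx465_clock, gtx465_mps, gtx560ti_clock, gtx560ti_mps)
    print(f"GTX 560 Ti vs GTX 465, baseline estimate: {factor:.2f}x")
```

With these assumed clocks the baseline comes out at roughly 0.98, i.e. about even before counting any benefit from the third set of CUDA cores.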