Your question is so fundamental that I’m afraid you are going to spend a few k$ without having much experience in GPU computing.
Obvious questions: do you have software that will efficiently handle multi-GPU LBM simulations, or do you hope to write it yourself? If not, buy 2 cheap GTXes and wait for new hardware while you develop the software.
Have you performed any real numerical simulations on GPUs, or do you just take your knowledge from “the press”?
My experience says that the single factor that determines computation efficiency is the memory bandwidth and its efficient utilisation.
Thus, even though my GTX 480 is nominally capable of about 100 GFLOP/s in double precision, I’m pleased to see it achieve 20 GFLOP/s, as this is 75% of
the number derived from peak memory bandwidth considerations. My Fermi is faster than my AMD 6-core CPU by roughly the same factor as its memory system is faster.
Thus, my advice is:
1) Try to find out whether your problem is bandwidth- or computation-limited. The answer should come from experiments rather than from theory, though :-(
as getting full bandwidth utilisation out of a GPU may be not that obvious.
2) Then choose the card that has the required bandwidth or GFLOPS characteristics.
2a) Remember that many Fermi-based cards out there on the market do not meet the specification found on the NVidia website.
For example, I was stupid enough to buy an expensive Fermi GTX 480 card only to see that its bandwidth is only 120GB/s, whereas NVidia claims it should be capable of 177.4 GB/s.
2b) Don’t buy old cards for GPGPU purposes. Newer architectures are more flexible: more shared memory, more registers, more instructions, more ways to play tricks.
2c) My experience with the GTX 285 and GTX 480, both with the same bandwidth of 120 GB/s, says that Fermi is about 50% faster in my linear-algebra applications. And I managed to speed it up by another 50% by rewriting the kernel to take advantage of the fact that Fermi has far more shared memory!
2d) The GTX 480 at full load is far noisier than the GTX 285 :-(
2e) A year ago my friend bought GTX 295 with GPGPU in mind; only very recently did he start to use it as a multi-GPU system - it’s not that easy!
3) For a multi-GPU system the next most important factor is the motherboard and how it handles the PCI-E bus(es). When I got my first CUDA card and tested the PCI-E bandwidth, I obtained only 0.5 GB/s, horror!
Now I get 3 GB/s. In theory PCI-E x16 gen. 2 is capable of 8 GB/s in each direction. Remember that you’ll have to transfer your data twice: to and from the GPU! I guess this will be the narrowest bottleneck, although with a lot of cleverness and effort it is (allegedly, never tried myself) possible to compute and transfer the data simultaneously.
In summary, the only choice is between the GTX 580 (192 GB/s) and the GTX 570 :-) plus the best motherboard with true PCI-E x16 gen 2 support in all slots; also consider the total memory each card should have, and don’t forget to consult your electrician about your power supply :-).