Going to buy a Linux box over the next 3-4 months, to support a GTX 275 or 285 (and eventually a few more of those, but I’ll start with one). Going to use it for simulations, mostly double-precision float calcs.
Would like recommendations for CPU, motherboard, and Linux distro. Don’t want $$$$ state of the art, or bargain basement, but want decent speed at reasonable prices. Of course, want compatibility with current NVIDIA and CUDA stuff (since that’s why I’m getting it).
Any help appreciated, please post in the forum. Thanks in advance.
Your needs are simple (it gets complex with >2 GPUs). It’s not a big problem with most PC designs to just slap in a 275 or even 285.
If you’re customizing, the main thing to pay attention to is the PSU… make sure to get a name-brand one, with 600+ watts and PCIe power connectors (preferably two 6-pin and two 8-pin, just in case you ever want to scale to 2 cards).
That said, I’ve been really happy with Asus P6T motherboards with an i7 920, 6 or 12 gigs of RAM, in an Antec P183 case. This case also supports the excellent oversized CP-850 PSU, which is great bang for the buck. It’s about a $1500 build for a full system.
Here’s a nice part guide for a high end (but not outrageous) workstation build.
If you really want GT200 GPUs, order them now, because I very much doubt you will be able to buy them in 3 to 4 months’ time. As for distributions, Ubuntu is probably your best bet. I work with Red Hat 5 and CUDA a lot, but I wouldn’t recommend it for a workstation-style machine built on very new hardware.
I’m sure there will be a rash of people dumping their 285 and 295 cards to get the 300 series when they come out, and putting their old cards on eBay. Stores will have closeout pricing, and the bigger memory versions will also drop in price (although probably not as much.)
I’m not married to the 200 series, but right now they look like the sweet spot in terms of price/performance. Will reevaluate closer to purchase time.
Really like the engineering in the Antec CP-850, and its slightly bigger brother, the CP-1000.
Also, be sure to pay attention to the supported Linux distro version numbers for the CUDA release you are using. Frequent updates to gcc in the Ubuntu distributions and long development/test cycles for the CUDA toolkit generally mean that you are stuck using an Ubuntu release 6-12 months out of date with CUDA. Depending on what has changed, getting CUDA to work on a more recent release can be anywhere from trivial to evil, but generally isn’t worth fighting with unless you have a very strong need to use the newest Ubuntu release.
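To make the version-pinning point concrete, here’s a tiny sketch of the kind of check you end up doing by hand before installing a toolkit: compare your distro’s gcc against the newest gcc the CUDA release supports. The specific version numbers below are placeholders, not the actual support matrix — always check the release notes for your toolkit.

```python
# Sketch: is the installed gcc within the CUDA toolkit's supported range?
# Version numbers here are hypothetical examples, not the real matrix.

def parse_version(s):
    """Turn a dotted version string like '4.3.2' into a tuple of ints."""
    return tuple(int(part) for part in s.split("."))

def gcc_supported(gcc_version, max_supported):
    """True if gcc_version is no newer than the newest supported gcc."""
    return parse_version(gcc_version) <= parse_version(max_supported)

# Hypothetical: the toolkit's release notes top out at gcc 4.3.x,
# but a newer Ubuntu ships gcc 4.4.
print(gcc_supported("4.3.2", "4.3.9"))  # True  -- inside the supported range
print(gcc_supported("4.4.1", "4.3.9"))  # False -- too new for this toolkit
```

When the check fails, that’s when you’re into the “trivial to evil” territory of forcing an older compiler or patching headers.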
Yes, I’m sure that’s an issue. Relative to what’s current and stable, which distro generally tracks the best? Fedora is tempting, but as of right now, AFAIK, Fedora is on version 12, and Fedora 10 is supported. For Ubuntu, it’s 9.04 vs. 9.10. Don’t know whether these differences are big or small in release features and/or release numbers.
How does it get complex with more than 2 GPUs? If I had a big case, say the Antec 1200, and the CP-1000, I think I’d have the room and power for 3, if not 4, GTX285 cards. I suspect connectors may be an issue, and I might need to get an additional power module for the cards. Are there other complications? Should I build in more expandability? I know that some researchers have boxes with 4 graphics cards in them dedicated to computation.
There presently isn’t a single socket chipset which has more than 32 PCI-e lanes available for GPUs. That means you get at most 2 x16 slots (on either the Intel X58/5520 IO hubs or the AMD 790FX north bridge). If you want more than that, you are either looking at x8 slots, or multiple northbridges (like the Tyan S7025). Both are imperfect options that won’t deliver as much host-device bandwidth per card as a 2 GPU setup will. This may or may not matter to you. It certainly matters to my applications.
It gets complex because suddenly your case, your MB, and your PSU all matter. With 1 GPU, you can pick pretty much anything you like.
But even if you’re thinking about parts, it’s easy to go wrong with 4 or even 3 cards. So in your own example, you buy your 4 GTX285 cards with the great CP-1000 and huge Antec 1200. Let’s even assume you have a motherboard that supports them! (The Asus supercomputer is one of the few…)
But now you install everything… except your fourth GPU won’t even fit. Whoops. That giant Antec 1200 case won’t hold 4 double-wide cards. It has only 7 rear slots!
Grumbling, you at least install 3 of your cards. You go to hook them up… and you discover your CP-1000 PSU has enough watts… but only has 4 PCIe power leads! You need SIX! Whoops.
Now you’re playing games with secondary PSUs, etc.
And in the end we’re not even talking about heat issues or noise. :-)
If building a 1 GPU system takes an effort of 2/10, then a 2 GPU system is a 3/10. A 3 GPU system is a 6/10 and a 4 GPU system is a 10/10.
Answering your next question, should you plan for expandability? Your app should tell you. I’ve got three boxes running quad-GPUs… 2 GTX295s in each machine. I have not built an oct-GPU machine (yet) because the cost and effort of going past 4 GPUs wasn’t especially worth it for my apps. Your own computational needs may differ, and you have to balance part costs, power costs, build effort, and noise/heat with the computation type and density you need.
But for 98% of people, even when doing CUDA development, I’d say stick with a straightforward box with one GPU, with enough PSU support to add a second GPU board later if you really wanted one. If you ever need to scale up to an oct-GPU machine, design one from scratch instead of trying to upgrade.
You still only have 32 lanes with that board, just with each pair of 16 lanes switched between two 16 lane electrical slots. It isn’t really any different to having a pair of GTX295s (which just have the same NF200 switch on the GPU instead). You really can’t get peak bandwidth on cards sitting on the same switch simultaneously. But it is certainly better than x8 slots under most circumstances. Theoretically, dual 5520 IO hub designs should be the way to go, except that they seem to have weird NUMA affinity problems that cut into the effective host device bandwidth.
Wow, that’s interesting, since I have used that board before! But my apps aren’t PCIE limited so I didn’t notice any issue. You’re right that it’s better than 2 x8 slots, though.
I also feel cheated since nowhere in the online specs or bundled manual does it say this… the online tech page in fact boasts about the 4 slots each with x16.
It’s like labelling a 500 watt power supply as a “1000 watt” PSU. Sure, it can only handle 500 watts at a time. To get the other 500 watts, you just need to shut down the first load and use it with the second one!
The review mentions the “NVIDIA nForce200 PCI Express Switch”, which is also used on the P6T7. In this case, only one is needed to share an x16 link between two of the slots. This means that three slots can go full speed separately, but two of them can’t go full speed at exactly the same moment. This is still better than some boards which have to drop slots at boot time to x8 to support 3 or more cards.
And presumably, one would have one’s CPU dumping CUDA ops and data down to the GPUs in a sequential manner. So the worst case scenario is that you’ll see some logjamming preventing you from getting a full 50% speed bump from 3 boards vs. 2 boards. But I suspect that you’d still get some improvement, and that this switching method is better than the x8/x8 splitting at boot time.
CUDA multi-GPU requires one host thread per GPU, so in a scheme with reasonable load balance and a multicore CPU, it is likely that CPU-GPU transactions will happen roughly in parallel. Whether the bandwidth sharing arrangement on one of those >2 PCI-e x16 boards impacts performance comes down to the application. For embarrassingly parallel problems, it won’t matter (which is why it works fine for gaming in conjunction with a 3-way SLI link). For other things, it might.
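The one-host-thread-per-GPU pattern looks roughly like this. This is plain Python standing in for real CUDA host code: `process_on_device` is a placeholder for what would actually be a per-thread context bind plus memcpys and kernel launches, and the squared-sum workload is made up purely for illustration.

```python
import threading

# Sketch of the one-host-thread-per-GPU pattern: each host thread binds
# to exactly one device and drives all of that device's transfers and
# launches. process_on_device() is a stand-in for real CUDA calls.

def process_on_device(device_id, chunk, results):
    # In real host code this thread would bind its CUDA context to
    # device_id once, then do all host<->device traffic for that card.
    results[device_id] = sum(x * x for x in chunk)  # placeholder workload

def run_multi_gpu(data, num_devices):
    """Split data across num_devices, one host thread per device."""
    chunk_size = (len(data) + num_devices - 1) // num_devices
    results = [None] * num_devices
    threads = []
    for dev in range(num_devices):
        chunk = data[dev * chunk_size:(dev + 1) * chunk_size]
        t = threading.Thread(target=process_on_device,
                             args=(dev, chunk, results))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results

print(run_multi_gpu(list(range(8)), 2))  # [14, 126]
```

Because all the device threads issue transfers at roughly the same time, this is exactly the situation where a shared PCI-e switch gets exercised simultaneously by both cards behind it.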
Another thing to consider: there is a subtle but important difference between a pair of cards sharing a switched 16 lane link and a pair of cards on separate 8 lane links. The former has higher peak per card bandwidth but switching latency, the latter has inferior peak per card bandwidth but no switching latency. In applications where CPU-GPU transactions are relatively infrequent, and data volumes are large, the former will probably be better on average. In applications where transactions are frequent, but transaction size is smaller, the latter will probably be better on average.
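That trade-off can be put into a back-of-the-envelope model: total transfer time ≈ transactions × (fixed per-transaction latency + bytes / bandwidth). Every number below is an illustrative assumption (the 5.0 vs 2.5 GB/s and the latency figures are made up to show the shape of the trade-off, not measured values).

```python
# Toy model of the switched-x16 vs dedicated-x8 trade-off.
# All bandwidth/latency numbers are illustrative assumptions.

def transfer_time(num_transactions, bytes_each, bandwidth_gbs, latency_us):
    """Total time: per-transaction latency plus bytes over bandwidth."""
    per_txn = latency_us * 1e-6 + bytes_each / (bandwidth_gbs * 1e9)
    return num_transactions * per_txn

# Scenario A: a few large transfers. The shared x16 link (higher
# bandwidth, assumed extra switch latency) comes out ahead.
shared_x16 = transfer_time(10, 256e6, 5.0, 20.0)
dedicated_x8 = transfer_time(10, 256e6, 2.5, 10.0)
print(shared_x16 < dedicated_x8)  # True: bandwidth dominates

# Scenario B: many tiny transfers. The dedicated x8 link's lower
# latency wins despite having half the bandwidth.
shared_x16 = transfer_time(100000, 4e3, 5.0, 20.0)
dedicated_x8 = transfer_time(100000, 4e3, 2.5, 10.0)
print(shared_x16 < dedicated_x8)  # False: latency dominates
```

Plugging in your own application’s transaction count and sizes is a quick way to guess which layout suits it before buying a board.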
After reading these posts, and others about the next gen of NVidia GPUs, the best solution seems obvious.
Spend as little as possible to get a system that will do a nice job with 2 GPUs, get an adequate 275 or 285 and learn how to use CUDA, OpenCL, and the system, and save as much money as possible for the GTX380 (or GT100, or whatever it’s gonna be called), with the expectation of buying it when the price drops to a reasonable level, perhaps in 9-15 months. The complexities of a 3 GPU setup aren’t cost effective now (although if Intel makes a newer version of the X58 chipset with more than 36 PCIe pathways in the next few months, that may change).
If I were to buy today, I’d get:
Intel Core i7-920 CPU - $289.99 Newegg, maybe find a better price elsewhere
ASRock X58 Extreme motherboard - $169.00 Newegg
Antec CP-850 PSU
Antec P-183 case - $215.05 for case and PSU at Newegg
6GB memory, hard drive, etc.
But when purchase time comes around, I’ll re-evaluate all of this. Now I can do it with some knowledge of what I’m doing.
Avidday, SPWorley, Siebert, thanks again for all the advice and discussion.
I just built a system using the Asus P6T and one GTX260. One other thing to keep in mind for this board is that there is no way to set which PCI slot to use for the display. The system uses the first PCI-E slot, and I could not find how to change this in the BIOS.
I was hoping to be able to use one of the PCI slots for the display, and save the PCI-E slots for CUDA. But, that does not look possible. Although, it’s only an issue with more than one card.
Last advice… spend the money on an aftermarket heatsink for the i7 CPU. Noise is always an issue when you have GPUs pegged at 100%, and the stock Intel cooler is OK but not great. I was lazy with one of my boxes and kept the stock fan, and I keep kicking myself over its whine. It’s a pain to change later, too, so it’s best to do it right from the start.