I was wondering if anyone has tried building a machine for CUDA using four 9800GX2 cards. This should give a total of 8 CUDA devices with 128 stream processors each in one machine. You'd probably need 8 CPU cores, an extreme PSU, a special case, plus a PCI-E riser or two.
The hardware I am looking at for this is as follows:
PCI-E riser(s): Ably-Tech Corp. (View Products page)
Because the lowest two PCI-E sockets are adjacent, they need to be spread out to fit two double-width 9800GX2 cards. Hopefully this riser should do the job, but an additional riser for the upper card may be needed to allow it to fit in the lower slot.
Is there anything else I am missing here? Has anyone built a system like this before? It would sound like a small helicopter, but this would be put in a dedicated server room somewhere out of sight.
No, you can't have more than 2 GX2s. I may be wrong and 3 may be possible just for CUDA, but 4 definitely isn't. If you look at the motherboard, the bottom 2 PCI-E slots are too close together to fit a GX2 (double-slot card) in.
If you really want a lot of power and have the money, look into the NVIDIA Tesla systems. You can buy rack-mountable, stackable systems and they = uber pwnage.
"PCI Express x16: four PCI Express x16 connectors supporting simultaneous
transfer speeds up to 4 GB/sec of peak bandwidth per direction and up to 8 GB/sec
concurrent bandwidth."
As PCIe 1.x provides 250 MB/s per lane per direction, this would indicate that all four of these slots have 16 electrical lanes. Since each GX2 puts two GPUs behind one slot, that works out to x8 worth of bandwidth per GPU.
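The arithmetic behind those figures can be spelled out explicitly. This is a minimal host-side sketch (plain C, so it compiles as a .cu file too); the 250 MB/s per-lane rate comes from the PCIe 1.x spec, and the split to x8 per GPU assumes, as above, that the GX2's two GPUs share one slot:

```cuda
// Worked PCIe 1.x bandwidth arithmetic for the slot figures quoted above.
#include <stdio.h>

int main(void) {
    const double mb_per_lane_per_dir = 250.0; // PCIe 1.x payload rate per lane
    const int lanes_x16 = 16;
    const int gpus_per_gx2 = 2;               // a 9800GX2 is two GPUs behind one slot

    double per_slot  = mb_per_lane_per_dir * lanes_x16 / 1000.0; // GB/s, one direction
    double both_dirs = per_slot * 2.0;                           // concurrent bandwidth
    double per_gpu   = per_slot / gpus_per_gx2;                  // effective x8 share

    printf("x16 slot, one direction: %.1f GB/s\n", per_slot);
    printf("x16 slot, concurrent: %.1f GB/s\n", both_dirs);
    printf("per GPU on a GX2 (x8): %.1f GB/s\n", per_gpu);
    return 0;
}
```

That reproduces the 4 GB/s and 8 GB/s numbers from the quote, and shows each GPU on a GX2 effectively gets 2 GB/s per direction.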
I have also been looking at the ASUS L1N64-SLI WS/B, which has four physical x16 slots with the same slot-spacing format as the Skulltrail. It will take dual quad-core Opterons. The lanes on this board are only x8, x16, x8, x16, but it is priced much lower than the Skulltrail. There are a lot of negative reviews on Newegg, mostly concerning DOA boards (sometimes with DOA replacements). The discussions of the Skulltrail on the overclocking forums don’t mention any quality-control problems. However, people with the ASUS seem happy with it once they get one that works.
A “lot of money” is the right phrase! For Tesla, it’s on the order of £2500 per pair of cards (the pair being equivalent to one 9800GX2, which is about £350). This entire system costs about £2700. So yes, it could be done with Tesla, but for the price of one equivalent Tesla system, one could buy 2-3 of these systems.
Plus, AFAIK Tesla cards only support compute capability 1.0, but 9800GX2s support 1.1. The main difference is atomic operations, which could be useful.
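To make the compute capability 1.1 difference concrete, here is a minimal sketch of a global-memory atomic, which is the headline feature 1.0 parts lack. The kernel name and launch shape are just illustrative; with the era-appropriate toolchain you'd build it with `nvcc -arch=sm_11`, and it would not compile for sm_10:

```cuda
// Many threads bump one counter; atomicAdd serializes the updates so no
// increments are lost. Requires compute capability 1.1 or later.
#include <stdio.h>

__global__ void countHits(int *counter) {
    atomicAdd(counter, 1);  // race-free increment of a global-memory word
}

int main(void) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        printf("no CUDA device available\n");
        return 0;
    }
    int *d_counter, h_counter = 0;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemcpy(d_counter, &h_counter, sizeof(int), cudaMemcpyHostToDevice);
    countHits<<<32, 256>>>(d_counter);  // 8192 threads hitting one counter
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("counter = %d (expected %d)\n", h_counter, 32 * 256);
    cudaFree(d_counter);
    return 0;
}
```

On 1.0 hardware you'd have to restructure the algorithm (e.g. per-thread partial results plus a reduction) to get the same effect.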
This build wouldn’t work for games. At the moment, the drivers only support quad SLI, and since each 9800GX2 is really two cards sandwiched together, two 9800GX2s is the most games can use. Four would not be any better; this is strictly for use as a high-end CUDA system.
That ASUS board looks interesting; it’s good to know there are alternatives available.
I wasn’t planning on water-cooling the GPUs. There are water-cooled 9800GX2s around, but the ones I’ve seen are about twice as expensive, which would roughly double the cost of the entire system once you add a pump etc. In my situation, I can move the system to a remote server room, so the noise is not a problem and I can put on as many extra fans as will fit (which, in a case with 7 5.25" bays on the front, is quite a lot).
I’m not sure, so I’m just skeptical, but are you guys SURE that the drivers support 4 GX2s? 8 GPUs? If you are then fair enough and I’ve learned something new, but I’d be surprised if NVIDIA made sure that 4 GX2s are supported, given that you can’t actually fit 4 on a motherboard without special risers etc etc…
No, I’m not sure at all; I think the only way to be sure would be to try it. However, I don’t see any particular reason why it shouldn’t work. This is a system for computation, not for graphics or gaming, so it doesn’t need SLI or even multiple monitors. There are some reviews with photographs of 10 monitors connected to a single PC, so I believe it should work.
If it does not work, then most of the hardware could be split into two systems with two cards each, so there wouldn’t be much waste in that case.
Another possibility might be three 9800GX2s and one 8800GTS 512. Supposedly the 9800GX2 is internally two 8800GTS 512 cards, so all the software should (might) recognize this as seven identical GPUs. This might be easier mechanically than the riser/cable assembly.
If you build one of these multi-GPU systems, let us know how it works out.
Technically, you don’t even need a graphical interface running, or even installed, to run CUDA code. That’s what I’m doing: I have a CentOS server without Xorg installed, equipped with an old PCI card for a text terminal and a 9800 GTX that is used solely for running CUDA code. So even if the GUI crashes with too many cards (for reference, see the answers to that Slashdot thread, or search for the “multiseat X” keywords: lots of local multi-user X window server setups are reported, but some exhibit instability with too many graphics cards), that won’t pose any actual problem in your situation.
As for CUDA, according to the documentation, the API simply treats each GPU as a separate CUDA device, so with your installation you would simply end up with 8 different devices to which you could assign 8 different CUDA processes. (And the 2x quad-core should provide more than enough horsepower to run the CPU side of those tasks.) The main problem I see is memory and bandwidth limitations: the GX2 has only 512 MB per GPU, and 8 CUDA tasks are going to put significant stress on the bus whenever each transfers data between host and device. I don’t know, maybe AMD processors fare better, as they have an on-board memory controller and their dual channels can be decoupled (thus a dual-processor system features 4 independent simultaneous accesses to memory).
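A minimal sketch of what that looks like from the runtime's side: each GPU enumerates as its own device, and a host process picks one with `cudaSetDevice()` and then owns it. The device count and names here are whatever the machine actually has; on the hypothetical quad-GX2 box this should report 8 entries:

```cuda
// Enumerate every CUDA device the runtime can see and print its name and
// memory. In the real 8-process setup, process d would call cudaSetDevice(d)
// once at startup and run all its kernels on that GPU.
#include <stdio.h>

int main(void) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess)
        deviceCount = 0;  // no driver / no device: report zero rather than fail
    printf("CUDA devices detected: %d\n", deviceCount);

    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("  device %d: %s, %lu MB global memory\n",
               d, prop.name, (unsigned long)(prop.totalGlobalMem >> 20));
    }
    return 0;
}
```

On a quad-GX2 system each of the 8 entries would show its own 512 MB, which is the per-GPU memory ceiling mentioned above.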
Yeah, this kind of 8-device configuration only makes sense for kernels in which all of the host<->device transfer can be done up front, and then minimal host communication is needed per iteration. There are many problems where this is possible.
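That "transfer once, iterate on-device" pattern can be sketched like this. The kernel here (scale each element by 2) is just a stand-in for real per-iteration work; the point is that the only bus traffic is one copy in and one copy out:

```cuda
// One host->device copy up front, many kernel launches, one device->host copy
// at the end -- the access pattern that keeps 8 GPUs off the shared bus.
#include <stdio.h>
#include <stdlib.h>

__global__ void step(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;  // placeholder for the real per-iteration work
}

int main(void) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        printf("no CUDA device available\n");
        return 0;
    }
    const int n = 1 << 20, iters = 10;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);  // once, up front

    for (int it = 0; it < iters; ++it)       // no host<->device traffic in here
        step<<<(n + 255) / 256, 256>>>(d, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);  // once, at end
    printf("h[0] = %.0f after %d doublings\n", h[0], iters);
    cudaFree(d);
    free(h);
    return 0;
}
```

Problems that need a host round-trip every iteration would instead have 8 tasks fighting for the bus on every step.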
I wonder what kernel startup bottlenecks there are. It’s possible that even with 8 CPU cores, there might be some OS-level contention which prevents short kernels from being issued to all 8 devices simultaneously. Would be interesting to benchmark for sure…
Looking at this again: the GX2s are double-wide, but this does not mean they are usable in two PCIe slots that are double-spaced. The sides of any two adjacent cards will be right up against each other, and the outside air intake is through the back of the card while the exhaust is through the sides. Have a fire extinguisher handy. Water cooling will not help: the cooling block is sandwiched between the two boards and the overall width of the assembly is not reduced; in fact, it may be slightly increased. Also, the water blocks I have seen have the hose connections on the side, making double spacing impossible.
I cannot see any combination of risers that would allow four GX2s on the Skulltrail or any other board. Three looks possible: card 1 in slot 1; card 2 on a riser (which also offsets by one slot) in slot 3; card 3 on a riser/offset in slot 6. Custom enclosure for sure.
There are a variety of industrial PCIe backplanes on the market; perhaps one of these could work for you. However, once you move beyond commodity hardware, costs tend to skyrocket. What about a small cluster with MPI? A single quad-core CPU and two 9800GX2s per node, on a motherboard with 2 (or more) gigabit Ethernet ports.
Perhaps I don’t have the right picture of what you are trying to do. Let me know.