What do I need for a 4 GPU CUDA Setup?

Hi,
I’m going to do a CUDA project for scientific research. I’m currently planning the system and thinking about 4 GeForce GTX 280 cards.

I’m wondering which mainboard would be able to take 4 graphics cards. It doesn’t matter whether it is AMD or Intel. For Intel I only found boards with 3x PCIe x16 ports; for AMD I found some with 4 ports, but with an AMD chipset.
So is the chipset important for CUDA?

The next problem is the operating system. It must be Windows, because I only want to extend an existing program with CUDA, and that program is already written for Windows. But it doesn’t matter which version of Windows. So what would be better: Vista 32, Vista 64, XP, or XP 64? 4 GB of RAM would be nice but isn’t necessary.

Last question:
Is it necessary to connect all the graphics cards to a monitor in order to use them?

Thx for your help.

PS: Tesla is not an option, because the whole system price should stay under 3000 Euro.

You might want to check out our recently announced new “Personal Supercomputer” systems:
http://www.nvidia.com/object/tesla_supercomputer_wtb.html

I don’t have any monitors hooked up to my CUDA box, but it is running Linux, which may be why that is possible. I just don’t run an X server.

Boards with 4 PCIe slots:

  1. Intel Skulltrail (BOXD5400XS): 4x PCIe x16 1.0, Intel 5400 chipset.

  2. ASUS L1N64-SLI WS/B: four PCIe slots that are physically x16 but electrically x16/x8/x16/x8, NVIDIA 680a chipset, also PCIe 1.0.

These are both dual-CPU boards (Xeon or Opteron), but you can run them with a single quad-core, which will give you one CPU core per GPU. You will get somewhat reduced CPU/GPU memory bandwidth with the x8 slots.

If your budget is 3000 Euro, the ASUS will be much cheaper; be sure to get BUFFERED memory, as their manual is confusing. The Skulltrail only supports high-end (US$1000) Xeons and expensive registered memory. Also, make sure your PSU has enough PCIe connectors, and note that the Skulltrail requires TWO CPU power cables.

Skippy

You should take a look at the FASTRA page for information about the issues involved:

http://fastra.ua.ac.be/en/index.html

The page says 8 GPUs, but it is actually 4 cards because they built it using 9800 GX2 cards, each of which appears as two CUDA devices. Their parts are a little old now, but would be a great starting point for such a project. Definitely read their Technical FAQ (under “Specs and Benchmarks”) for some discussion of the cooling challenges.

Their system was built with 4000 euros at a time when the 9800 GX2 cost more than the GTX 280 does now, so your 3000 euro goal is probably doable.

Take a look at the ASUS P5N64: it has 2x PCIe 2.0 x16 plus one x16 and one x8, so you will need a riser card to use 4 GTX 280s.

Instead of two power supplies, buy a “video card” PSU that slots into a 5.25" bay, such as this one: http://www.newegg.com/Product/Product.aspx…p;Tpk=gpu%20psu (you’ll need one of these per card).

Don’t use Vista, because last I heard it doesn’t activate video cards that don’t have monitors attached (although this may have been fixed by now). Alternatively, you can wire up a dongle that looks like a fake monitor. There’s no problem going 64-bit, but as always you may uncover some bugs in your existing code, and you also waste some resources (all GPU pointers become 64-bit too).

Simon, that ‘personal supercomputer’ is over six grand. Come on. You can do the same thing for one and a half (much less than EUR 3000).

skippy1729, the Skulltrail uses “Fully Buffered” memory (aka FB-DIMM). This is very different from simply “buffered” memory.

On Newegg, it’s very easy to search for 4x PCIe 2.0 x16. Go to Motherboards, then pick Intel or AMD (AMD has more options), and go to Advanced Search.

Most do not have four slots spaced two apart, which is what you need. But here are two:

http://www.newegg.com/Product/Product.aspx…N82E16813186152
http://www.newegg.com/Product/Product.aspx…N82E16813130136

They are both very cheap. They wire up as x8 when you use four slots, but honestly you don’t need more bandwidth than that. (It’s the same bandwidth as the Skulltrail’s PCIe 1.0 x16 anyway.)

Last step is finding a case with 8 slots. (The standard is 7.)

Please let us know when you get this done, and post pics! With all the money you save (use GTX260), maybe you should build two and make a cluster?

Many thanks to all of you.

That’s a lot of information… I’m going to check all of it.

I will post back when I’ve decided which configuration to use.

But I’m still not sure about the operating system. If I take 3 GB of main memory and 4 graphics cards with 1 GB of RAM each (7 GB of memory in total), can 32-bit Windows address it? Or will I get compatibility problems?

Again, many thanks. This seems to be a really good community.

A 32-bit OS is capable of addressing 3 GB of RAM, and the 4x 1 GB of video memory is not accessible by the OS memory manager anyway; it is handled differently from conventional RAM.

Don’t use Vista because of limitations Alex wrote about; use Linux or XP.

Use good PSU, at least 1.5 kW.

Another problem not mentioned here is cooling: you’ll have to install additional case fans at the very least.

Yeah, it’s no problem. BTW, I wanted to say that even if you get a 32-bit OS you should stock up on RAM (e.g. 8 GB). 32-bit OSes can still make use of up to 64 GB (e.g. for swap space and disk cache); it’s just that a single 32-bit application can’t access that much.

That’s not really true as far as I know (unless you use something like PAE, which is nice and slow).

Honestly, if you’re not using a 64-bit OS as your primary development platform at this point, you should be.

Oops, dangerous remark when the only platform that has a CUDA debugger is 32 bit ;) :D Other than that you are 100% right.

There was much wailing and gnashing of teeth when they told me that it’s 32-bit only for the moment; they promised 64-bit would follow as soon as possible. :P

(This is why I used the word “primary”: I still have a secondary 32-bit install for testing and the debugger.)

It’s absolutely true. There’s nothing wrong with PAE. You say it’s “slow,” which it may be for certain use cases (e.g. developing applications with it), but for swap and disk caching it is blindingly fast (since your other reference point for performance is a hard drive).

Since most people use the extra RAM only for swap and disk cache anyway, >4 GB of RAM on a 32-bit OS is almost as good as on a 64-bit OS. However, for development, use 64-bit so you’re not surprised by pointer bugs down the line. (Unless you’re porting existing 32-bit code and would rather not deal with them.) Also note that you might have to set up both a 32-bit and a 64-bit build environment, because the 64-bit CUDA Toolkit can’t compile 32-bit code (for distribution to 32-bit users).

Honestly, I really wish this whole topic wasn’t a point for discussion and source of issues in this day and age. Sigh.

Last time I installed 32-bit Windows XP, it could only handle 3.5 GB of system memory. I know there have been extra address lines since the Pentium generation, but normal Windows XP doesn’t use them.
So you’re saying an application could use more RAM than Windows XP can supply?

But system memory is not my problem; the application only uses about 200 MB so far.
I’m afraid, though, that there could be problems with the huge amount of graphics memory.
I always thought the 32-bit address lines were used to address all the memory available in the system?!

Take a look how to enable PAE: http://www.microsoft.com/whdc/system/platf…PAE/PAEdrv.mspx

Btw, I stand corrected. On XP, Microsoft placed a limit of 4 GB of physical RAM (if you turn on PAE, the OS will see 4 GB instead of 2 or 3). On 32-bit server OSes, you can see more. Also, this won’t let a single application use more than 3 GB, but it lets the system use the extra RAM for useful things.

If you could address all available memory, you wouldn’t need cudaMemcpy(); you’d just write straight to device pointers. But since you do need cudaMemcpy(), you can have a host pointer 0x12345678 and a device pointer 0x12345678 (and three more such device pointers belonging to three other CUDA contexts).

I researched this some more, and the question of “how can you have 16 GB of GPU memory on a 32-bit OS” actually makes a lot of sense. Indeed, although you’re not allowed to access device memory from the host, all of the GPU’s memory is in fact mapped into the OS and available to drivers.

What I discovered is that the only way a 32-bit OS can access 16 GB of video card memory is to turn on PAE in the first place, which enables 64-bit page table entries in the 32-bit OS.

Also, whether you go 32-bit PAE or 64-bit, the chipset itself must support wide addressing. All Opteron/Athlon 64 chipsets do, but among Intel chipsets only those newer than the 975X, P965, and 955X support the capability.

See here: http://support.microsoft.com/kb/929605/en-us