CUDA hardware & software

Hi,
we want to buy a new CUDA machine for our lab.

  1. Our current machine has an ASUS SUPERCOMPUTER motherboard and we were never able to run 4 cards on it. Trying to do so resulted in a crash several minutes after starting a CUDA program. Admittedly, we tried this half a year ago, so maybe the problem is solved with the newer drivers. Also, we have different cards: a 280, a 285 and a 480. They work in any combination, but no more than 3 together. Poor cooling is another problem when the cards are packed together. Does anyone know why 4 cards do not run properly? The PSU is big enough, a Toughpower 1500W.

  2. Because of the problems with the ASUS SC motherboard, we want to use another MB, e.g. the Asus Rampage III Extreme, which is even 20% cheaper. Does anyone have experience with this MB? Will it work with 4 GPUs?

  3. What do you think is better: one system with 4 GPUs or 2 systems with 2 GPUs each? The prices seem to be almost identical. For systems with 2 GPUs, do you know any good MBs with the PCIe slots spaced far enough apart to leave one empty slot between the GPUs? Is there any advantage in having an onboard GPU to use for the display?

  4. Having more than one CUDA computer requires software to use them as a cluster. What would be the optimal solution in terms of OS and additional software?

  5. Will 4 GTX 580s work in a single system?

Thanks.

Out of curiosity, which OS were you using? I haven’t had any trouble with Ubuntu Linux 64-bit 9.10 or 10.04 with the Asus P6T7 Supercomputer MB and three GTX 295s + one GTX 470.

Is your application easily divided into independent data sets? You can optimistically move data between GPUs in the same computer at 2-3 GB/sec, whereas you will at most hit 100 MB/sec over gigabit ethernet. (Plus there’s a huge latency difference.) If the computers don’t need constant communication between processes, then this doesn’t matter and 2 computers with 2 GPUs will be much easier to build and maintain.
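
Something like this is what I mean by the in-machine transfer: a minimal, untested sketch (the buffer size, device indices and pinned staging buffer are my own choices, and it assumes a CUDA runtime new enough to switch devices from a single host thread) that times a GPU-to-GPU copy staged through host memory:

```
// Rough GPU0 -> host -> GPU1 bandwidth check (hypothetical 256 MB buffer).
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256u << 20;                 // 256 MB test buffer
    float *d_src = 0, *d_dst = 0, *h_buf = 0;

    cudaSetDevice(0);
    cudaMalloc((void**)&d_src, bytes);
    cudaMallocHost((void**)&h_buf, bytes);           // pinned staging buffer
    cudaSetDevice(1);
    cudaMalloc((void**)&d_dst, bytes);

    auto t0 = std::chrono::steady_clock::now();
    cudaSetDevice(0);
    cudaMemcpy(h_buf, d_src, bytes, cudaMemcpyDeviceToHost);
    cudaSetDevice(1);
    cudaMemcpy(d_dst, h_buf, bytes, cudaMemcpyHostToDevice);
    auto t1 = std::chrono::steady_clock::now();

    double s = std::chrono::duration<double>(t1 - t0).count();
    printf("GPU0 -> host -> GPU1: %.2f GB/s\n", bytes / s / 1e9);
    return 0;
}
```

Compare that number to what you can realistically push over gigabit ethernet and the decision usually makes itself.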

I use Sun Grid Engine with as many queue slots as CUDA devices and set the devices to compute exclusive mode. I don’t use MPI for anything, so I don’t have any opinions about that.

  1. The current machine has Win 7 Pro 64-bit. We are thinking of using Linux once we have more than one machine. I have no experience with Windows Server, which I think is required if we want to build a Windows cluster.
  2. Our application uses only one GPU; the communication latency between different GPUs is too big for the algorithm. We need as many GPUs as possible to run hundreds of experiments, compute standard deviations, etc. Each run takes 1-200 h, depending on the task complexity. I thought a 4-GPU machine would be easier to maintain if there are no cooling problems. As I read in many reviews, the new 580 runs cooler than the 480 and has a better air intake even when the GPUs are closely packed.

In the case of computers with only 2 GPUs there are MBs with an onboard GPU. Do you think it is better to have this option in order to completely free the discrete cards for CUDA? If yes, what CPU/chipset should an Intel platform have? The current processors with an on-chip GPU confuse me with regard to the PCIe lanes. For AMD things are clear, but their CPU cores are 10-40% slower and our application needs a fast CPU too. I don’t want to overclock because we run the machine 24/7 for months.

  3. I thought “compute exclusive mode” was only for Teslas. Can it be used with GeForce cards?

Yeah, if you can get the 4 GPU computer to run, you will probably be fine. If you want to do multi-GPU with Intel, you should go with a socket LGA-1366 and the X58 chipset. (Xeon vs. Core i7 depends on your preference for ECC host memory.) Stay away from the socket LGA-1156, as it has much less PCI-E bandwidth. In addition, you probably want to make sure you populate the memory slots in the triple channel configuration for maximum memory bandwidth to support the multiple GPUs.

Regarding the on-board GPU, it depends on your OS and usage pattern. Personally, I don’t need the onboard GPU because on Linux, if you turn off the GUI entirely, then the primary GPU is free to run CUDA jobs with no watchdog timer. I run all of my tasks on the server over SSH from the laptop on my desk. If you are going to use Windows, then you might want a display-only GPU if you intend to use the computer directly (as opposed to using it as a remote server), or if your kernels run long enough that you are in danger of triggering the watchdog.
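
If you want to check whether a particular card still has the watchdog attached, the runtime exposes that too; a tiny sketch (device 0 is just an arbitrary example):

```
// Report whether the driver enforces a kernel run-time limit (watchdog) on device 0.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // arbitrary choice: first device
    printf("%s: kernel run-time limit %s\n", prop.name,
           prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    return 0;
}
```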

You are probably thinking of the Windows driver for Tesla that lets you run the card independent of the display system. On Linux (which is the only OS I have CUDA experience with), you can run the nvidia-smi tool with GeForce or Tesla to set each GPU in one of three modes: normal, compute exclusive, and compute prohibited. If you then don’t specify a CUDA device in your code (assuming runtime API here), the context will be created on the first allowed device. So, if you set compute exclusive on all the GPUs, you get automatic distribution of jobs across the devices.
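
For what it’s worth, here is a minimal sketch of how I check what nvidia-smi left the cards in; nothing in it is specific to my setup beyond assuming the standard runtime API (cudaGetDeviceProperties and its computeMode field):

```
// List every CUDA device and the compute mode currently set on it.
// Works the same for GeForce and Tesla.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        const char *mode =
            prop.computeMode == cudaComputeModeExclusive  ? "exclusive"  :
            prop.computeMode == cudaComputeModeProhibited ? "prohibited" :
                                                            "default";
        printf("device %d (%s): compute mode %s\n", i, prop.name, mode);
    }
    return 0;
}
```

A job that never calls cudaSetDevice then behaves as described above: with all GPUs in compute exclusive mode its context ends up on a free device, which is what lets the queue slots distribute jobs without assigning device IDs by hand.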

Thanks a lot for the info. We will try Linux and Sun Grid Engine.

Yes, for a 4-GPU computer the only Intel platform with enough PCIe lanes is X58. 3×4 GB is enough for our applications and we don’t need ECC. All I want is maximum computing power for a reasonable amount of money.
