CUDA hardware & software

Hi,
we want to buy a new CUDA machine for our lab.

  1. Our current machine has an ASUS SUPERCOMPUTER motherboard and we were never able to run 4 cards on it. Trying to do so resulted in a crash several minutes after starting a CUDA program. Admittedly, we tried this half a year ago, so maybe the problem is solved with the newer drivers. Also, we have different cards: a 280, a 285 and a 480. They work in any combination, but no more than 3 together. Poor cooling is another problem when the cards are packed together. Does anyone know why 4 cards do not run properly? The PSU is big enough, a Toughpower 1500W.

  2. Because of the problems with the ASUS SC motherboard, we want to use another MB, e.g. the Asus Rampage III Extreme, which is even 20% cheaper. Does anyone have experience with this MB? Will it work with 4 GPUs?

  3. What do you think is better: one system with 4 GPUs or 2 systems with 2 GPUs each? The prices seem to be almost identical. For systems with 2 GPUs, do you know any good MBs with the PCIe slots spaced far enough apart to leave one empty slot between the GPUs? Is there any advantage in having an onboard GPU to use for the display?

  4. Having more than one CUDA computer requires software to use them as a cluster. What would be the optimal solution in terms of OS and additional software?

  5. Will 4 GTX 580s work in a single system?

Thanks.

Out of curiosity, which OS were you using? I haven’t had any trouble with Ubuntu Linux 64-bit 9.10 or 10.04 with the Asus P6T7 Supercomputer MB and three GTX 295s + one GTX 470.

Is your application easily divided into independent data sets? You can optimistically move data between GPUs in the same computer at 2-3 GB/sec, whereas you will at most hit 100 MB/sec over gigabit ethernet. (Plus there’s a huge latency difference.) If the computers don’t need constant communication between processes, then this doesn’t matter and 2 computers with 2 GPUs will be much easier to build and maintain.
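
Something like this is what I mean by the in-machine transfer: a minimal, untested sketch (the buffer size, device indices and pinned staging buffer are my own choices, and it assumes a CUDA runtime new enough to switch devices from a single host thread) that times a GPU-to-GPU copy staged through host memory:

```
// Rough GPU0 -> host -> GPU1 bandwidth check (hypothetical 256 MB buffer).
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256u << 20;                 // 256 MB test buffer
    float *d_src = 0, *d_dst = 0, *h_buf = 0;

    cudaSetDevice(0);
    cudaMalloc((void**)&d_src, bytes);
    cudaMallocHost((void**)&h_buf, bytes);           // pinned staging buffer
    cudaSetDevice(1);
    cudaMalloc((void**)&d_dst, bytes);

    auto t0 = std::chrono::steady_clock::now();
    cudaSetDevice(0);
    cudaMemcpy(h_buf, d_src, bytes, cudaMemcpyDeviceToHost);
    cudaSetDevice(1);
    cudaMemcpy(d_dst, h_buf, bytes, cudaMemcpyHostToDevice);
    auto t1 = std::chrono::steady_clock::now();

    double s = std::chrono::duration<double>(t1 - t0).count();
    printf("GPU0 -> host -> GPU1: %.2f GB/s\n", bytes / s / 1e9);
    return 0;
}
```

Compare that number to what you can realistically push over gigabit ethernet and the decision usually makes itself.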

I use Sun Grid Engine with as many queue slots as CUDA devices and set the devices to compute exclusive mode. I don’t use MPI for anything, so I don’t have any opinions about that.

  1. The current machine has Win 7 Pro 64-bit. We are thinking of using Linux once we have more than one machine. I have no experience with Windows Server, which I think is required if we want to build a Windows cluster.
  2. Our application uses only one GPU; the communication latency between different GPUs is too big for the algorithm. We need as many GPUs as possible to run hundreds of experiments, compute standard deviations, etc. Each run takes 1-200 h, depending on the task complexity. I thought a 4-GPU machine would be easier to maintain if there are no cooling problems. As I read in many reviews, the new 580 runs cooler than the 480 and has a better air intake even when the GPUs are closely packed.

In the case of computers with only 2 GPUs there are MBs with an onboard GPU. Do you think it is better to have this option in order to completely free the discrete cards for CUDA? If yes, what CPU/chipset should an Intel platform have? The current processors with an on-chip GPU confuse me with regard to the PCIe lanes. For AMD things are clear, but their CPU cores are 10-40% slower and our application needs a fast CPU too. I don’t want to overclock because we run the machine 24/7 for months.

  3. I thought “compute exclusive mode” was only for Teslas. Can it be used with GeForce cards?

Yeah, if you can get the 4 GPU computer to run, you will probably be fine. If you want to do multi-GPU with Intel, you should go with a socket LGA-1366 and the X58 chipset. (Xeon vs. Core i7 depends on your preference for ECC host memory.) Stay away from the socket LGA-1156, as it has much less PCI-E bandwidth. In addition, you probably want to make sure you populate the memory slots in the triple channel configuration for maximum memory bandwidth to support the multiple GPUs.

Regarding the on-board GPU, it depends on your OS and usage pattern. Personally, I don’t need the onboard GPU because on Linux, if you turn off the GUI entirely, then the primary GPU is free to run CUDA jobs with no watchdog timer. I run all of my tasks on the server over SSH from the laptop on my desk. If you are going to use Windows, then you might want a display-only GPU if you intend to use the computer directly (as opposed to using it as a remote server), or if your kernels run long enough that you are in danger of triggering the watchdog.
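
If you want to check whether a particular card still has the watchdog attached, the runtime exposes that too; a tiny sketch (device 0 is just an arbitrary example):

```
// Report whether the driver enforces a kernel run-time limit (watchdog) on device 0.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // arbitrary choice: first device
    printf("%s: kernel run-time limit %s\n", prop.name,
           prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    return 0;
}
```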

You are probably thinking of the Windows driver for Tesla that lets you run the card independent of the display system. On Linux (which is the only OS I have CUDA experience with), you can run the nvidia-smi tool with GeForce or Tesla to set each GPU in one of three modes: normal, compute exclusive, and compute prohibited. If you then don’t specify a CUDA device in your code (assuming runtime API here), the context will be created on the first allowed device. So, if you set compute exclusive on all the GPUs, you get automatic distribution of jobs across the devices.
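
For what it’s worth, here is a minimal sketch of how I check what nvidia-smi left the cards in; nothing in it is specific to my setup beyond assuming the standard runtime API (cudaGetDeviceProperties and its computeMode field):

```
// List every CUDA device and the compute mode currently set on it.
// Works the same for GeForce and Tesla.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        const char *mode =
            prop.computeMode == cudaComputeModeExclusive  ? "exclusive"  :
            prop.computeMode == cudaComputeModeProhibited ? "prohibited" :
                                                            "default";
        printf("device %d (%s): compute mode %s\n", i, prop.name, mode);
    }
    return 0;
}
```

A job that never calls cudaSetDevice then behaves as described above: with all GPUs in compute exclusive mode its context ends up on a free device, which is what lets the queue slots distribute jobs without assigning device IDs by hand.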

Thanks a lot for the info. We will try Linux and Sun Grid Engine.

Yes, for a 4-GPU computer the only Intel platform with enough PCIe lanes is X58. 3×4 GB is enough for our applications and we don’t need ECC. All I want is maximum computing power for a reasonable amount of money.
