Max limit of CUDA processors in a system?

My question is perhaps a bit theoretical, but is there a limit to how many CUDA units you can have in a system and use all of them for CUDA?
E.g. if there were a motherboard with 8 PCI Express slots, could I use all of the slots for dual-GPU CUDA cards (8x GTX 295)?

Also another question: how much of a reduction in processing speed do you get when you use both x16, x8, and x4 slots? (Is there a problem using the x4 PCI Express slots?)

The largest system I’m aware of is four GTX 295 cards in one motherboard (for a total of 8 CUDA devices). I don’t know whether CUDA itself has a hard device limit, but there are many considerations that make increasing the number of CUDA devices difficult in practice:

  • Power to the cards: Supplying power to four GTX 295s hits the limit of single power supplies available on the market (and probably several electrical safety laws). Linking two power supplies together is not hard, and some people are already doing that with four GTX 295 cards. However, you will want multiple power circuits to plug into the wall if you do this.

  • Power to the motherboards: Most (all?) CUDA-capable devices expect to be able to draw 75W of power from the PCI-Express slot. Motherboards advertised as “Quad-SLI” can do this, but large server motherboards with more than 4 PCI-Express slots might not be able to do this.

  • CPU: If you are running many short kernels on your CUDA devices, best performance is usually obtained with one CPU core per GPU. This minimizes the latency between the GPU finishing the kernel and your host thread being alerted to this fact. This is not a hard requirement, and some programs work just fine with fewer CPUs than CUDA devices.

  • PCI-Express bandwidth: You mention this as a possible concern, and it depends again on the type of program you are running. Some CUDA programs need to move a lot of data between the CPU and the GPU, and this will be slower if adding more cards forces all the slots to drop to slower speeds. As long as the slot is physically x16 (and can supply the power as mentioned above), the card should work regardless of whether the slot is electrically x4, x8, or x16. But if your application depends on fast CPU/GPU transfers, then performance will suffer.

  • System memory bandwidth: Communication between the CPU and GPU is done by DMA between the system memory connected to the CPU and the device memory connected to the GPU. If you have a large number of CUDA devices trying to perform memory transfers at the same time, the speed of system memory could be the bottleneck.

  • BIOS bugs: The last problem is the one that no one can predict until they try it. I doubt many motherboard manufacturers test with more than 8 CUDA devices, so there is the risk that exceeding this number will trigger some sort of BIOS bug that may or may not be fixed in a timely manner.

It is much safer at this point to have two computers with four GTX 295s than to try to build a monster system with eight.
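Whatever the practical ceiling turns out to be, you can check how many devices the runtime actually exposes with `cudaGetDeviceCount`. A minimal sketch (compile with `nvcc`; assumes only the standard CUDA runtime API):

```cuda
// Enumerate the CUDA devices the runtime can see and print a line per device.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%d CUDA device(s) found\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  device %d: %s, %d multiprocessor(s)\n",
               i, prop.name, prop.multiProcessorCount);
    }
    return 0;
}
```

Note that a dual-GPU card like the GTX 295 shows up as two entries here, which is why four of them count as 8 devices.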

I doubt you could fit more than 4 double-slot cards onto any motherboard, for a total of 8 GPUs. As far as single-slot, single-GPU cards go, there’s no motherboard with 8 PCI-E x16 slots. The largest I’m aware of has 6 PCI-E x16 slots, only three of which are x16 electrical. As siebert said, the limit is more how many units one could physically fit, not a limitation of CUDA.
The 295s are dual-slot cards, and you can’t just remove the cooler and fit a waterblock to convert one to a single slot: each card is made of two boards that will take up the two slots.

Of course, we could take the ASUS P6T6 Revolution, put a single-slot video card in one of the PCI-E slots (we need one in order for the board to boot), and connect a Tesla S1070 system to each of the five other PCI-E slots, for a total of twenty Tesla GPUs, plus whatever the video card is, so we get 21 (twenty-one, not a typo) CUDA units.

That way we eliminate three of the problems seibert mentioned.
First, we don’t need to worry about power to the cards. The S1070s have their own power supplies, and the one GPU in the system is not going to be a problem.
Second, the S1070s (as far as I know) won’t draw anything from the PCI-E bus, so we don’t have to worry about drawing obscene amounts of power from the PCI-E slots, as only the one GPU will do so.
Third, we don’t have to worry about BIOS bugs, as the BIOS only sees one GPU.

The problem is that the bandwidth limitations are amplified significantly, and it might just not be worth putting such a system together (unless you have the money and curiosity to build it).
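If you did build something like this, the usual pattern for driving many devices is the one seibert’s CPU point implies: one host thread per GPU, each bound to its device with `cudaSetDevice`. A rough sketch, assuming a placeholder kernel (`busyKernel` is hypothetical, standing in for real work):

```cuda
// One host thread per CUDA device; each thread binds to its device,
// launches work, and waits for it to finish.
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

__global__ void busyKernel() { /* placeholder for real work */ }

void worker(int device) {
    cudaSetDevice(device);       // bind this host thread to one GPU
    busyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();     // block until this device's work is done
    printf("device %d done\n", device);
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i)
        threads.emplace_back(worker, i);
    for (auto &t : threads)
        t.join();
    return 0;
}
```

With 21 devices that is 21 host threads all competing for the same system memory and PCI-E lanes during transfers, which is exactly where the amplified bandwidth limitations would bite.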

Last but not least, fitting too many cards in one case will get you a toaster oven. The GTX 295 throws some of its hot air straight back into the case. Try to cool that!