PCIe x16 wired as x8: effect on card use (GTX 280, GTX 295, C1060)

Riser with PCIe x16 wired as x8 (1), PCI-X (1), Dell Precision R5400 (430-3122)

We are aiming to test a few different cards in a Dell R5400 and the T line.

GTX 295 (assuming this is also a double-precision card)
Tesla C1060

Wondered if anyone had any issues. According to the charts, all these cards should be compatible with the Dell servers - but I just noticed this issue.


It is no problem if the x16 slot is only connected through x8; it will only reduce your host-device bandwidth.
All three cards mentioned are basically the same chip; the C1060 just lacks the display ports and carries a bunch more memory instead. The GTX 285, on the other hand, is also the same chip, just shrunk with a new manufacturing process (meaning: less power consumption/heat at the same speed, more overclocking headroom); apart from slightly higher clock rates it is the same as the GTX 280. The GTX 295 is two cards somewhere in between a 260 and a 280, but produced with the 285's manufacturing process and "glued" together, so both GPUs share the same slot.
All of them support double precision.

Another thing to keep in mind is that the two GPUs in the gtx295 share the same PCI-Express connector. This means that whatever the bandwidth of your bus is, each GPU will only get half if you are accessing them at the same time (as you are likely to be doing if you’ve split your workload over both cards).

If the C1060 does not have display ports, what would a good complementary card be to take care of the display? (OS: Red Hat Linux 5.x)

Well, anything that is driven by the same driver. Picking a card from the same generation is a sure way to keep the same driver until end of life, so a GTX 260?

That's where it gets ugly - we need more power.

If you don't need a graphics card to show "nice" 3D graphics, you can just go with the cheapest 8- or 9-series card you can find. E.g., a low-budget 8-series card only uses about 40 W, and a 9400 is at about 50 W.
You should also be able to find at least some 8-series cards as PCI (not PCIe) versions, if you don't want to occupy a PCIe slot.

Of course, if you don't need the 4 GB of memory in a C1060, the easiest solution is to install a single GTX 280 (err, 285 I guess now) instead. It is the same size, uses the same power connectors, and even runs a little faster. :)

I don't know CUDA well enough yet to judge the loss of the 4 GB as far as programmability goes - but as a general statement, more memory cannot hurt.

Having a rough time locating a dealer that can ship a 285 or 295 on short notice.

True, but more memory bandwidth also cannot hurt (the GTX 280's memory is clocked higher than the Tesla's), so you'll have to pick. :)

To first order, the memory you use in CUDA will be similar to a CPU version of your code. As you make your algorithm more CUDA-friendly, you tend to use less memory, not more, since many CUDA programs are memory bandwidth limited, rather than FLOPS limited.

Don’t get me wrong: I’m not saying you shouldn’t purchase a Tesla C1060, but you should be aware that the Tesla compute devices are not unequivocally the best solution for all CUDA tasks. There are tradeoffs.

Another option instead of the 285 would be to look at the discounted GTX 280s:


Ignoring the rebate, the cards are starting at $315. (Newegg also has the GTX 295 in stock for $500, but no 285 yet.)

Yeah - like programming for BlueGene POWER chips and the new Cell - going backwards. :-)

I don't understand what you mean by power. Is a GTX 260 not enough for your display purposes (with the computing done on the C1060), or does a GTX 260 draw too much power next to the C1060?

In general, we just need to solve three problems:

a, get a set of workstations with power supplies strong enough to drive either of these cards for CUDA use (obviously the GTX has video output, so that is good);

b, get a set of servers for our clusters that can accommodate the full-length, double-height cards and have enough power to drive them;

c, they need to be double-precision cards.

Since the S1070 is self-contained, we are okay with that. But one unit takes care of only two servers, and they are too expensive to roll out across 20+ machines.

It is not about the computational or graphics "power" of the cards. Unfortunately, it is about real power draw and PCI form factor. We would be happy with a small starter card, but all the double-precision cards are high end.

Right now we are just thinking large form-factor workstations, amending the power supply with a modular unit like:


and rack them. Not elegant but workable.


The lowest-power GT200 card will be, according to Wikipedia, the new 55 nm GTX 260 Core 216 that's being released now. (There is no new name for it.)

That, I think, is the best solution. For example, you don't have to worry about sizing up the PSU (which is not trivial: besides estimating the power consumption of all the other components, you have to check how the demand is split among the PSU's rails). It might also be more reliable.