board recommendation / headless dedicated / chipset tradeoffs

I am an absolute CUDA beginner; in fact, I don't even have the necessary board at this point. I have Windows Vista/Ubuntu 9.04 (both 64-bit) and only a single PCIe x16 slot with no alternative monitor output. I have several very newbie questions:

  1. Does the actual board the NVIDIA chip is mounted on make any difference when running CUDA code? There is a variety of boards from different manufacturers (PNY, Sparkle, EVGA, MSI, …). Does the board the chip sits on matter, or does CUDA just run on the NVIDIA chip itself, making the board effectively invisible to someone using CUDA (unless it is driving a monitor, in which case I am guessing different boards may be more or less of a “hog” with regard to the chip’s resources)?

  2. Is it possible in Vista or Ubuntu (obviously running headless and logging in remotely) to make a single NVIDIA card available for CUDA computation but not for drawing to the monitor, so that it is a dedicated card?

  3. It appears that the GeForce GTX 285 is a good computational choice (somewhat cheaper than the 295), but for a newbie it is hard to tell how big the tradeoffs are, and I don’t think the guy at Best Buy is going to know. :) Is the 295 effectively like having two 285s (minus a bit) that are unrelated? I am assuming they don’t even share global memory, since the memory interface is listed as 448-bit per GPU(?).

-John Robertson

You have to check whether the board fits your computer. A lot of the high-end card models you are referring to consume a lot of power, and your computer has to handle roughly 200 W more. To use a high-end GPU board at its full capacity, the PCIe bus should also run at its maximum speed. Not all motherboards listed as PCIe-capable can run at full PCIe x16 speed. For example, we have a very powerful Dell Precision 670, but its PCIe slot runs at half speed. This determines how fast you can transfer data back and forth to the GPU, which is the slowest operation.

I personally chose the mid-range professional Quadro FX 1800 from PNY because it has a good mix of memory (768 MB), processors (64), price (~$400), and compute capability 1.1; it consumes only ~40 W and will not overload your computer. Installed memory is a very important factor, because you can transfer your data into GPU memory, do all the computation there, and copy the results back to the CPU only when needed, so the more GPU memory the better.

I'd buy from Newegg; they have the best deals and support I have found, and the package reaches your desk in a day at most.
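If you want to see what your own slot actually delivers, a rough way is to time a large pinned-memory copy yourself. The sketch below is just my illustration (the buffer size and loop count are arbitrary); it prints the effective host-to-device bandwidth, which will come out noticeably lower on a slot running at x8 than on a full x16 link.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;                 // 64 MB test buffer (arbitrary)
    const int    reps  = 10;
    float *h = 0, *d = 0;
    cudaMallocHost((void**)&h, bytes);             // pinned host memory for a fair test
    cudaMalloc((void**)&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host -> device: %.2f GB/s\n", reps * (double)bytes / (ms * 1e6));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```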

The only differences I’ve ever seen between card makers are technical support, aesthetics, and cooling solutions when dealing with overclocked configurations. (But you should probably stay away from overclocked cards with CUDA.) They all run the same CUDA drivers, though.

In Ubuntu, the way you would do this is to configure your system not to start X at boot. You also need to run a script given in the CUDA release notes to create the /dev/nvidia* devices (which X would normally do for you). I don’t know much about Vista, but it sounds like you would need a second video card to run the display.
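Once X is out of the picture, a quick sanity check that the runtime can still reach the card is something like this minimal sketch of mine; if the /dev/nvidia* nodes are missing, the device count query is usually the first thing to fail.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // Usually means the driver isn't loaded or /dev/nvidia* doesn't exist yet.
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute %d.%d, %zu MB global memory\n",
               i, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem >> 20);
    }
    return 0;
}
```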

The GTX 295 is like having two separate GTX 275s, at a lower clock rate, sharing the same PCI-Express connection. No memory is shared between the two GPUs, and you have to treat them as separate devices in CUDA, which complicates coding. Since they share the same PCI-E connection, if you copy data to both devices simultaneously, you only get half the maximum PCI-E bandwidth to each device.
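Concretely, the two halves of a 295 show up as separate devices that you select with cudaSetDevice, and each one needs its own allocations. Here is a rough sketch (the kernel is a throwaway of mine); it switches devices from a single host thread, which newer runtimes allow, whereas with the current 2.x toolkits you would give each device its own host thread, but the per-device allocations and launches look the same.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);                    // a GTX 295 reports two devices
    const int n = 1 << 20;

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);                        // everything below targets this GPU only
        float *d = 0;
        cudaMalloc((void**)&d, n * sizeof(float)); // memory is per-device, not shared
        cudaMemset(d, 0, n * sizeof(float));
        scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
        cudaDeviceSynchronize();
        cudaFree(d);
        printf("Ran on device %d\n", dev);
    }
    return 0;
}
```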

I would stick with the GTX 285 or 275 unless you want to play with multi-GPU programming. thstart does have a good point about space and power. Almost all of the GTX 200 series cards are two slots wide, full length, and require two 6-pin PCI-E power connectors. The GTX 280 and 295 require a 6-pin and an 8-pin plug, which is even a little harder to find.

In the world of single slot cards, the 9800 GT is my favorite. It only requires one 6-pin PCI-E power connection. The only downside is that it is a compute 1.1 device, so no double precision and the memory controller is not quite as smart with uncoalesced reads as the compute 1.3 devices.
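To make the coalescing point concrete, here is a pair of toy kernels of my own (not from any SDK sample). The first access pattern coalesces on compute 1.1 hardware; the strided one breaks a half-warp's accesses into many separate memory transactions there, while compute 1.3 parts handle it far more gracefully.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Coalesced on compute 1.1: thread i touches element i, so each half-warp
// reads one contiguous, aligned segment in a single transaction.
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: on compute 1.1 each thread's load becomes its own transaction;
// compute 1.3 relaxed the coalescing rules and copes much better here.
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];
}

int main() {
    const int n = 1 << 22;
    float *in = 0, *out = 0;
    cudaMalloc((void**)&in,  n * sizeof(float));
    cudaMalloc((void**)&out, n * sizeof(float));
    copy_coalesced<<<(n + 255) / 256, 256>>>(in, out, n);
    copy_strided  <<<(n + 255) / 256, 256>>>(in, out, n, 2);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```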

I am running Win 7 with the FX 1800. It is also connected to the display and does not seem to need a second video card. Why would you need one?

This pretty much explains why simpler single-GPU solutions are better. I believe that if the next cards are produced on a 45 nm process, they will be more power efficient and it will be possible to fit more transistors on the chip.

I was evaluating the possibility of using a Tesla in the beginning, and of using the developers’ discount as well. But it turned out I would need a new computer to fit the card, and I have been happy with my current system so far. It would have practically required a completely new power supply, and I didn’t want to risk spending a lot of time making it work. After I decided to get the FX 1800, it turned out the PCIe x16 slot on my machine was not communicating with the FX 1800 at full speed. Digging deeper into the Dell 670 documentation, I now see it mentions the PCIe slot working as x8, but you only find this out after you buy the card and install it.

9800 GT: 112 cores, 512 MB, 256-bit memory interface, 57.6 GB/s memory bandwidth

FX 1800: 64 cores, 768 MB, 192-bit memory interface, 38.4 GB/s memory bandwidth

I chose the FX 1800 because I needed more memory. After that, my tests showed that not all of this memory was actually available for programming, possibly because Win 7 is using the GPU too. My feeling is that more memory is better, because once you get your data into device memory and manage to work only with that data, the PCIe bottleneck is not so much of an issue.
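You can see exactly how much of the card's memory the desktop is holding on to by querying the device before allocating anything; a minimal check of my own looks like this.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    cudaFree(0);                                  // force context creation on device 0
    cudaMemGetInfo(&free_bytes, &total_bytes);    // free memory already excludes the display's share
    printf("Free: %zu MB of %zu MB total\n",
           free_bytes >> 20, total_bytes >> 20);
    return 0;
}
```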

The original poster specified that he did not want the card to be shared with the graphical display. The only two ways to do that are either to drop to a text console or put in a second card.

Why would he need that? What are the benefits? Possibly that when the card is not shared with the display, he can get the maximum memory available, which Win 7 otherwise seems to take a share of?

You can’t preempt a running kernel to run graphics on the GPU, and you can’t run both at the same time (the card is timeslicing between compute and graphics). With a dedicated compute card, you can run kernels that last for five, ten minutes without any issues.
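Whether a given card has that display watchdog active is something you can query, so you can confirm the dedicated card really has no run-time limit. A short sketch of mine:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // kernelExecTimeoutEnabled is set when the driver will kill
        // long-running kernels, i.e. the card is also driving a display.
        printf("Device %d (%s): run-time limit %s\n", i, prop.name,
               prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}
```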

I hope we will see the next NVIDIA cards be dedicated computation cards like Tesla, minus the special power requirements. I don’t know how hard it would be to strip out the display part to make them cheaper, but it would definitely be a winner. I would like to put such a card in servers like the Dell 8250, which do not have PCIe, just PCI-X. It would not run at the maximum interface speed, but with proper programming and a card with more memory, I can keep all my data in device memory and only occasionally copy results back. With a dedicated card and no timeslicing, I can live with a slower interface between CPU and GPU. That would be better than investing in new servers with all the associated costs.

NVIDIA would get much more market penetration and many more customers if it made this possible.

Doesn’t make sense. PCI-X is on the way out in favor of PCIe, the max interface speed would be pathetic, it would make chips more expensive (because they’d need to support PCI-X and PCIe), etc.

Not only that, but code runs faster on a non-display GPU. I’ve confused myself several times over the last several days comparing benchmark runs on display GPUs to others on non-display GPUs. It’s very confusing when I run code I know is faster and it comes out slower :) At least with HOOMD, the performance hit from having KDE running on the same GPU is very noticeable (more than 3-4%).

Context switching is not a lightweight operation.

(why yes, I am in a meeting and constantly refreshing the forums)

I understand that, but investing in a new machine is not a better alternative from the customer’s side. For us developers it doesn’t matter much; we will invest in a new machine because we don’t need many machines, sometimes just one. But it matters to customers how much they have to spend in time and money to get our software working.

It is a matter of ROI (return on investment): a new machine means money and time. Money for new hardware, a new OS, and other system software licenses; time to get all of this running, or money to pay somebody to do it.

It is a hard sell to convince them to buy:

  1. a new NVIDIA card (some of them already own expensive professional NVIDIA cards that are not CUDA-capable)

  2. a new computer that supports the NVIDIA card to its full capacity

  3. our software.

They can, after some reservations, understand 1): why they need to get another NVIDIA card after they have already spent thousands of dollars on a professional card like a high-end Quadro. I get questions like: what do we do with the old one, which works fine but is not CUDA-capable?

For me, the cost of 2) would be eliminated or postponed if there were an alternative to PCIe; no matter how slow, at least it would work. I would recommend a card that fits their current power supply (NVIDIA already has a lot of models to choose from), but the PCIe interface is the problem. If there were at least some temporary workaround to fit such a card in an old PCI slot, just to show that things work, they could be convinced to invest in a new machine and in the best NVIDIA card.

And I believe that with the right mix of programming and a card with more memory, it would still be faster than a pure CPU solution. They can begin with a very cheap GPU card of this type, and then, when they see the performance difference, they can be convinced to invest in a better GPU and a better computer.

Still, the question remains: if they already have tens of computers and none of them has PCIe, it would probably be better for NVIDIA if they could buy tens of cheaper GPU cards for all of them, rather than buying nothing at all, or buying only a few new machines and only a few NVIDIA cards.

I would like to do that too, but it needs two PCIe slots. My Dell 670 has only one PCIe slot. How many motherboards have more than one?

I’m not sure about the custom motherboards used by vendors like Dell and HP, but most motherboards you buy directly from places like Newegg have at least two PCI-Express slots these days. Slightly nicer boards have three, and a handful have four. However, in the more-than-two-slot case, if you plug in more than two cards, the speed of the slots starts to drop from x16 to x8, since I think almost all chipsets top out at 32 or 36 PCI-Express lanes.

Two slots are OK: one for display, another for computing.

I also tried the following on another machine of mine, a Lenovo with an integrated Intel GPU and one available PCIe slot. I tested the FX 1800 in it but could not make it work: if I plug the card into the PCIe slot, the display attached to the integrated GPU goes blank. When I remove the CUDA card, it works. I then attached the display to the CUDA card, but the display was blank again. So this is another issue: a possible incompatibility with integrated graphics.

That’s just a BIOS issue, not anything NVIDIA or CUDA-specific.

Thank you very much. Your advice is very helpful; I had no idea the power supply was an issue. Dell tech support now claims my machine shouldn’t have more than a 360-watt power supply (which is what it comes with), which I am not sure makes any sense, but that would limit me to a 9500 GT or less (yikes).

I do not understand why the Quadro FX 1800 is recommended. Not that I disagree; I just feel I am not understanding something. Its specs are OK, but it seems to be a $400 card with specs on the same scale as many $100 cards.

Is the issue that when a manufacturer says they are selling a GeForce 9500 GT with 1 GB of memory, it is really a regular GeForce 9500 GT with only 256 to 512 MB of memory, with some extra memory on the board that is not integral to the chipset?

-John Robertson

None of the CUDA capable GPUs have integrated DRAM. The only difference between the boards with differing amounts or types of RAM is the specification and size of the DRAM chips on the board.

I selected the FX 1800 because it has more memory and the right mix of features.