I am planning a system for CUDA. Is it possible to use a GPU card (8800GTS to be specific) on a motherboard with built-in VGA? I would like to use the built-in VGA for the host monitor and not use the video output of the GPU at all; set the primary surface memory allocation to zero and have all the GPU memory available. If it matters, I will be using Fedora 7 x86_64. This is probably a trivial question but my knowledge of GPU hardware is almost nonexistent. Thanks for any input.
The short answer is yes. The motherboard BIOS controls whether the onboard video is completely disabled when an external GPU is installed, or stays available in addition to the external GPU.
OK, so the onboard video is available in addition to the external GPU. But does the external GPU necessarily use a portion of its memory for the primary surface, or can this be disabled, making all the GPU memory available for the computational GPU programs?
If a video card is not used for display, all of its memory should be free for GPU programs. But note that some mobos have crappy BIOS support that does not allow you to use onboard video when an external GPU is plugged in.
Also, under Windows it’s not possible to use different drivers for different GPUs (well, I mean drivers from different vendors; if the built-in GPU is also from NVIDIA, you may be able to use both GPUs because of the unified driver architecture). Anyway, Linux should be fine with different video drivers installed at the same time.
A good example of this is the BIOS on some Dell motherboards. At some point, I wanted to do the same thing: use on-board video for display and the GPU as a CUDA-only device. I was never able to coerce the BIOS into allowing this.
So beware of blindly assuming that it will just work.
Seconded. My Dell machine’s BIOS even allows for using the onboard video - or so it says - but it won’t boot unless a monitor is hooked up to the NVIDIA card.
Thanks to all for the info. The original question seems moot. When I search for motherboards with 2 or 3 PCIe x16 slots they never have built-in video.
This brings up another question: is there an advantage to getting PCIe 2.0 x16 over PCIe 1.0 x16? So far, the best I have found is the EVGA nForce 780i SLI motherboard, which has two PCIe 2.0 x16 slots and one PCIe 1.0 x16 slot. Would I run into problems if I ever tried three 8800GTS GPUs with this board? Are there any motherboards out there with 3 or 4 PCIe 2.0 x16 slots? I will start with one but I like to keep my hardware for a while. This will be the fourth system I have put together since my 8088/8087.
One last question: DDR3 vs DDR2. All the specs on I/O bus speed vs latency make my head spin. Obviously I am not a hardware guru. Has anyone seen improved performance to justify the huge added expense?
My application may affect some of these hardware decisions. The application I am interested in is calculating eigenvalues of autocorrelation matrices (Karhunen-Loève transform) for use in weak radio signal detection. This, of course, is after I learn to program the GPU efficiently.
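For readers unfamiliar with the computation mentioned here, a minimal CPU-only sketch follows. It is an assumption about what the core step might look like, not the poster's code: it builds a symmetric Toeplitz autocorrelation matrix from a real signal and estimates only the largest eigenvalue by power iteration. All function names are hypothetical, and a real KLT would need the full eigen-decomposition.

```python
import math

def autocorrelation(signal, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of a real signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def toeplitz(r):
    """Symmetric Toeplitz matrix with R[i][j] = r[|i - j|]."""
    m = len(r)
    return [[r[abs(i - j)] for j in range(m)] for i in range(m)]

def dominant_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix."""
    m = len(matrix)
    # Asymmetric start vector, so it overlaps both the symmetric and the
    # skew-symmetric eigenvectors that a Toeplitz matrix can have.
    v = [float(i + 1) for i in range(m)]
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(m)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-length) converged vector.
    rv = [sum(row[j] * v[j] for j in range(m)) for row in matrix]
    return sum(a * b for a, b in zip(rv, v))

# Illustrative input only: a sampled sinusoid, 8x8 autocorrelation matrix.
signal = [math.sin(0.3 * n) for n in range(256)]
r = autocorrelation(signal, max_lag=7)
print(dominant_eigenvalue(toeplitz(r)))
```

The matrix-vector product inside the loop is the part that maps naturally onto a GPU; everything else here is bookkeeping.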
The 8800 GTS 512 MB (not 320 or 640 MB which use an older chip) will use the extra bandwidth that PCI Express 2.0 offers. Testing in this thread showed that with PCIe 2.0 the limiting factor can be the memory bandwidth of the motherboard:
However, different motherboards varied quite a bit in host-to-device bandwidth, so it might be hard to optimize for this.
There are now motherboards with 4 PCIe 2.0 x16 slots. For example, the MSI K9A2 Platinum motherboard with the AMD 790FX chipset has 4 such slots. Keep in mind that double-wide cards like the 8800 GTS have pretty high power and physical space requirements, so getting 4 of them to work in a single case could be a challenge. (I’m still hoping to see a successful 4-card setup reported in the forum.) Three-card setups have been done by several people. Just make sure you have good cooling and a sturdy PSU.
Another thing to keep in mind is that you will want at least as many CPU cores as you have GPUs. So if you plan to go to 3 or 4 CUDA devices, you’ll want a motherboard that supports a quad core CPU as well.
I can’t speak to the DDR2 vs DDR3 question, but the few times where I’ve had a choice between spending the $$ on more memory vs. faster memory, purchasing more memory has served me better in the long run. But I work with large datasets that grow to fill the capacity of my available computers, so YMMV. Feeding multiple GPUs with data in your situation might require you to maximize the total system memory bandwidth.
The MSI K9A2 Platinum only runs its slots as x16 + x16 OR x8 + x8 + x8 + x8.
Good catch. That’s weird, though. I remember researching this board before and I’d swear it was x16 all the way. I’m surprised I missed this.
Well, in that case, I don’t know of a 4-slot, x16 PCIe 2.0 motherboard. :)
I did some more thinking about this, and x8 PCIe 2.0 might not be so bad. Each PCIe 2.0 lane has double the bandwidth of a PCIe 1.0 lane. So a x8 PCIe 2.0 slot should perform the same as a x16 PCIe 1.0 slot. Moreover, if you are pushing data to multiple cards at the same time, the bottleneck will probably be your system memory rather than the PCIe bandwidth.
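The arithmetic behind that comparison can be checked in a few lines. The sketch below is not from the thread: it assumes the nominal signalling rates (2.5 GT/s per lane for PCIe 1.0, 5 GT/s for PCIe 2.0) with 8b/10b encoding, and it uses dual-channel DDR2-800 purely as an illustrative memory configuration. These are theoretical peaks per direction; measured numbers will be lower.

```python
# Nominal per-lane signalling rate in MT/s (assumed published PCIe rates).
LANE_RATE_MT_S = {"1.0": 2500, "2.0": 5000}

def pcie_peak_mb_s(generation, lanes):
    """Theoretical peak bandwidth of one slot, one direction, in MB/s."""
    # 8b/10b encoding: only 8 of every 10 transferred bits carry payload.
    payload_mbit_s = LANE_RATE_MT_S[generation] * 8 // 10
    return payload_mbit_s // 8 * lanes

def memory_peak_mb_s(mt_per_s, channels, bus_bytes=8):
    """Theoretical peak system memory bandwidth in MB/s (64-bit channels)."""
    return mt_per_s * bus_bytes * channels

x16_gen1 = pcie_peak_mb_s("1.0", 16)
x8_gen2 = pcie_peak_mb_s("2.0", 8)
print(x16_gen1, x8_gen2)  # both 4000

# Four cards on x8 PCIe 2.0 slots versus dual-channel DDR2-800
# (the memory configuration is an illustrative assumption).
four_cards = 4 * x8_gen2
ddr2_800_dual = memory_peak_mb_s(800, 2)
print(four_cards, ddr2_800_dual)
```

Under these assumptions the combined peak of four x8 PCIe 2.0 slots exceeds what the memory can supply, which is consistent with system memory becoming the bottleneck when several cards are fed at once.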
Anyway, more food for thought as you plan your system.
This board, the Tyan Tempest i5400PW (S5397), has two x16 slots and integrated video. It also takes a truckload of RAM.
It recognises two CUDA cards and boots with them. I will confirm whether integrated video and CUDA work together once I have tried it.
I don’t know about DDR3. My naive opinion is that since I haven’t really seen any dazzled reviewers clutching DDR3 sticks, it’s not worth the cost. If it were, we’d have heard about it.
Personally, I could use lower memory latency rather than more bandwidth.
Interesting board. A little pricey. Can you run it with only one CPU? FWIW one of the reviews at Newegg said it will not boot with Tesla C870 card installed. What specific GPU cards are you using?
I’m not sure if you can use a single CPU. I guess you could - the board isn’t arranged in two separate banks, so it looks like a standard dual-CPU board, which in my experience will run with a single chip. That said, an equivalent single-socket board does probably exist, and will probably be cheaper.
We’ve got a couple GTS 512 cards.
Can’t say about integrated video, but CUDA works fine. Bandwidth is very nice:
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)