Two 8800 GTX cards with Intel Core 2 Duo would this work?

nogradi · March 14, 2007, 7:06pm

Does anyone have experience running CUDA applications on two 8800 GTX cards in an Intel Core 2 Duo box?

I have a GPGPU application (sparse matrix multiplication and some linear algebra) and was wondering whether I could drive one card from one core and the other card from the other core, so that the combined performance is around twice that of a single card.

Note that at first I don’t want any communication between the cards (that will come later); I only want to run two separate instances of the same application.

What would be the best setup for this?

seibert · March 14, 2007, 7:17pm

The programming guide in section 4.5.2.1 describes the cudaSetDevice() function, which lets you decide which device each thread is going to talk to. As section 4.5.2 mentions, a given host thread can only talk to one GPU at a time, so to access both cards from a single app, you would need to spawn another thread and call cudaSetDevice() from each thread before doing anything else. With two separate apps, you just need to call cudaSetDevice() with a different value in each instance of the application.
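
The one-host-thread-per-GPU arrangement described above could be sketched roughly as follows. This is plain C++ so that it builds without the CUDA toolkit: `bind_device` and `run_one_thread_per_device` are hypothetical names, and `bind_device` is only a stub marking the spot where a real program would call cudaSetDevice().

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for cudaSetDevice(): a real CUDA program would call
// cudaSetDevice(device) here, before any other CUDA call in this thread.
// The stub only records the binding so the sketch builds without CUDA.
inline void bind_device(int device, int& bound_slot) { bound_slot = device; }

// Launch one host thread per device. Each thread binds to its own device
// first and then runs the caller-supplied work for that device.
// Returns the device index that each worker was bound to.
inline std::vector<int> run_one_thread_per_device(
    int device_count, const std::function<void(int)>& work) {
    assert(device_count >= 0);
    std::vector<int> bound(device_count, -1);
    std::vector<std::thread> workers;
    workers.reserve(device_count);
    for (int dev = 0; dev < device_count; ++dev) {
        // Each worker touches only its own slot of `bound`, so no locking
        // is needed for this bookkeeping.
        workers.emplace_back([dev, &bound, &work] {
            bind_device(dev, bound[dev]);  // first device-related call
            work(dev);                     // e.g. upload, launch kernels, download
        });
    }
    for (std::thread& t : workers) t.join();  // wait for every device to finish
    return bound;
}
```

Calling `run_one_thread_per_device(2, work)` would run `work(0)` and `work(1)` concurrently, one per card. Any data exchanged between the two workers would have to go through host memory, with whatever synchronization the application needs.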

Incidentally, this brings up a question I have for the experts: Is there an easy way to discover which CUDA device is not already in use when the application starts? Eventually, we will have two cards in one machine (like nogradi is looking into) to run separate applications. It would be very handy if the application could automatically initialize whichever card is free when it starts, and bail out with an error if both cards are in use.

nogradi · March 14, 2007, 7:46pm

Thanks for the reply, it seems then that running two separate applications is pretty simple. And if one only wants to have communication between cards through CPU memory shared by the two CPU cores (probably this is the only option) then two threads each talking to its own card can be a good solution. Did you try any of this with success?

seibert · March 14, 2007, 8:17pm

We only have one card for testing at the moment. (We are waiting for 64-bit drivers before getting a second card.) I have only used CUDA with a multithreaded app in order to do some CPU calculation in parallel with the GPU execution, but not two GPUs at once.

One guy has built a few quad core machines with 3 GPUs in each case. Scroll down in this thread:

http://forums.nvidia.com/index.php?showtopic=30063

nogradi · March 14, 2007, 8:56pm

This sounds cool, pretty much what I would want to do, thanks for the pointer.

eelsen · March 14, 2007, 11:34pm

I think you can use cuDeviceGetCount to see whether there are any devices available for execution, then use cuDeviceGet to get a handle to one of them, and then call cudaSetDevice.

See section 4.5.3.2

tachyon_john · March 16, 2007, 6:53am

Hi,
I gave a talk earlier this week that, towards the very end, included a bit of discussion of the 3-GPU tests I’ve been doing via multithreading. You can read the current version of my talk slides here:
ECE 498AL Guest Lecture Materials: http://www.ks.uiuc.edu/Research/vmd/projects/ece498/lecture/

One of my multi-GPU implementations is already in the latest VMD source code. You can either get it out of CVS or read the Doxygen source listings here:
Using CVS to retrieve the VMD source code: http://www.ks.uiuc.edu/Research/vmd/doxygen/cvsget.html

Cheers,
John Stone

tachyon_john · March 16, 2007, 7:09am

Here’s the direct link to one of the multithreaded CUDA kernels in VMD. The VMD threads code wraps the normal pthreads functions, so you ought to be able to mentally change “vmd_thread” to “pthread_” and understand my code:

http://www.ks.uiuc.edu/Research/vmd/doxyge…8cu-source.html

Cheers,

John Stone

nogradi · March 16, 2007, 7:58pm

John, thanks very much for these links! The whole thing is just amazing.

nogradi · March 16, 2007, 8:23pm

John, I guess more than 3 cards went through your hands in relation to this project, so I would like to ask about your experiences with memory failures.

When we used 110 cards (7900 GTX) in 110 nodes (no communication between nodes or cards) with OpenGL + Cg, we observed a disturbingly high number of memory failures. Of the first shipment, around 30-40 cards had problems. These were sent back to Gigabyte and we received new ones; some of those also had problems and were sent back in turn, and the whole process took 3-4 iterations.

So I was wondering whether you have experience with a large number of 8800 GTX cards and whether you’ve seen any memory failures.

tachyon_john · March 16, 2007, 8:43pm

Hi,

Are your memory problems actually hardware, or could these be driver or kernel glitches that just happen to corrupt memory? With the complexity of software these days, it wouldn’t surprise me if a Linux or Windows kernel or driver bug could manifest itself as memory corruption.

I don’t have any data on memory failures for really large numbers of cards. We have something on the order of 65 NV cards in our lab. 80% of them are GeForce 6800s, the other 20% are 7900s and 8800s. Out of all of those cards, I think we’ve had one hardware failure in the last 3 years, if I recall correctly. The vast majority of these cards are being used for visualization with VMD, where the framebuffer memory ends up holding large volumetric texture maps for electrostatic potential maps, density maps, or other large volumetric data, and for high resolution multisample or stereo display modes.

I’d been waiting for CUDA to come out before seriously going after GPGPU since I already had enough fun debugging complex shaders and didn’t want to be subject to the whims of shader compilers for doing real scientific arithmetic. At present we’re only planning on using CUDA for this stuff, which means only the GeForce 8800 class cards are going to get pounded for GPGPU arithmetic.

If you guys are having issues with memory reliability, I think that you may want to think about going for the Quadro series cards for really long running computations where that’s more important. My understanding is that NVIDIA tests and certifies the Quadro series hardware themselves, whereas the GeForce hardware is tested by the vendor/brand (e.g. Gigabyte), presumably with less stringency. I think that the extra testing is one of the reasons that the Quadro hardware is priced higher. I’m sure that someone else knows much more about all this than I do though, so don’t take my answer as even remotely definitive, you should probably ask the NVIDIA guys about this specifically.

John

seibert · March 16, 2007, 8:58pm

When you (nogradi) say “memory failure”, what do you mean? Was there an actual reported error by the driver, or were you just getting silently corrupted results?

I ask because I found this post by mhouston (one of the BrookGPU developers) kind of unsettling:

http://www.gpgpu.org/forums/viewtopic.php?p=15105#15105

Which is followed by the comment:

I’m just wondering if I should add “random GPU memory corruption” to the list of things that keep me up at night.

tachyon_john · March 16, 2007, 9:12pm

Unless the whole system is protected with ECC, the error rate for any long running computation can be pretty scary. eBay had to replace a bunch of CPUs in their huge Sun servers many years ago due to cosmic ray hits corrupting data. The chips had ECC in all but one tiny place in the CPU, and of course that was the cause of their problems. I think they were detecting corruption at a rate of once per month or two, as I vaguely recall. The one good thing about the GPUs is that they run so much faster than the CPUs that the time component of the equation is hopefully very short :-)

John

nogradi · March 16, 2007, 10:37pm

John, seibert, this memory issue with the 7900s was discussed in great detail here, if you are interested in exactly what was happening:

http://www.gpgpu.org/forums/viewtopic.php?t=2559&postdays=0&postorder=asc&start=0

I’m not worrying about these cards anymore. What I was wondering is whether the same issues would arise with 8800s. So far we have a couple of those and they seem fine.

John, I guess if you use the cards for visualization only you will never see a problem because nobody can notice if a bit in one of the color components of a pixel is flipped. But we use it for gpgpu stuff where it really matters.

tachyon_john · March 17, 2007, 4:55am

Interesting discussion. Did you ever write a more sophisticated test code to determine if you had boards that were giving errors on particular memory cells, or if it was entirely random? If you had hardware faults, I would expect that a pattern would begin to emerge. With CUDA it should be far easier to write various test programs akin to cpuburn and memtest86, which are handy tools for testing cluster nodes before using them for real science. If you don’t want to spend the bucks for certified cards, e.g. Quadro, you could add code to do consistency checks or memory block checksumming periodically as calculations progress, and checkpoint/restart as needed. Wait till the first petascale supercomputers come online, I can’t even imagine what their initial MTBF rates are going to be… :)

John

nogradi · March 17, 2007, 1:50pm

We didn’t write anything more sophisticated, mainly because I don’t know how to write a good tool. Any pointers on “consistency checks or memory block checksumming”? I agree it would make sense to write one for CUDA. As you say, memtest is very useful; we actually run it for a day on every node before any serious calculation. Something similar for CUDA would be useful for a lot of people, I guess.

nogradi · March 17, 2007, 2:13pm

Actually, this post on gpgpu also discusses the need for a GPU version of memtest:

http://www.gpgpu.org/forums/viewtopic.php?t=3853

tachyon_john · March 17, 2007, 7:02pm

There are many checksum algorithms out there, some are better than others, and some are probably infinitely better suited for GPU implementation. You can get some basic background here:

http://en.wikipedia.org/wiki/Checksum
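
As one possible starting point for the “memory block checksumming” idea, here is a minimal host-side sketch in C++. It assumes the data to be protected (say, a matrix that should not change during the run) has been copied back into a host buffer. Adler-32 is used only because it is short, not because it is known to suit GPUs, and `block_checksums` and `changed_blocks` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adler-32 over a byte range.
inline std::uint32_t adler32(const unsigned char* data, std::size_t len) {
    const std::uint32_t kMod = 65521;  // largest prime below 2^16
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % kMod;
        b = (b + a) % kMod;
    }
    return (b << 16) | a;
}

// Checksum a buffer in fixed-size blocks, so that a later mismatch points
// at the block that changed rather than only at "something changed".
inline std::vector<std::uint32_t> block_checksums(
    const std::vector<unsigned char>& buf, std::size_t block_size) {
    assert(block_size > 0);
    std::vector<std::uint32_t> sums;
    for (std::size_t off = 0; off < buf.size(); off += block_size) {
        const std::size_t n = std::min(block_size, buf.size() - off);
        sums.push_back(adler32(buf.data() + off, n));
    }
    return sums;
}

// Return the indices of blocks whose checksum differs between two snapshots
// taken with the same block size.
inline std::vector<std::size_t> changed_blocks(
    const std::vector<std::uint32_t>& before,
    const std::vector<std::uint32_t>& after) {
    assert(before.size() == after.size());
    std::vector<std::size_t> changed;
    for (std::size_t i = 0; i < before.size(); ++i) {
        if (before[i] != after[i]) changed.push_back(i);
    }
    return changed;
}
```

A periodic check would take one snapshot right after uploading the data, take another every so often during the run, and restart from the last checkpoint if `changed_blocks` returns anything for data that was supposed to stay constant.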

For memory testing, memtest86 has to jump through a lot of hoops to defeat CPU caches, and then runs various pattern sequences to find bad address lines and/or bad memory cells. I bet that some of the memtest86 routines could be adapted for use on a GPU with some work. The GPU vendors must certainly have tools like this already. Until the advent of GPGPU they would have had no reason to make them available outside their engineering labs. Even now, they may not want to release their internal tools since such tools often have very device specific code in them. I bet that a small group of people could put together some “gpuburn” or “memtestgpu” type tools for CUDA without too much effort. I’d do it myself but I’m already swamped with other things. If nobody else takes it up, maybe I’ll do it in a month or so when I’m finished with my other more pressing commitments.
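
To make the “pattern sequences” idea concrete, here is a rough C++ sketch of two classic memory-test patterns: walking ones to expose bad cells, and storing each cell’s own index to expose addressing faults. It is not taken from memtest86 and makes no attempt to defeat caches. `PlainMemory` and `StuckBitMemory` are hypothetical types; the second simulates one bit stuck at zero so the tests have something to find.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal memory interface so the same tests can run against a plain
// buffer or against a fault-injecting wrapper.
struct PlainMemory {
    std::vector<std::uint32_t> cells;
    explicit PlainMemory(std::size_t n) : cells(n, 0) {}
    std::size_t size() const { return cells.size(); }
    void write(std::size_t i, std::uint32_t v) { cells[i] = v; }
    std::uint32_t read(std::size_t i) const { return cells[i]; }
};

// Simulated fault: the bits in `stuck_mask` always read back as zero
// in the cell at `bad_index`.
struct StuckBitMemory : PlainMemory {
    std::size_t bad_index;
    std::uint32_t stuck_mask;
    StuckBitMemory(std::size_t n, std::size_t bad, std::uint32_t mask)
        : PlainMemory(n), bad_index(bad), stuck_mask(mask) {}
    std::uint32_t read(std::size_t i) const {
        const std::uint32_t v = cells[i];
        return i == bad_index ? (v & ~stuck_mask) : v;
    }
};

// Walking ones: write each single-bit pattern to every cell and read it
// back. Returns the indices of cells that failed at least one pattern.
template <typename Memory>
std::vector<std::size_t> walking_ones_test(Memory& mem) {
    std::vector<std::size_t> failures;
    for (std::size_t i = 0; i < mem.size(); ++i) {
        bool ok = true;
        for (int bit = 0; bit < 32; ++bit) {
            const std::uint32_t pattern = std::uint32_t{1} << bit;
            mem.write(i, pattern);
            if (mem.read(i) != pattern) ok = false;
        }
        if (!ok) failures.push_back(i);
    }
    return failures;
}

// Own-address test: fill every cell with its index, then verify in a
// second pass. Two indices aliasing to the same cell would show up as a
// mismatch in at least one of them.
template <typename Memory>
std::vector<std::size_t> own_address_test(Memory& mem) {
    std::vector<std::size_t> failures;
    for (std::size_t i = 0; i < mem.size(); ++i) {
        mem.write(i, static_cast<std::uint32_t>(i));
    }
    for (std::size_t i = 0; i < mem.size(); ++i) {
        if (mem.read(i) != static_cast<std::uint32_t>(i)) failures.push_back(i);
    }
    return failures;
}
```

A GPU version would presumably run the same write-then-verify loops inside kernels over device memory and report failing addresses back to the host; that part is not shown here.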

John

nogradi · March 18, 2007, 11:49pm

If you (or anyone) write such a tool, it would be really nice if it were posted on these forums as well :)

Sam_Adams · October 2, 2007, 10:07pm

I am using two GeForce 8800 Ultras on a Core 2 Quad just fine. You just have to have a thread per GPU.
