Visited with my advisor yesterday, and he’s using CUDA cards with great success for simulations. He’s found a 200x speedup over the CPU, once all the operations are moved onto the GPU. His claim is that most of the standard benchmarks do a lot more I/O and transfer data more often than his simulations do, which just flat out fly on the GPU.
There’s one issue that they are having, though: they can’t get the cards to work together. Machine 1 is small and simple, with one GTX285 (2 GB). Machine 2 has a server motherboard (he claimed 4 full x16 PCIe 2.0 slots) and 4 GTX285s (and a honking big case, etc.) The only way they have gotten top performance out of the four cards is to send a separate job to each card. Of course, each job then does its own host-to-device transfers, so there’s more I/O overall, but if it’s done cleverly, it would enable bigger simulations.
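For what it’s worth, the usual way to drive several cards from a single program is one host thread per GPU, each calling cudaSetDevice before doing any CUDA work; in that era the cards couldn’t read each other’s memory directly, so any sharing went through host RAM. Here’s a rough sketch of the pattern (modern C++ threading for brevity; the kernel, the worker function, and the problem size are all made up for illustration):

```cuda
// Sketch: one host thread per GPU. Each thread binds itself to a device
// with cudaSetDevice, then allocates, launches, and synchronizes on that
// device independently. All names below (scale, worker, n) are illustrative.
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

void worker(int dev, int n) {
    cudaSetDevice(dev);                    // bind this host thread to one GPU
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    cudaDeviceSynchronize();               // wait for this card's work
    cudaFree(d);
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);            // should report 4 on Machine 2
    std::vector<std::thread> ts;
    for (int d = 0; d < count; ++d)
        ts.emplace_back(worker, d, 1 << 20);
    for (auto &t : ts) t.join();
    std::printf("ran on %d device(s)\n", count);
    return 0;
}
```

This is essentially the "separate job per card" scheme inside one process: the cards still don’t talk to each other, but a single host program can partition the work and merge results in host memory between kernel launches.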
For those of you working with more than one card, how do you get them to talk to each other? And do you see per-card slowdowns relative to using a single card?
Finally, I was advised that adding a very cheap ($20) video card for display output, freeing the GTX285 to do nothing but math, avoids the big performance hit of doing math and graphics on the same card. In my mind, this would make an x16/x16/x4 motherboard better for a 2-GPGPU system than an x16/x8/x8 one, because you could put the cheap card in the x4 slot and keep full bandwidth to the 2 GPGPUs. Of course, your cheap card might be regular PCI, AGP, or whatever, but having 2 dedicated x16 slots makes good sense to me now, and seems to be the sweet spot in price/performance/setup effort.
I regret that I don’t have more specifics, but if anyone out there has approximate answers to these issues, I can get you in touch with the interested parties.