Should I buy Tesla or GTX 295?

I was planning to buy several Tesla cards, as NVIDIA’s mad science offer seemed to promise Fermi cards for the price difference, but this is not the case.

So I’m left with the option of Tesla C1060 cards for £620 each
OR GTX295 cards for £330 each

I’m using them for CUDA programming of large neural networks and real-time visual feature extraction, so I don’t care about gaming performance or graphics output.

I can see that the Tesla has double the memory and the GTX has double the cores (split into 2 devices).

I’m looking for the fastest option, which would seem to be the GTX, but will there be enough memory?
Could I even mix and match, with 2 Tesla and 2 GTX cards in one machine?

What’s your advice?

Another thing to consider is the GTX 285. It runs at a faster clock rate than the 295, and doesn’t require an 8-pin PCI-Express power connector. The 285 is the direct analog of Tesla, just with 1 GB of memory instead of 4 GB.

As for the memory size question, you are the only one who can answer that. How much data do you need to load onto the card to do your calculation?
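One rough way to put a number on this for a neural network is to count the bytes needed for weights and activations. The sketch below assumes a fully connected network stored as 32-bit floats. The layer sizes and the `batch` parameter are made-up placeholders, and the capacities are the nominal per-device figures for these cards (the memory actually free to a CUDA program will be somewhat less).

```python
BYTES_PER_FLOAT = 4  # assumption: single-precision storage
MIB = 2**20

# Nominal memory per CUDA device, in MiB.
CAPACITY_MIB = {
    "GTX 295 (per half)": 896,
    "GTX 285 (1 GB)": 1024,
    "Tesla C1060": 4096,
}


def network_bytes(layer_sizes, batch=1):
    """Bytes for weights, biases and activations of a dense network."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    activations = batch * sum(layer_sizes)
    return (weights + biases + activations) * BYTES_PER_FLOAT


def cards_that_fit(layer_sizes, batch=1):
    """Names of the cards whose nominal capacity covers the estimate."""
    need = network_bytes(layer_sizes, batch)
    return [name for name, mib in CAPACITY_MIB.items() if need <= mib * MIB]


# Hypothetical network: 16384 inputs, two hidden layers of 16384, 1024 outputs.
layers = [16384, 16384, 16384, 1024]
print(f"{network_bytes(layers) / MIB:.0f} MiB needed")
print("Fits on:", cards_that_fit(layers))
```

With these placeholder sizes the estimate comes to roughly 2 GiB, so only the 4 GB card would hold it in one piece. Plug in your own layer sizes to see which side of the line you fall on.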

I don’t think I need that much memory… Also, I’ve found this topic suggesting problems with four GTX 295s and Windows (http://forums.nvidia.com/index.php?showtopic=101930)… not sure if this applies to Linux too?

I run four GTX 295 cards on a P6T7 motherboard with Ubuntu 9.10 64-bit (not an officially approved distribution, had to install gcc 4.3 for nvcc) and I have run the same configuration with RHEL 5.3. Both cases worked fine.

Basically, the best CUDA device configuration depends on the answers to these questions:

  • Is my algorithm easy to divide between multiple devices with minimal (or no) exchange of data between them? Will the performance scaling be approximately linear in the number of devices?

If not, you would be better off with the fastest single device, the GTX 285. If multiple GPU operation would be beneficial, then the GTX 295 might be worth considering (or just more GTX 285s).
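To get a feel for the scaling question before buying anything, Amdahl's law gives a quick upper bound. This is a textbook approximation, not a measurement: `parallel_fraction` (the share of run time that splits cleanly across devices) is a value you would have to estimate for your own code, and the formula ignores transfer costs entirely.

```python
def amdahl_speedup(parallel_fraction, n_devices):
    """Upper bound on speedup when only part of the work is divisible."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_devices)


# Illustrative values only; substitute your own estimate of the fraction.
for p in (0.5, 0.9, 0.99):
    for n in (1, 2, 4, 8):
        print(f"parallel={p:.2f} devices={n}: {amdahl_speedup(p, n):.2f}x")
```

If the bound stays close to 1x even at eight devices, the fastest single device is probably the better buy.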

  • How much memory in each device do I need to hold the data I will be operating on?

Each GTX 295 half gets 896 MB, the GTX 285 comes in 1 and 2 GB models, and the Tesla is 4 GB.

  • How much data do I need to send over the PCI-Express bus? How often do I need to do this?

If the answer is “a lot of data very frequently”, then the GTX 295s will not be very helpful, as the two CUDA devices on each card compete with each other for that card’s x16 link. Moreover, unless you get a very unusual motherboard, you will max out the bandwidth at two devices. Beyond that point, you will be sharing a fixed amount of PCI-Express bandwidth among more and more CUDA devices. For bus-bound kernels, a pair of GTX 285s or Teslas would be best.
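The sharing argument can be put into rough numbers. The sketch below is a back-of-envelope model, not a benchmark. It assumes 8 GB/s per x16 link (the PCI-Express 2.0 theoretical peak in one direction; sustained rates are lower) and assumes the host can feed two such links at full rate, following the "max out at two devices" rule of thumb above. Both defaults are assumptions to replace with figures for your own motherboard.

```python
def per_device_bandwidth(n_cards, devices_per_card,
                         link_gb_s=8.0, full_speed_links=2):
    """Rough GB/s per CUDA device when every device transfers at once.

    link_gb_s: assumed peak of one x16 link.
    full_speed_links: assumed number of x16 links the host can saturate.
    """
    n_devices = n_cards * devices_per_card
    host_total = link_gb_s * full_speed_links
    share_of_link = link_gb_s / devices_per_card  # devices on a card share its link
    share_of_host = host_total / n_devices        # all devices share the host budget
    return min(share_of_link, share_of_host)


configs = {
    "2 x GTX 285 or Tesla": (2, 1),
    "4 x GTX 285 or Tesla": (4, 1),
    "2 x GTX 295": (2, 2),
    "4 x GTX 295": (4, 2),
}
for name, (cards, per_card) in configs.items():
    print(f"{name}: {per_device_bandwidth(cards, per_card):.1f} GB/s per device")
```

Under these assumptions, each device in a four-card GTX 295 machine sees a quarter of what each device in a two-card GTX 285 machine gets, which is the reason for preferring the pair when kernels are bus-bound.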

  • Do I have the patience to deal with the lower quality assurance of the GeForce series? (Or conversely: Do I have the budget to afford the greater quality assurance of the Tesla series?)

I’ve used about a dozen GeForce cards since CUDA was released, and had none dead-on-arrival. One card died after a year of use. I know of no other failure stats for GeForce or Tesla cards. You might be able to get Tesla failure rates if you sign an NDA, but to be honest they probably won’t be of much use to you unless you are planning a large deployment.

If you aren’t sure what the answers to these questions are (and they are sometimes hard to figure out without something to play with), then you might want to build a solid workstation with just one GTX 285. If things look good, then you can add Tesla/285/295 cards as needed, and if things don’t work, then you just have a nice workstation with a good graphics card. :)

I have generally been happy with my GTX 295 (now running for approximately one month).
However, there is something odd with the second device which I have never tracked down.
See http://forums.nvidia.com/index.php?showtopic=153407

Bill
PS: CIGPU 2010 deadline in a few weeks
http://www.cs.ucl.ac.uk/external/W.Langdon/cigpu

OK, I think I’m getting a clearer idea of the differences. Part of the problem is that the machines will be used for several different tasks with quite different requirements.

  1. They will be used for artificial evolution and physics simulation, requiring quite a bit of memory but infrequent bus activity… for this, maybe the Tesla is best with its large memory.

  2. They will be used to control a humanoid robot in real time, running various image processing algorithms, auditory processing, and various neural nets and other control systems simultaneously… thus requiring quite a bit of communication over the bus (robot to CPU to GPU, and GPU to GPU) and, to be honest, not excessive amounts of memory, so a GTX 285 would do the job.

Probably the Tesla is the way to go for now… looking forward to the new Fermi though.