GeForce vs. Quadro for Molecular Dynamics in CUDA: which is better for molecular dynamics?

I am interested in getting a graphics card in order to use CUDA, but I don’t know whether it is better to get a GeForce or a Quadro. I ask because the GeForce cards sometimes have more streaming processors (the GeForce GTX 295 has 480 CUDA cores) and they are cheaper than the Quadros. I hope you can help me understand this. Thanks.

Omar

Depends.

GeForce is cheaper and usually overclocked.

Quadro has more OpenGL features (GL affinity) and better support for antialiasing, high dynamic range, etc.

The OpenGL drivers for the Quadro are better; CUDA is the same.

In terms of target market, GeForce is a consumer product and Quadro is a professional product.

If you want to start playing around with CUDA, you should probably start with a GeForce. If you want to build a server, you should probably get a Tesla. If you want a render machine, you want a Quadro.

You should also keep in mind that the GTX 295 has two GPUs, each with 240 stream processors. CUDA does not automatically divide the work between the two GPUs, so you have to spawn a host thread for each GPU and use them independently. It’s not a big deal (assuming your problem can be split over two cards), but you should be aware of it.
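To make the "one host thread per GPU" idea concrete, here is a minimal sketch in plain Python. It makes no CUDA calls at all: `run_on_device` is a hypothetical stand-in for "select the device, copy its slice of the data over, launch a kernel, copy the result back", and the sum of squares is just placeholder work. The function names are made up for illustration.

```python
import threading

def split_range(n_items, n_devices):
    """Return contiguous (start, stop) index ranges, one per device."""
    base, extra = divmod(n_items, n_devices)
    ranges, start = [], 0
    for dev in range(n_devices):
        # The first `extra` devices take one additional item each.
        stop = start + base + (1 if dev < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def run_on_device(device_id, start, stop, data, results):
    # Hypothetical stand-in for real GPU work on device `device_id`.
    # Here the "kernel" is a CPU sum of squares over this device's slice.
    results[device_id] = sum(x * x for x in data[start:stop])

def run_split(data, n_devices=2):
    """Run one host thread per device and combine the partial results."""
    results = [None] * n_devices
    threads = [
        threading.Thread(target=run_on_device, args=(dev, lo, hi, data, results))
        for dev, (lo, hi) in enumerate(split_range(len(data), n_devices))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With a GTX 295 you would use `n_devices=2`, one thread per GPU of 240 stream processors. The same structure works for two separate cards.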

The GT240 ($100), GTX 275 ($250), or GTX 470 (~$350-$375) would all be good starter cards, depending on your budget. The GT240 is compute capability 1.2 and lacks double precision support. The GTX 275 has double precision support (compute capability 1.3), and more stream processors. The GTX 470 is compute capability 2.0, so it has all the fancy new features NVIDIA added, like on-chip memory caches, super-fast atomics, and promised support for all of C++ in a future CUDA toolkit update.
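The compute capability numbers above can be treated as (major, minor) pairs and compared directly. The following is a small sketch using only the three cards mentioned here; the table and function names are made up for illustration. On a real system the pair would come from the device properties reported by the CUDA runtime rather than from a hard-coded table.

```python
# Compute capabilities as quoted in the post above: (major, minor).
CARDS = {
    "GT 240": (1, 2),
    "GTX 275": (1, 3),
    "GTX 470": (2, 0),
}

def supports_double(cc):
    """Double precision is available from compute capability 1.3 upward."""
    return cc >= (1, 3)

def cards_with_double(cards=CARDS):
    """Names of the cards that support double precision, sorted by name."""
    return sorted(name for name, cc in cards.items() if supports_double(cc))

for name, cc in CARDS.items():
    print(name, "double precision:", supports_double(cc))
```

Tuples compare element by element, so (2, 0) counts as newer than (1, 3) without any special handling.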

Thank you both for responding. Is it better to have two GPUs in separate PCIe slots than to have the GTX 295? I now have the chance to get a GTX 295, but I have been reading that there are some problems with CUDA on this card. Is there a way to run CUDA on it without problems, using all the stream processors? If so, is there a tutorial or something similar explaining how to do it? Thank you very much for your help,

Omar

It behaves as two independent cards as far as CUDA is concerned. That is, it has 2x240 cores, not 480, so you need to program for two cards to fully utilize all the cores.

Another issue is that both GPUs share one PCIe link, so you may have a communications bottleneck.

On the other hand, a lot of cheaper consumer boards with two PCIe x16 slots share one controller between both slots, so you will get 2x8 lanes (that is, 8 for each card) instead of 16 each. In that case I don’t know whether both options are the same, or whether the 295 can utilize all 16 lanes if you are using just one card.

On a board with two PCI-e controllers you will see better speed from two cards.
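As a rough back-of-the-envelope comparison of the slot layouts discussed above, here is a sketch of the theoretical peak numbers. It assumes PCIe 2.0 at 500 MB/s per lane per direction, and it assumes a shared link is divided evenly when both GPUs transfer at the same time. Both are assumptions, not measurements, and real transfers will be slower.

```python
# Assumed: PCIe 2.0 theoretical peak per lane, per direction, in MB/s.
PCIE2_MB_PER_LANE = 500

def peak_mb_per_gpu(lanes, gpus_sharing=1):
    """Theoretical peak MB/s per GPU when `gpus_sharing` GPUs split one link evenly."""
    return lanes * PCIE2_MB_PER_LANE / gpus_sharing

scenarios = {
    "GTX 295, both GPUs busy on one x16 slot": peak_mb_per_gpu(16, gpus_sharing=2),
    "two cards on a shared controller (x8 + x8)": peak_mb_per_gpu(8),
    "two cards on two controllers (x16 + x16)": peak_mb_per_gpu(16),
}

for label, mb in scenarios.items():
    print(f"{label}: {mb:.0f} MB/s per GPU")
```

Under these assumptions the first two layouts come out the same per GPU, and only the board with two full x16 links doubles it.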

In any case, you need to make sure that your power supply can feed this monster, or you may start seeing random and unexplained CUDA errors due to fluctuating power.

Hi

Since your title says you want to use it for molecular dynamics simulations, I’ll give you some information in that regard. I’m the main developer of LAMMPS_CUDA (it’s a branch of the central development effort to port LAMMPS to the GPU; see http://code.google.com/p/gpulammps/wiki/Lammps_cuda for more information). For this code (and other CUDA codes) it doesn’t matter whether you get a Quadro or a GeForce card, as long as the number of cores (and the various clock frequencies) are equal. But for a Quadro with 240 cores you have to pay a lot of money.

Also, the GTX 295 has distinct advantages, since you basically get two GPUs with 240 cores each (and more than 100 GB/s of memory bandwidth each). Most GPU MD codes (ours at least) support multi-GPU modes, and if your systems are large enough they scale quite well. Depending on the code, the PCIe bus is not much of a limiting factor either. For example, two GTX 295s in one of our test machines outperform four Tesla C1060s (on one board, each with a full PCIe x16 link).

If you want to do visualisation, Quadro cards can be significantly faster than GeForce cards, but it depends on the software used, and I don’t know of any MD visualisation software that really runs much faster on a Quadro.

One more thing: the only Quadro card as fast as a GTX 295 in CUDA (or rather, as fast as one half of a GTX 295) is the Quadro FX 5800, which costs around 3000€. If you have that much money, you might do better to invest it in a Tesla C2050, the new Fermi version of the Tesla cards. For half as much money you might also think about a Tesla C1060 or a Quadro FX 4800 (1600€). The FX 4800 is comparable to a GTX 275, while a GTX 295 is basically two GTX 275s on one board.

So to summarize: if you don’t have 1500€ or more to spend on the GPU, get a GTX 295 or, as someone else suggested, a GTX 470 or GTX 480. Those are not as fast as a GTX 295 in existing multi-GPU codes, but they might provide a lot of benefits in the future, when codes start to use the new features.
If money is no object, you might look into a Tesla C2050 if you only want to run simulations, or a Quadro FX 5800 (which is slower) if you want to do visualisation that needs the newest OpenGL versions.

Best regards
Ceearem

As for memory transfers: did you time just the memory transfers, or the complete run? What are the bus speeds? Check that you are not comparing 295s in two slots at x16 against four Teslas at x8 (this depends on how many actual PCIe switches there are on the board).

It’s not really “fair” to compare the performance of the Quadro FX 5800 to the GTX 295 (or GTX 285). The FX 5800 is clocked the same as the Tesla. The GTX 285 I have is clocked 30% higher, as are most of the ones I have run into. The 295 is usually clocked a little lower because of its greater thermal issues, but still usually significantly higher than the Tesla and Quadro.

The Quadro and the Tesla have 4 GB of memory, the 295 has just under 1 GB (896 MB) per GPU, and you can get the 285 with either 1 or 2 GB (one GPU, not two as in the 295). OpenGL is faster on the Quadro, and with the Quadro you can run OpenGL on cards other than the primary one (GL affinity). The GeForce will run OpenGL only on the primary card (unless you use SLI, which may not play well with CUDA, at least before v3.0, and which does not help with multiple scenes). You also need to watch the cooling more with GeForces because of their usually higher clocks.

On the other hand, if a GeForce will do the job, it will run faster due to overclocking (as I said, my 285 is 30% faster than the C1060) and it is significantly cheaper.

In any case, I would go with two cards to allow hardware debugging.

When I said memory transfers don’t play much of a role, I was speaking specifically about our LAMMPS_CUDA code and referring to wall clock time (which is, after all, the important measure for end users). There are enough PCIe switches on all our machines to serve all cards with full bandwidth (the four Tesla cards are in a Supermicro board with 2 Xeon CPUs and 2x36 PCIe lanes).

Yes, I know that comparable Quadros and Teslas (say, same number of cores and same memory bus width) are significantly slower than the GeForce cards. In fact, these clock differences carry over almost one to one into our MD code performance. Thus a GTX 280 (which is just a little slower than a 285) is roughly 25% faster than the Tesla, and (half) a GTX 295 is roughly 15% faster than the Tesla. I still think it’s a relatively fair comparison, since they are all using more or less the same chip.
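Putting the percentages above into numbers shows how two GTX 295s can end up ahead of four Teslas. This is only a sketch: the per-GPU factors are the rough figures quoted in this post, and the perfect multi-GPU scaling is an assumption that real runs will not quite reach.

```python
# Per-GPU MD throughput relative to one Tesla C1060, as quoted above.
RELATIVE_SPEED = {
    "Tesla C1060": 1.00,
    "GTX 280": 1.25,
    "GTX 295 (one half)": 1.15,
}

def aggregate_speed(gpu, n_gpus):
    """Total throughput in 'Tesla units', assuming perfect multi-GPU scaling."""
    return n_gpus * RELATIVE_SPEED[gpu]

# Two GTX 295 boards carry four GPUs in total.
two_295 = aggregate_speed("GTX 295 (one half)", 4)
four_tesla = aggregate_speed("Tesla C1060", 4)

print(f"2 x GTX 295: {two_295:.2f}   4 x Tesla C1060: {four_tesla:.2f}")
print(f"ratio: {two_295 / four_tesla:.2f}")
```

The ratio is simply the per-GPU factor again, because both setups have four GPUs.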

A second card (for hardware debugging) is only necessary if you plan to sit in front of the computer. Most MD codes are run on Linux machines, and most of those on remote servers. If you do want to sit in front of the machine, you should really add another GPU for the display, but a very cheap one will do.

Ceearem