Need hardware suggestions: Is the GTX 295 a good choice for GPGPU?

hi all,

I currently have a Quadro FX 1700 graphics card, and after getting oriented in CUDA I want to buy a new card for GPU computation. In the appendix of the Programming Guide (2.3.1) there is a table of the capabilities of different NVIDIA cards. It seems the GTX 295 is one of the best CUDA-enabled devices, but I still have some questions :rolleyes: :

  • Do the three NVIDIA product lines play any role in choosing a GPGPU? Is it necessary to buy a Tesla-series card? (It is very expensive…)
  • Is the GTX 295 a good choice for GPGPU? Are there other graphics cards that are better in this area at a reasonable cost (for example, the Quadro FX 5800)?

Best wishes

As with everything in life, there are pros and cons for every card.


Tesla (C1060/S1070):

pros: 4 GB RAM, production-grade build quality, better QA, suitable for production, support from NVIDIA…

cons: expensive, fewer GPUs per machine.

GTX 295:

pros: a bit faster than the Tesla (comparing one half of the GTX 295 to a C1060), cheaper, and you can put 3 or 4 GTX 295s in a desktop machine, giving you up to 8 GPUs.

cons: a bit less than 1 GB of RAM per GPU, not production-grade quality, less QA.

It really depends on what you want to do. For desktop use, the GTX 295 is probably the best. For production, or for algorithms that must use 4 GB of RAM, go with the Tesla (either C1060s or an S1070).


Thanks for your reply! Why can't I put more Tesla cards in one desktop? I thought the C1060 is simply a combination of GeForce 200 chips on one graphics card.

Yes, but a desktop board typically has at most 4 PCIe slots, and in each one you can put either a C1060 or a GTX 295.

If you put in 4 C1060s you get a total of 4 GPUs; if you put in 4 GTX 295s you get 8 GPUs.

As a side note, the C1060 doesn't have video out, so you need another GPU for your display: either one GTX/Quadro and three C1060s, or try to find a board with an onboard CUDA-capable chip.

The question is what you are trying to achieve with this system…
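To see how many GPUs such a system actually exposes, a host program can enumerate them with the CUDA runtime API. A minimal sketch (each half of a GTX 295 shows up as its own device; the exact properties reported depend on your driver and toolkit version):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Found %d CUDA device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, %d multiprocessors, %zu MB global memory\n",
               dev, prop.name, prop.multiProcessorCount,
               prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```

On a box with one GTX/Quadro for display plus three C1060s, this would list four devices; with four GTX 295s, eight.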


A clear explanation, thank you!

I compared the current prices of the C1060 and the GTX 295:

The GTX 295 costs about $500 and the Tesla C1060 costs more than $1200, even though the 295 has two GeForce 200 chips. I can't really understand it…

It seems the cost depends just on the RAM…

The cost depends on the market segment. If you buy a Quadro card, you are mostly paying more for the same thing :)

Thanks for your reply. Can you explain that in more detail, please? Why do you say that with a Quadro card I pay more for the same thing?

Maybe I haven't stated my requirement clearly: I need a good graphics card for parallel computation of "normal" algorithms, not for playing games…

If you have some time, you can wait for a Fermi-based GeForce card. Meanwhile you can buy a "small" GT200-based card, or simply wait.

It depends on whether you use this card for development purposes or for running the algorithms at a "big scale".

I would even suggest using two graphics cards and dedicating one to CUDA and the other to display output. With the GT200 you need a second card to use hardware debugging, and since emulation support will be discontinued with the release of CUDA 3.0, you will need it to debug your kernels.

The next thing is that you have to develop your programs explicitly to use the two GPUs of a GTX 295. It all depends on your needs.
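In the CUDA 2.x runtime, each host thread is bound to a single device, so "using both GPUs of a GTX 295 explicitly" means spawning one host thread per GPU and calling cudaSetDevice in each. A rough pthread-based sketch (error handling omitted; the buffer size and kernel work are placeholders):

```cuda
#include <pthread.h>
#include <stdio.h>
#include <cuda_runtime.h>

// Each host thread binds to its own GPU and works on its share of the data.
static void *worker(void *arg)
{
    int dev = *(int *)arg;
    cudaSetDevice(dev);   // must happen before any other CUDA call in this thread

    float *d_buf;
    cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
    // ... launch kernels on this device's half of the problem ...
    cudaFree(d_buf);
    printf("device %d done\n", dev);
    return NULL;
}

int main(void)
{
    pthread_t threads[2];
    int ids[2] = {0, 1};   // the two halves of a GTX 295
    for (int i = 0; i < 2; ++i)
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}
```

The splitting of the data between the two threads, and any merging of results, is entirely up to your application.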

Thanks for your reply! I've read NVIDIA's Fermi page. Is the definition of "CUDA cores" identical to multiprocessors? (1 multiprocessor has 8 processors, and a GTX 295 has 2×30 MPs.) Then Fermi is a great evolution.

"CUDA cores" refers to the processing elements on a "streaming multiprocessor" (SM).

On the GT200 you have 30 streaming multiprocessors with 8 CUDA cores each, which equals 240 CUDA cores per GPU.

On Fermi (GF100) you have 16 streaming multiprocessors with 32 CUDA cores each: 512 CUDA cores.

I'm already waiting for the GF100, or at least for detailed specs or a new release of the CUDA Programming Guide that includes the GF100 architecture. In my opinion it will be a great card for GPGPU: hardware debugging support with one GPU, cache, dual DMA engines, fast double precision, fast integer support (fast modulo, multiplication and division), C++ support, 64-bit memory architecture.

I also assume that some shortcomings of the current architecture will be corrected, like 3D grids, texture writes, 3D linear memory, additional texture addressing modes, and probably much more.

One thing is still missing that would bring more companies to use CUDA in their programs: an official compiler that translates PTX to x86 - like Ocelot, but with NVIDIA support and for all platforms. It would avoid having to develop the same algorithms twice, once for x86 and once for CUDA.

Today I found an interesting paper which shows that NVIDIA is probably also interested in this topic:

PLANG: PTX Frontend for LLVM - Vinod Grover, joint work with Andrew Kerr and Sean Lee

Thanks for your info.

What did you mean by that? Why will emulation support be discontinued with CUDA 3.0? And how can I assign one of the cards to video output while using the second card for hardware debugging?

many questions…