Yes, the GeForce cards are excellent and inexpensive ways to learn CUDA. The high end GeForce cards (GTX 470/480) are comparable in performance to the Tesla cards (with one caveat), and even slightly faster for some things.
The current Tesla series (C2050 and C2070) differ from the GeForce GTX 400-series cards in a few ways, listed here:
http://forums.nvidia.com/index.php?showtop…amp;pid=1073830
The key differences are more GPU memory, 4x faster double precision performance (single precision performance is similar to GeForce), the option to use ECC with the GPU memory, and better quality assurance testing.
For learning/evaluating CUDA, these differences are usually not important. If your code is limited by double precision performance, then you’ll have to keep in mind the 4x difference between the GeForce GTX 470/480 and the Tesla C2050/C2070.
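If you want to check what a particular card offers, a minimal sketch using the CUDA runtime API (cudaGetDeviceCount / cudaGetDeviceProperties) prints the properties that matter here; the program layout and output text are my own, not from any NVIDIA sample:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);
            printf("Device %d: %s\n", dev, prop.name);
            printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
            printf("  Global memory:      %zu MB\n", prop.totalGlobalMem >> 20);
            printf("  ECC enabled:        %s\n", prop.ECCEnabled ? "yes" : "no");
        }
        return 0;
    }

On a GTX 480 you would see compute capability 2.0 with ECC off; a C2050 reports the same compute capability but more memory and, if you turn it on, ECC.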
Every time (well, almost) NVIDIA adds a new set of hardware features to the architecture, they increase the compute capability number. These new features sometimes increase performance of existing code, and sometimes they just open up the possibility of doing a new class of computation efficiently on the GPU. For the most part, compute capabilities build on each other, so that each contains all the features of previous compute capabilities.
To give you an idea of how this works, a brief and non-exhaustive summary of the compute capability history looks something like this:
Compute capability 1.0: Original CUDA architecture
1.1: Atomic operations in global memory
1.2: Smarter memory controller with relaxed coalescing rules (good memory bandwidth is easier to achieve), plus atomic operations in shared memory (see the code sketch after this list)
1.3: Native double precision
2.0: L1/L2 on-chip cache, concurrent kernel execution, major changes to eventually support all of C++ in device code
2.1: Rebalancing of CUDA cores per multiprocessor and changes in how instructions are scheduled (unlike the previous updates, this one seems intended to reduce the cost of the compute capability 2.0 design and does not appear to improve performance at all)
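To make the list above more concrete, here is a small hedged sketch (the kernel and variable names are mine, not from any NVIDIA sample) that uses two of the features listed: atomic operations in global memory (compute capability 1.1) and in shared memory (1.2). Build it for the architecture you are targeting with nvcc’s -arch flag, e.g. -arch=sm_20 for the GTX 470/480 and C2050/C2070 discussed above.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Counts the positive elements of an array. Each block first accumulates into a
    // shared-memory counter (atomicAdd in shared memory: compute capability >= 1.2),
    // then one thread per block adds that partial count to the global total
    // (atomicAdd in global memory: compute capability >= 1.1).
    __global__ void countPositive(const float *in, int n, int *count)
    {
        __shared__ int blockCount;
        if (threadIdx.x == 0)
            blockCount = 0;
        __syncthreads();

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n && in[i] > 0.0f)
            atomicAdd(&blockCount, 1);
        __syncthreads();

        if (threadIdx.x == 0)
            atomicAdd(count, blockCount);
    }

    int main()
    {
        const int n = 1 << 20;
        float *d_in;
        int *d_count, h_count = 0;

        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_count, sizeof(int));
        cudaMemset(d_in, 0, n * sizeof(float));   // all zeros, so the expected count is 0
        cudaMemset(d_count, 0, sizeof(int));

        countPositive<<<(n + 255) / 256, 256>>>(d_in, n, d_count);
        cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
        printf("positive elements: %d\n", h_count);

        cudaFree(d_in);
        cudaFree(d_count);
        return 0;
    }

Compiling this for a compute capability 1.0 target fails, because atomicAdd() is not available there. That is the practical meaning of the list above: newer compute capabilities let you express patterns that older hardware simply cannot run.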