I usually tell people looking to jump into CUDA development to start with a GeForce card. Something like the GTX 470 is only $270, has the same power requirements as the Tesla C2050, and lets you figure out where CUDA can benefit your work with minimal investment. Then, once you have working CUDA programs in front of you, you can better evaluate whether the additional features of the Tesla are necessary.
Although there are many subtle differences between GeForce and Tesla, the two biggest Tesla features are higher-performance double precision (faster by a factor of 4 per multiprocessor clock compared to the GTX 470/480) and 3 or 6 GB of device memory. Without CUDA experience, it can be hard to tell ahead of time whether you will need either of these things. So a ~$300 investment up front can help you decide if you need to spend $2500 later. And if CUDA turns out not to be applicable to your computing needs, you have lost much less capital. (CUDA is awesome, but it doesn’t do everything.)
One reason I specifically push the GeForce for initial benchmarking is that many people assume that since they plan to use double precision, the Tesla must be faster. However, unlike most CPU applications, a significant fraction of CUDA programs are ultimately limited by device memory bandwidth or latency. (Feeding 448 CUDA cores with non-trivial operands can easily saturate even the fastest memory bus.) In these situations, the slower double precision performance of the GeForce might have a negligible effect on actual runtime. With a working CUDA program in hand, you can analyze it more carefully to decide whether memory speed is the barrier, whereas this can be hard to estimate ahead of time.
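To see why the 4x double precision gap can vanish, consider a quick back-of-envelope roofline estimate. The kernel choice (daxpy, y = a*x + y) and the hardware figures below are my own illustrative assumptions, rough spec-sheet numbers rather than anything measured, but the arithmetic is what matters: at this kernel's arithmetic intensity, both cards hit the memory wall long before they hit their compute peaks.

```python
# Back-of-envelope roofline check for a memory-bound kernel (daxpy: y = a*x + y).
# All hardware numbers are approximate spec-sheet figures, not measurements.

flops_per_element = 2            # one multiply + one add per element
bytes_per_element = 3 * 8        # read x, read y, write y -- three doubles

# Arithmetic intensity: FLOPs performed per byte of device memory traffic
intensity = flops_per_element / bytes_per_element   # ~0.083 FLOP/byte

# (memory bandwidth in GB/s, double-precision peak in GFLOP/s)
gtx470_bw, gtx470_dp = 134.0, 136.0   # GeForce GTX 470, approximate
c2050_bw, c2050_dp = 144.0, 515.0     # Tesla C2050, approximate

def attainable_gflops(bandwidth_gbs, peak_dp_gflops):
    """Roofline model: sustained throughput is the lesser of the compute
    peak and what the memory bus can feed at this arithmetic intensity."""
    return min(peak_dp_gflops, bandwidth_gbs * intensity)

print(f"GTX 470:     {attainable_gflops(gtx470_bw, gtx470_dp):5.1f} GFLOP/s")
print(f"Tesla C2050: {attainable_gflops(c2050_bw, c2050_dp):5.1f} GFLOP/s")
```

Under these assumptions, both cards land at roughly 11-12 GFLOP/s on this kernel: the Tesla's much higher double precision peak never comes into play, because the memory bus sets the ceiling for both.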