Well, I'm not quite happy to say this, but for compute-bound kernels the 4870 (not the 4870X2) is about 30% faster than the GTX280. The GTX295 is certainly faster than a single 4870, but still much slower than the 4870X2. Again, these figures are for a compute-bound kernel. Here are some actual benchmarks (higher is better):
GTX280 - 11800
GTX285 - 12500
GTX295 - 21700
4870 - 15750
4870X2 - 31000
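
If you want the relative numbers rather than raw scores, here is a quick Python sketch that works them out. The scores are the ones listed above; the variable names are just mine, and I'm using the GTX280 as the baseline only because it's the card I compared against.

```python
# Scores copied from the list above (higher is better).
# Dictionary and variable names are illustrative, not from any benchmark tool.
scores = {
    "GTX280": 11800,
    "GTX285": 12500,
    "GTX295": 21700,
    "4870": 15750,
    "4870X2": 31000,
}

baseline = scores["GTX280"]

# Speed of each card relative to the GTX280.
relative = {card: score / baseline for card, score in scores.items()}

# Print slowest to fastest.
for card, ratio in sorted(relative.items(), key=lambda kv: kv[1]):
    print(f"{card:7s} {ratio:.2f}x GTX280")

# Dual-GPU comparison: 4870X2 versus GTX295.
dual_ratio = scores["4870X2"] / scores["GTX295"]
print(f"4870X2 is {dual_ratio:.2f}x GTX295")
```

By these figures the 4870 comes out at roughly 1.33x the GTX280, and the 4870X2 at roughly 1.43x the GTX295.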
The problem with ATI is their software and API. You’ll need to spend a lot of time making things work, and then some more time making them work fast. CUDA is much more developer-friendly.
Also, I’m not aware of many ATI Stream-enabled products, so if you plan to use third-party software, CUDA seems to be the better choice.