Register count is doubled (this is huge). Shared-memory atomics. Double precision support.
Much more versatile coalescing rules. Support for more threads in flight. Great memory bandwidth. Fused multiply-add (in double precision).
It’s a big deal… the GT200 architecture improvements were really tuned more for CUDA than for graphics!
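If you want to see which of these features a particular card actually exposes, the runtime will tell you its compute capability. Here is a minimal sketch, assuming the usual mapping (shared-memory atomics from compute capability 1.2 up, double precision from 1.3 up). The helper names `hasSharedAtomics` and `hasDoublePrecision` are my own, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: shared-memory atomics need compute capability 1.2 or higher.
static bool hasSharedAtomics(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 2);
}

// Hypothetical helper: double precision needs compute capability 1.3 or higher.
static bool hasDoublePrecision(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 3);
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // regsPerBlock is the register budget available to one thread block.
        printf("%s: compute %d.%d, %d registers per block\n",
               prop.name, prop.major, prop.minor, prop.regsPerBlock);
        printf("  shared atomics: %s, double precision: %s\n",
               hasSharedAtomics(prop.major, prop.minor) ? "yes" : "no",
               hasDoublePrecision(prop.major, prop.minor) ? "yes" : "no");
    }
    return 0;
}
```

An 8800GT reports compute capability 1.1 and a GTX260 reports 1.3, so the two helpers should come back "no" on the former and "yes" on the latter.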
Two 8800GTs may be faster, but there’s no SLI in CUDA. Each card shows up as its own device, so you’ll have to split the work and program them separately, which can be a bitch.
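To give an idea of what "programming them separately" looks like, here is a rough sketch that splits one array across however many GPUs are present. It assumes CUDA 4.0 or later, where a single host thread can switch cards with `cudaSetDevice`; older toolkits needed one host thread per GPU. The `scale` kernel and the `sliceBegin` helper are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel: multiply every element of one slice by a factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: first index of the slice owned by device `dev`
// when n items are divided across numDevices cards.
static int sliceBegin(int n, int numDevices, int dev) {
    return (int)((long long)n * dev / numDevices);
}

int main() {
    int numDevices = 0;
    if (cudaGetDeviceCount(&numDevices) != cudaSuccess || numDevices == 0) {
        printf("No CUDA device found\n");
        return 0;
    }
    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);
    std::vector<float *> devPtr(numDevices, nullptr);

    // Each GPU gets its own allocation, its own copy, and its own launch.
    for (int dev = 0; dev < numDevices; ++dev) {
        int begin = sliceBegin(n, numDevices, dev);
        int count = sliceBegin(n, numDevices, dev + 1) - begin;
        cudaSetDevice(dev);
        cudaMalloc(&devPtr[dev], count * sizeof(float));
        cudaMemcpy(devPtr[dev], host.data() + begin, count * sizeof(float),
                   cudaMemcpyHostToDevice);
        scale<<<(count + 255) / 256, 256>>>(devPtr[dev], count, 2.0f);
    }

    // Gather the slices back; the blocking copy waits for that card's kernel.
    for (int dev = 0; dev < numDevices; ++dev) {
        int begin = sliceBegin(n, numDevices, dev);
        int count = sliceBegin(n, numDevices, dev + 1) - begin;
        cudaSetDevice(dev);
        cudaMemcpy(host.data() + begin, devPtr[dev], count * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(devPtr[dev]);
    }
    printf("first = %f, last = %f\n", host[0], host[n - 1]);
    return 0;
}
```

None of this is hard, but every allocation, copy, and launch has to be written per device, and any data the cards need to share has to go through the host. That bookkeeping is the cost you weigh against one bigger card.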
Anyway, the difference between GTX260 and the older gen isn’t “huge.” It’s nice. It’s critical if you need that new atomics feature. Otherwise, if power consumption is what’s guiding your choice, don’t worry about the difference.
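For a feel of what the atomics feature buys you, here is a sketch of the classic case: a histogram where each block counts into shared memory and then merges into the global result. It is only an illustration under my own assumptions (64 bins, byte input, a made-up `binOf` mapping), and it needs a card with shared-memory atomics, so it would not run on an 8800GT.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define NUM_BINS 64

// Hypothetical mapping from an input byte to a bin index.
__host__ __device__ inline int binOf(unsigned char v) {
    return v % NUM_BINS;
}

__global__ void histogram(const unsigned char *in, int n, unsigned int *bins) {
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();

    // Count into the block-local histogram using shared-memory atomics.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        atomicAdd(&local[binOf(in[i])], 1u);
    }
    __syncthreads();

    // Merge this block's counts into the global histogram.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) {
        atomicAdd(&bins[b], local[b]);
    }
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 0;
    }
    const int n = 1 << 20;
    std::vector<unsigned char> hostIn(n);
    for (int i = 0; i < n; ++i) hostIn[i] = (unsigned char)(i % 256);

    unsigned char *devIn = nullptr;
    unsigned int *devBins = nullptr;
    cudaMalloc(&devIn, n);
    cudaMalloc(&devBins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(devIn, hostIn.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(devBins, 0, NUM_BINS * sizeof(unsigned int));

    histogram<<<64, 256>>>(devIn, n, devBins);

    unsigned int hostBins[NUM_BINS];
    cudaMemcpy(hostBins, devBins, sizeof(hostBins), cudaMemcpyDeviceToHost);
    printf("bin 0 holds %u items\n", hostBins[0]);

    cudaFree(devIn);
    cudaFree(devBins);
    return 0;
}
```

Without shared atomics you would have to send every increment to global memory or restructure the counting entirely, which is why this one feature can decide the card for you.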
But really, are you putting this into production? If not, then who cares if you have the fastest card for development? If yes, then you should know what performance you need, and you should test to see what you actually get. Cooling and reliability would be big factors too.