I wrote my first CUDA application (LZ77 compression), and my results on 64-bit are:
2xCPU (3.8GHz): 1.3 MB/s (32-bit: 0.76 MB/s)
1 GPU: 3.1 MB/s
2 GPUs: 5 MB/s
3 GPUs: 6.8 MB/s
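To put the figures above in perspective, here is a small sketch of the arithmetic for speedup over the CPU and for multi-GPU scaling efficiency. The function names `speedup` and `efficiency` are made up for illustration and are not part of the application.

```cpp
#include <cassert>

// Speedup of a measured throughput relative to a baseline (both in MB/s).
double speedup(double throughput, double baseline) {
    assert(baseline > 0.0);
    return throughput / baseline;
}

// Scaling efficiency: measured throughput on `gpus` devices divided by
// the ideal (linear) throughput, i.e. gpus times the single-GPU figure.
double efficiency(double throughput, double single_gpu, int gpus) {
    assert(single_gpu > 0.0 && gpus > 0);
    return throughput / (single_gpu * gpus);
}
```

With the 64-bit numbers, `speedup(3.1, 1.3)` is about 2.4x for one GPU, and `efficiency(5.0, 3.1, 2)` and `efficiency(6.8, 3.1, 3)` come out at roughly 81% and 73% of linear scaling.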
For a first CUDA application that is quite good; my CPU code on 32-bit is much slower.
The CPU code is not very optimized and uses a similar approach to the GPU code.
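For readers unfamiliar with LZ77, the sketch below shows the kind of unoptimized match search that an approach shared between CPU and GPU might use. This is an assumption: the actual matching strategy is not described here, and `Match`, `longest_match`, `window` and `max_len` are hypothetical names. The point is only that the search at each position is independent of the others, so it maps naturally to one GPU thread per position.

```cpp
#include <cstddef>
#include <string>

// A back-reference: how far back the match starts and how long it is.
// length == 0 means no match was found.
struct Match {
    std::size_t distance;
    std::size_t length;
};

// Brute-force search for the longest match of data[pos..] that starts
// within the previous `window` bytes. No hashing or other acceleration.
Match longest_match(const std::string& data, std::size_t pos,
                    std::size_t window, std::size_t max_len) {
    Match best{0, 0};
    const std::size_t start = pos > window ? pos - window : 0;
    for (std::size_t cand = start; cand < pos; ++cand) {
        std::size_t len = 0;
        // The match may run past `pos` (overlapping copy), as in classic LZ77.
        while (len < max_len && pos + len < data.size() &&
               data[cand + len] == data[pos + len]) {
            ++len;
        }
        if (len > best.length) {
            best = Match{pos - cand, len};
        }
    }
    return best;
}
```

For example, on the input `"abcabcabc"` at position 3, the search finds a match at distance 3 with length 6.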
I also tried naively porting the same CPU code to the GPU, and on a GTX 275 it ran at the same speed as on the dual-core CPU, which is also a good result.
The only problem with these naive applications is TDR (Timeout Detection and Recovery). I consider it a bug that TDR applies to non-display adapters; without it, the GPU could simply be used as an additional processor.