I looked at the spec of the C870 on the NVIDIA website. It only says the card has 128 multiprocessors, versus 16 on the 8800 cards. What about the other resources, such as shared memory, registers, and constant memory?
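In case it helps anyone checking the same thing, the per-device numbers can also be read from the card itself rather than from the spec sheet. Below is a minimal sketch, assuming a CUDA toolkit whose runtime API provides `cudaGetDeviceProperties` with the fields shown. The choice of which fields to print is mine, and the values depend entirely on the installed card.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }

        // Resource limits as reported by the runtime for this device.
        printf("Device %d: %s\n", dev, prop.name);
        printf("  multiprocessors:         %d\n", prop.multiProcessorCount);
        printf("  shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  registers per block:     %d\n", prop.regsPerBlock);
        printf("  constant memory:         %zu bytes\n", prop.totalConstMem);
        printf("  global memory:           %zu bytes\n", prop.totalGlobalMem);
    }
    return 0;
}
```

Compiling with something like `nvcc query.cu -o query` (the file name is just a placeholder) and running it on both a C870 and an 8800 would show whether the two cards really differ in multiprocessor count.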
With the current CUDA model, I feel it is hard to fully utilize even 16 multiprocessors, let alone 128, because it seems that only one multiprocessor can run at any given moment. Is that right?
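One way to test that impression would be to launch a grid with many blocks and record which multiprocessor each block ran on. The sketch below is only an experiment under assumptions: it reads the PTX special register `%smid` through inline assembly, which needs a toolkit and GPU that support inline PTX, and the kernel name `recordSm`, the 64 blocks, and the 128 threads per block are arbitrary choices of mine.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block stores the ID of the multiprocessor it was scheduled on.
__global__ void recordSm(unsigned int *smOfBlock)
{
    if (threadIdx.x == 0) {
        unsigned int smid;
        // %smid is a PTX special register; assumes inline PTX is available.
        asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
        smOfBlock[blockIdx.x] = smid;
    }
}

int main()
{
    const int numBlocks = 64;          // arbitrary, more blocks than multiprocessors
    const int threadsPerBlock = 128;   // arbitrary
    unsigned int hostSm[numBlocks];
    unsigned int *devSm = nullptr;

    if (cudaMalloc(&devSm, sizeof(hostSm)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    recordSm<<<numBlocks, threadsPerBlock>>>(devSm);

    // cudaMemcpy waits for the kernel to finish and reports any launch error.
    cudaError_t err = cudaMemcpy(hostSm, devSm, sizeof(hostSm), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel or copy failed: %s\n", cudaGetErrorString(err));
        cudaFree(devSm);
        return 1;
    }

    for (int b = 0; b < numBlocks; ++b) {
        printf("block %2d ran on multiprocessor %u\n", b, hostSm[b]);
    }

    cudaFree(devSm);
    return 0;
}
```

If the printed IDs cover many different multiprocessors, the blocks of one grid are being spread across the chip. The output shows placement only, not timing, so it does not by itself prove that the blocks ran at the same moment.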