Spec for the C870? Cannot find details on the NVIDIA website

I saw the spec of the C870 on the NVIDIA website. It only says that it has 128 multiprocessors, versus 16 in the 8800 cards. What about the rest, such as shared memory, registers, constant memory, etc.?

With the current CUDA model, I feel it is hard to fully utilize the power of even 16 multiprocessors, let alone 128, because it seems that only one multiprocessor can run at any given moment. Is that correct?

I think you are confused about some of the terminology. The 8800 GTX and C870 both contain 16 multiprocessors, and each multiprocessor contains 8 processors, giving a total of 128 processors for either device. The C870 has 1.5 GB of global memory, whereas the 8800 GTX has 768 MB. Other than that, both devices are pretty much identical. Clock rates may differ slightly since the various manufacturers of GTX cards can set different clock rates.
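If you want the remaining numbers (shared memory, registers, constant memory) for whatever card is installed, you can ask the CUDA runtime directly instead of relying on the website. Here is a minimal sketch using cudaGetDeviceProperties. The factor of 8 processors per multiprocessor is taken from the figures above for the 8800 GTX / C870 and is hard-coded as an assumption; the runtime does not report it, and it will not hold for other architectures.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: 8 processors per multiprocessor, as stated above for the
// 8800 GTX and C870. This is not reported by the runtime.
static const int kProcessorsPerMultiprocessor = 8;

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        printf("No CUDA-capable device found\n");
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            printf("Could not query device %d\n", dev);
            continue;
        }

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Multiprocessors:         %d\n", prop.multiProcessorCount);
        printf("  Processors (assuming 8): %d\n",
               prop.multiProcessorCount * kProcessorsPerMultiprocessor);
        printf("  Global memory:           %zu bytes\n", prop.totalGlobalMem);
        printf("  Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  Registers per block:     %d\n", prop.regsPerBlock);
        printf("  Constant memory:         %zu bytes\n", prop.totalConstMem);
        printf("  Warp size:               %d\n", prop.warpSize);
    }
    return 0;
}
```

Compile it with nvcc and run it on the machine that has the card. The values printed are whatever the installed driver and device report, so I am not listing expected output here.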

All multiprocessors on a device can be used at the same time, as long as you start your kernel with at least as many blocks as the device has multiprocessors.
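As a rough illustration of that last point, here is a sketch that sizes the grid from the multiprocessor count so that every multiprocessor gets at least one block. The kernel name scale, the array size, the 128 threads per block, and the factor of 4 blocks per multiprocessor are all arbitrary choices for the example, not recommended values.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: multiplies every element by a factor.
// The grid-stride loop lets any number of blocks cover all n elements.
__global__ void scale(float *data, int n, float factor) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= factor;
    }
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("Could not query device 0\n");
        return 1;
    }

    const int n = 1 << 20;  // arbitrary problem size for the sketch
    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // At least one block per multiprocessor; the multiple of 4 is an
    // assumption chosen only to give the scheduler some extra blocks.
    int threadsPerBlock = 128;
    int blocks = prop.multiProcessorCount * 4;

    scale<<<blocks, threadsPerBlock>>>(d_data, n, 2.0f);
    cudaError_t err = cudaDeviceSynchronize();
    printf("Launched %d blocks on %d multiprocessors: %s\n",
           blocks, prop.multiProcessorCount, cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

If you launched this with a single block instead, only one multiprocessor would have work to do, which is probably the situation you had in mind.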