How many concurrently running threads

I have an 8800 GTS 640 MB.
How many threads can this card
run concurrently, meaning truly
in parallel, regardless of whether they are in a
thread block or not?

The CUDA programming guide
says that an 8800 GTS has
12 multiprocessors with 16 processors each.
So I think the number is 384 threads.

But I think the warp size
imposes further limitations,
so it is only 256.

Appendix A of the programming guide (page 76) has all the information you are looking for.
Each multiprocessor can run up to 768 threads, but the maximum number of threads per block is 512 (so you need to run multiple blocks to fill a multiprocessor).
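
To make the arithmetic concrete, here is a rough sketch in plain C++ that uses only the figures quoted in this thread (12 multiprocessors, 768 threads per multiprocessor, 512 threads per block). The constant and function names are made up for illustration, and the calculation assumes these two limits are the only ones that matter; it ignores other per-multiprocessor limits such as registers, shared memory, and the number of resident blocks.

```cpp
#include <cassert>

// Figures quoted in this thread for the 8800 GTS (names are hypothetical).
constexpr int kMultiprocessors = 12;               // from the question
constexpr int kMaxThreadsPerMultiprocessor = 768;  // from the reply
constexpr int kMaxThreadsPerBlock = 512;           // from the reply

// Upper bound on threads active on the whole card at once,
// assuming every multiprocessor is completely filled.
constexpr int max_active_threads() {
    return kMultiprocessors * kMaxThreadsPerMultiprocessor;
}

// Threads one multiprocessor can hold when every block has
// `threads_per_block` threads. Only whole blocks fit, and only the
// two limits above are considered (an assumption of this sketch).
int active_threads_per_multiprocessor(int threads_per_block) {
    assert(threads_per_block > 0 && threads_per_block <= kMaxThreadsPerBlock);
    int blocks = kMaxThreadsPerMultiprocessor / threads_per_block;
    return blocks * threads_per_block;
}
```

Under these assumptions the card-wide bound is 12 * 768 = 9216 threads. A single block of 512 threads leaves a multiprocessor at 512 of its 768 slots, while three blocks of 256 or two blocks of 384 fill it completely.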