The GeForce GTX 280 has 240 cores and 30 multiprocessors, while the GTX 460 has 336 cores but only 7 multiprocessors. Obviously the GTX 460 has many times more cores per multiprocessor than the GTX 280 (48 vs. 8), so why can it hold only 50% more threads per multiprocessor (1024 threads/MP on the GTX 280 vs. 1536 threads/MP on the GTX 460)? If you multiply the number of multiprocessors by the number of threads per multiprocessor, the GTX 460 can have only 10752 concurrent threads, while the GTX 280 can have almost three times that (30720 concurrent threads). Have I misunderstood something, or what is the problem?
The GTX 280 is a GT200-architecture GPU, whereas the GTX 460 is a Fermi-architecture GPU. Multiprocessors differ between architectures; see the CUDA Programming Guide for details.
Yes, I understand that they are different, but is it true that you can have almost three times as many concurrent threads on the GTX 280 as on the GTX 460? If so, doesn’t that also make it three times faster?
No. The maximum number of active or concurrent threads doesn’t say much, if anything, about the speed of the cards.
You’re correct on the number of concurrent threads on the two architectures. However, as avidday pointed out, the max number of concurrent threads is not an indicator of performance. Once you have enough threads to hide latency (where “enough” depends on the kernel), performance is a function of memory or instruction throughput.
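For anyone who wants to check the arithmetic, here is a minimal sketch in Python that uses only the figures quoted in this thread. The dictionary layout and the `summarize` helper are just illustrative names, not part of any CUDA API.

```python
# Figures quoted in the thread above (cores, multiprocessors, max threads per MP).
gpus = {
    "GTX 280": {"cores": 240, "multiprocessors": 30, "threads_per_mp": 1024},
    "GTX 460": {"cores": 336, "multiprocessors": 7, "threads_per_mp": 1536},
}


def summarize(spec):
    """Return (cores per multiprocessor, max concurrent threads on the whole GPU)."""
    cores_per_mp = spec["cores"] // spec["multiprocessors"]
    max_threads = spec["multiprocessors"] * spec["threads_per_mp"]
    return cores_per_mp, max_threads


summary = {name: summarize(spec) for name, spec in gpus.items()}

for name, (cores_per_mp, max_threads) in summary.items():
    # Concurrent threads available per core: a rough measure of how much
    # latency-hiding capacity each core has, not of how fast it runs.
    threads_per_core = max_threads / gpus[name]["cores"]
    print(f"{name}: {cores_per_mp} cores/MP, {max_threads} concurrent threads, "
          f"{threads_per_core:.0f} threads per core")
```

The totals match the ones in the question (30720 vs. 10752). The per-core figure only shows how many threads are available to cover latency for each core; as noted above, it says nothing by itself about memory or instruction throughput.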