“CUDA cores” are the processing elements on a “streaming multiprocessor” (SM).
On the GT200 you have 30 streaming multiprocessors with 8 CUDA cores each, which gives 240 CUDA cores.
On Fermi (GF100) you have 16 streaming multiprocessors with 32 CUDA cores each, for a total of 512 CUDA cores.
I’m already waiting for the GF100, at least for detailed specs or a new release of the CUDA Programming Guide that covers the GF100 architecture. In my opinion it will be a great card for GPGPU: hardware debugging support with a single GPU, cache, dual DMA engines, fast double precision, fast integer support (fast modulo, multiplication and division), C++ support, and a 64-bit memory architecture.
I also assume that some shortcomings of the current architecture will be corrected, like 3D grids, texture writes, 3D linear memory, additional texture addressing modes and probably much more.
One thing is still missing that would bring more companies to use CUDA in their programs: an official compiler that translates PTX to x86, like Ocelot, but with Nvidia support and for all platforms. It would avoid having to develop the same algorithms twice, once for x86 and once for CUDA.
Today I found an interesting paper which suggests that Nvidia is also interested in this topic:
[url=“http://llvm.org/devmtg/2009-10/Grover_PLANG.pdf”]PLANG: PTX Frontend for LLVM - Vinod Grover, joint work with Andrew Kerr and Sean Lee[/url]