I use a GeForce GTX 850M. Its specifications show 5 SMs, compute capability (CC) 5.0, and 640 cores. The maximum number of threads per block is 1024, the maximum number of blocks per SM is 32, and the maximum number of threads per SM is 2048.
My program launches its kernel with 16x16 threads per block. So the number of blocks per SM is 8 (= 2048 / 256), and with 5 SMs the device can run 5 SMs * 8 blocks * 256 threads = 10240 threads at the same time.
So what are the 640 cores? Is that the number of blocks that can run at the same time, or the number of threads that can run at the same time?
Please help me understand this!
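Neither. The cores are floating-point units (FPUs): the execution units that threads use for floating-point calculations. With 640 cores across 5 SMs, each SM has 128 of them. The threads resident on an SM take turns using these units, so the number of resident threads can be much larger than the number of cores.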
The formulae for the number of blocks and threads that can run at the same time are also more complicated than what you suggest. Other resources, such as the registers used by each thread and the shared memory used by each block, need to be taken into account, because they may limit occupancy.
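To make the point about limiting resources concrete, here is a minimal host-side C++ sketch of a simplified occupancy estimate. It is not the exact formula the hardware or NVIDIA's tools use: it ignores allocation granularity (warp and register rounding). The thread and block limits are the ones quoted in the question. The register file and shared memory sizes (65536 registers and 64 KB per SM) are assumed values for compute capability 5.0. The names `SmLimits`, `KernelUsage`, and `resident_blocks_per_sm` are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>

// Per-SM limits. The thread and block limits come from the question;
// the register and shared-memory sizes are ASSUMED values for CC 5.0.
struct SmLimits {
    int max_threads = 2048;        // maximum resident threads per SM
    int max_blocks = 32;           // maximum resident blocks per SM
    int registers = 65536;         // 32-bit registers per SM (assumed)
    int shared_mem_bytes = 65536;  // shared memory per SM in bytes (assumed)
};

// Hypothetical resource usage of one kernel launch configuration.
struct KernelUsage {
    int threads_per_block;
    int regs_per_thread;
    int shared_mem_per_block;  // bytes; 0 if the kernel uses none
};

// Upper bound on resident blocks per SM: the tightest of the four limits.
// Simplification: allocation granularity is ignored, so the real number
// can be lower than what this returns.
int resident_blocks_per_sm(const SmLimits& sm, const KernelUsage& k) {
    assert(k.threads_per_block > 0 && k.regs_per_thread > 0);
    int by_threads = sm.max_threads / k.threads_per_block;
    int by_regs = sm.registers / (k.regs_per_thread * k.threads_per_block);
    int by_smem = k.shared_mem_per_block > 0
                      ? sm.shared_mem_bytes / k.shared_mem_per_block
                      : sm.max_blocks;  // no shared memory: no extra limit
    return std::min({sm.max_blocks, by_threads, by_regs, by_smem});
}
```

Under these assumptions, a 16x16 block (256 threads) using 32 registers per thread and no shared memory gives 8 blocks per SM, which matches the estimate in the question. If the same kernel needed 64 registers per thread, or 16 KB of shared memory per block, the bound would drop to 4 blocks per SM. For a real kernel, the actual register and shared memory usage has to come from the compiler output or the CUDA occupancy tools rather than from guessed numbers like these.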