When I enter:
Compute Capability version: 5.0
Threads per block: 640
Registers per thread: 86
Shared memory per block: 3
It gives me:
Maximum Thread Blocks Per Multiprocessor
Limited by Max Warps / Blocks per Multiprocessor: 4
Limited by Registers per Multiprocessor: 1
Limited by Shared Memory per Multiprocessor: 256
So if my GPU has 5 SMs, can I use 1 * 5 (SMs) = 5 thread blocks with a block size of 640 threads?
Yes, that is what that means. You take the lowest of the per-multiprocessor block limits, which in your case is 1 (the register limit).
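To make the arithmetic concrete, here is a minimal sketch in Python. It only takes the three limits the calculator reported in the question and the 5 SMs mentioned there. The function name `resident_blocks` and the dictionary keys are made up for illustration and are not part of any CUDA tool.

```python
# Per-SM block limits, copied from the calculator output in the question.
# The key names are hypothetical labels, not calculator or CUDA identifiers.
limits_per_sm = {
    "warps_or_blocks": 4,
    "registers": 1,
    "shared_memory": 256,
}


def resident_blocks(limits, num_sms):
    """Return (blocks resident per SM, blocks resident on the whole GPU)."""
    per_sm = min(limits.values())  # the tightest limit decides
    return per_sm, per_sm * num_sms


per_sm, total = resident_blocks(limits_per_sm, num_sms=5)
print(per_sm, total)  # 1 5
```

So with these numbers, registers are the bottleneck, and at most 5 blocks of 640 threads can be resident on the GPU at once.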
Again, this is a statement of occupancy: it is the number of blocks that can be simultaneously resident on the SMs. You can certainly launch a kernel with more blocks than that, but only that many blocks will run “simultaneously”; the rest wait until a resident block finishes and frees its resources.
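As a rough illustration of what launching more blocks than that looks like, the sketch below estimates how many "rounds" of resident blocks a grid needs. This is a simplification I am assuming for the example: the hardware starts a new block as soon as any resident block retires, so it does not really run in lockstep rounds. The grid size of 12 blocks is a made-up number, and `waves` is a hypothetical helper name.

```python
import math


def waves(grid_blocks, resident_blocks):
    """Approximate number of rounds needed if only `resident_blocks`
    blocks can be on the GPU at the same time (simplified model)."""
    return math.ceil(grid_blocks / resident_blocks)


# Hypothetical launch of 12 blocks on a GPU that can hold 5 at once.
print(waves(12, 5))  # 3
```

The kernel still completes correctly with 12 blocks. It just cannot have all of them in flight at the same moment.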