The grid of blocks is distributed over all multiprocessors in the device. (30, in the case of the GTX 275, 280, and 285)
A block is confined to run on one multiprocessor so that the threads in the block can communicate via the shared memory in that multiprocessor.
No, but a multiprocessor can run more than one block (up to 8) from the same kernel at once. The limit here is the shared memory and register usage of the block. If each block requires 5000 bytes of shared memory and 2304 registers, then 3 blocks will run at a time on each multiprocessor. (In this example, the shared memory is the resource limit.) If there are more blocks than can fit on all the multiprocessors in the device, the extra blocks sit in a queue and wait for active blocks to finish.
Although you can’t run more than one kernel at a time, you can run a “fat kernel” with no penalty. A fat kernel has a big if statement at the beginning that goes into completely different subroutines. As long as the branch depends on block ID number, there is no penalty for this:
__global__ void my_fat_kernel()
{
if (blockIdx.x < 128)
do_something1();
else
do_something2();
}
In fact, you can branch on warp boundaries (threads 0-31 take one branch, threads 32-63 take another branch, and so on), also with no penalty. Of course, this is a really cumbersome way to make the GPU do two different things at once. You also run into the problem that if do_something1() takes more time than do_something2(), you have to wait until both are done before starting another kernel.
Mostly, the fat kernel is to illustrate that the hardware probably could (or is at least very close to) running multiple kernels at once, but there is no software interface to do so. You should probably not try to use the fat kernel technique given its limitations, though. Usually it is better to run kernels sequentially instead.
The ability to truly run multiple kernels at once has been requested in the forums a few times, but we haven’t had any confirmation that NVIDIA is working on it. This could mean anything from “It is impossible” to “You’ll see it in the next release.” NVIDIA doesn’t do roadmaps. :)