CUDA hardware level: Streaming Multiprocessor

Hello!
I have a question about threads, blocks and warps in an SM.

Please tell me whether the following is true or false, and correct me where I am wrong.

When we launch a CPU+GPU program, our kernel is run by every thread in every block. And (1) every block runs entirely on one SM; (2) in hardware, the blocks on an SM are executed as warps.
(3) But the warp is the scheduling quantum of the SM, so does it follow (?) that an SM runs one warp from one block at a time?
Or maybe threads on an SM run in warps (32 threads each), so that blocks with many (= N) threads contain N/32 warps, which run serially?

Threads in a grid are grouped into blocks.
Threads in a block are grouped into warps (each group of 32 threads in a block is a warp).
Once launched on a particular SM, a thread block (with some exceptions) remains there.
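
As a rough illustration of the grouping described above, here is a host-side C++ sketch of the arithmetic. It is not device code. It assumes a warp size of 32 (in a kernel you would read the built-in `warpSize` instead) and follows the programming guide's description that warps are formed from consecutive, increasing thread IDs, with x varying fastest. The names `Dim3`, `linearThreadId`, `warpIndex` and `laneIndex` are made up for this example.

```cpp
// Assumed warp size; 32 on current NVIDIA GPUs.
constexpr int kWarpSize = 32;

// Stand-in for CUDA's dim3/uint3 so the sketch compiles without the CUDA toolkit.
struct Dim3 { int x, y, z; };

// Linear thread ID within a block: x varies fastest, then y, then z.
constexpr int linearThreadId(Dim3 threadIdx, Dim3 blockDim) {
    return threadIdx.x
         + threadIdx.y * blockDim.x
         + threadIdx.z * blockDim.x * blockDim.y;
}

// Which warp of the block a thread belongs to.
constexpr int warpIndex(int linearId) { return linearId / kWarpSize; }

// Position of the thread inside its warp (0..31).
constexpr int laneIndex(int linearId) { return linearId % kWarpSize; }

// In a 16x16 block, thread (0, 2, 0) has linear ID 32,
// so it is the first lane of the block's second warp.
static_assert(linearThreadId(Dim3{0, 2, 0}, Dim3{16, 16, 1}) == 32, "");
static_assert(warpIndex(32) == 1, "");
static_assert(laneIndex(32) == 0, "");
```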

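The unit of execution scheduling within an SM is the warp, not the thread. An SM can have several blocks, and therefore many warps, resident at the same time, and its warp schedulers choose among the warps that are ready to run. So an SM is not limited to one warp from one block, and the warps of a block are not simply run one after another.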
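
Because scheduling happens per warp, a block whose size is not a multiple of the warp size still occupies whole warps, and the last one is only partly filled. A small host-side C++ sketch of that arithmetic, again assuming 32 threads per warp; `warpsInBlock` and `idleLanesInLastWarp` are hypothetical helper names, not CUDA API calls:

```cpp
// Assumed number of threads per warp (32 on current NVIDIA GPUs).
constexpr int kThreadsPerWarp = 32;

// Number of warps a block is split into: block size divided by 32, rounded up.
constexpr int warpsInBlock(int threadsPerBlock) {
    return (threadsPerBlock + kThreadsPerWarp - 1) / kThreadsPerWarp;
}

// Lanes in the block's last warp that have no thread assigned to them.
constexpr int idleLanesInLastWarp(int threadsPerBlock) {
    return warpsInBlock(threadsPerBlock) * kThreadsPerWarp - threadsPerBlock;
}

// A 256-thread block fills 8 warps exactly.
static_assert(warpsInBlock(256) == 8, "");
static_assert(idleLanesInLastWarp(256) == 0, "");

// A 100-thread block still needs 4 warps; 28 lanes of the last warp are unused.
static_assert(warpsInBlock(100) == 4, "");
static_assert(idleLanesInLastWarp(100) == 28, "");
```

This is one reason block sizes are usually chosen as a multiple of 32.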

The programming guide contains useful information:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#programming-model

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#hardware-implementation