I notice that an SM contains processing blocks, e.g., there are 2 processing blocks in a GP100 SM. There are 32 CUDA cores in one processing block. Is a processing block the same thing as a warp?
A warp is a CUDA software term: it describes a group of 32 consecutive threads that execute the same instruction simultaneously (this holds up to and including Pascal; Volta changed this concept somewhat).
A processing block is a hardware term (found, for example, in the GP100 white paper).
They are not exactly equivalent.
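To make the software side concrete, here is a minimal host-side sketch of how a thread's index within its block maps to a warp and to a lane within that warp. The helper names (`linear_thread_id`, `warp_id`, `lane_id`, `warps_per_block`) are my own, not CUDA API functions, and the warp size of 32 is hard-coded as an assumption; in real CUDA code you would read `warpSize` instead.

```cpp
// Assumption: 32 threads per warp, as described above.
constexpr unsigned kWarpSize = 32;

// Linear index of a thread inside its block: x varies fastest, then y, then z.
constexpr unsigned linear_thread_id(unsigned x, unsigned y, unsigned z,
                                    unsigned dim_x, unsigned dim_y) {
    return x + y * dim_x + z * dim_x * dim_y;
}

// Which warp of the block a thread belongs to (consecutive threads share a warp).
constexpr unsigned warp_id(unsigned tid) { return tid / kWarpSize; }

// Position of the thread inside its warp, 0..31.
constexpr unsigned lane_id(unsigned tid) { return tid % kWarpSize; }

// Number of warps needed for a block; the last warp may be only partly filled.
constexpr unsigned warps_per_block(unsigned threads) {
    return (threads + kWarpSize - 1) / kWarpSize;
}

// Compile-time checks of the mapping.
static_assert(warp_id(31) == 0 && warp_id(32) == 1, "threads 0..31 form warp 0");
static_assert(lane_id(33) == 1, "thread 33 is lane 1 of warp 1");
static_assert(warps_per_block(100) == 4, "100 threads need 4 warps");
```

So a block of 100 threads is split into 4 warps, the last of which has only 4 active threads. None of this says which processing block a warp runs on; that is decided by the hardware.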
In any given clock cycle, you can have one warp executing on one processing block. There can be many more warps (even from different thread blocks) queued for execution in the instruction schedulers of a multiprocessor. This is done so that the long latency of memory accesses can be hidden and the CUDA cores stay busy doing useful work.
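As a rough back-of-the-envelope illustration of that queueing, the sketch below counts how many warps end up waiting per processing block for a given launch configuration. The function names are hypothetical, the figure of 2 processing blocks per SM is the GP100 value mentioned in the question, and the sketch assumes warps are spread evenly across processing blocks. How many thread blocks are actually resident on an SM depends on register and shared memory usage, which is not modeled here.

```cpp
// Assumptions for illustration only; check the white paper for your GPU.
constexpr unsigned kThreadsPerWarp = 32;
constexpr unsigned kProcessingBlocksPerSM = 2;  // GP100, per the question above

// Warps resident on one SM for a given block size and number of resident blocks.
constexpr unsigned resident_warps(unsigned threads_per_block,
                                  unsigned resident_blocks) {
    return ((threads_per_block + kThreadsPerWarp - 1) / kThreadsPerWarp)
           * resident_blocks;
}

// Warps competing for one processing block, assuming an even spread (rounded up).
constexpr unsigned warps_per_processing_block(unsigned threads_per_block,
                                              unsigned resident_blocks) {
    return (resident_warps(threads_per_block, resident_blocks)
            + kProcessingBlocksPerSM - 1) / kProcessingBlocksPerSM;
}

// Example: 4 resident blocks of 256 threads each.
static_assert(resident_warps(256, 4) == 32, "8 warps per block, 4 blocks");
static_assert(warps_per_processing_block(256, 4) == 16, "32 warps over 2 blocks");
```

In this example each processing block has 16 warps to choose from while executing only one at a time, so whenever one warp stalls on a memory access there are likely other warps ready to issue.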