Scheduler concept inside FERMI

I need confirmation of one point. On page 10 of the Fermi white paper there is a diagram of the scheduler. In that diagram, does each instruction rectangle represent a group of instructions, one for each CUDA core? Or is it an instruction group for one CUDA core at a time? I'm trying to figure out exactly what happens.

Each rectangle represents 16 copies of the same instruction sent to 16 cores within the same SM.

Because there are 32 cores in the SM, the two schedulers together can send 32 instruction copies (16 each) to the cores at the same time.
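A minimal Python sketch of the arithmetic in this answer, assuming a compute capability 2.0 SM with 32 cores split evenly between two schedulers. The constant and function names are hypothetical, and so is the lane order (lanes 0 to 15 in the first cycle, 16 to 31 in the second): the programming guide does not say which half of the warp goes first.

```python
WARP_SIZE = 32                          # threads per warp
CORES_PER_SM = 32                       # CUDA cores in a CC 2.0 SM
SCHEDULERS_PER_SM = 2
CORES_PER_SCHEDULER = CORES_PER_SM // SCHEDULERS_PER_SM  # 16: one "rectangle"


def copies_per_rectangle():
    """Copies of one warp instruction a single scheduler sends out in one cycle."""
    return CORES_PER_SCHEDULER


def copies_per_sm_cycle():
    """Copies sent by both schedulers together in one cycle."""
    return SCHEDULERS_PER_SM * CORES_PER_SCHEDULER


def cycles_per_warp_instruction():
    """Cycles one scheduler needs to cover every thread of a warp (ceiling division)."""
    return -(-WARP_SIZE // CORES_PER_SCHEDULER)


def lanes_in_cycle(cycle):
    """Warp lanes served in a given cycle; the low-half-first order is an assumption."""
    start = cycle * CORES_PER_SCHEDULER
    return list(range(start, start + CORES_PER_SCHEDULER))


print(copies_per_rectangle())         # 16
print(copies_per_sm_cycle())          # 32
print(cycles_per_warp_instruction())  # 2
print(lanes_in_cycle(1)[:3])          # [16, 17, 18]
```

The ceiling division only matters if the warp size were not a multiple of the cores per scheduler; here it simply gives 32 / 16 = 2.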

To quote the programming guide: “At every instruction issue time, each scheduler issues:
- One instruction for devices of compute capability 2.0,
- Two instructions for devices of compute capability 2.1,
for some warp that is ready to execute, if any. The first scheduler is in charge of the warps with an odd ID and the second scheduler is in charge of the warps with an even ID. Note that when a scheduler issues a double-precision floating-point instruction, the other scheduler cannot issue any instruction. A warp scheduler can issue an instruction to only half of the CUDA cores. To execute an instruction for all threads of a warp, a warp scheduler must therefore issue the instruction over two clock cycles for an integer or floating-point arithmetic instruction.”
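The quoted rules for compute capability 2.0 can be sketched as a toy issue model in Python. It follows the guide on the odd/even warp split and on issuing integer and floating-point instructions over two cycles; everything else (warps always ready, round-robin order, the trace format, all names) is a simplifying assumption, and the double-precision rule and the dual issue of compute capability 2.1 are deliberately left out.

```python
CYCLES_PER_ISSUE = 2   # int/FP warp instruction is issued over two cycles (CC 2.0)
HALF_WARP = 16         # cores one scheduler can reach per cycle


def scheduler_for(warp_id):
    """Per the quoted guide: the first scheduler (0) takes odd warp IDs, the second (1) even."""
    return 0 if warp_id % 2 == 1 else 1


def issue_trace(warp_ids, instrs_per_warp):
    """Return sorted (cycle, scheduler, warp_id, first_lane) tuples, one per half-warp issue.

    Toy assumptions: every warp is always ready, every instruction is integer or
    single-precision FP, and each scheduler visits its warps round-robin.
    """
    queues = {0: [], 1: []}
    for _ in range(instrs_per_warp):          # round-robin over the warps
        for w in warp_ids:
            queues[scheduler_for(w)].append(w)
    trace = []
    for sched, queue in queues.items():
        for k, w in enumerate(queue):
            start = k * CYCLES_PER_ISSUE
            for half in range(CYCLES_PER_ISSUE):  # two half-warps, one per cycle
                trace.append((start + half, sched, w, half * HALF_WARP))
    return sorted(trace)


def total_cycles(trace):
    """Cycles until the last half-warp has been issued."""
    return max(t[0] for t in trace) + 1 if trace else 0


trace = issue_trace([0, 1, 2, 3], instrs_per_warp=1)
for entry in trace[:4]:
    print(entry)
print(total_cycles(trace))  # 4
```

With warps 0 to 3, both schedulers issue in every cycle, so 2 × 16 = 32 cores are busy each cycle and four warp instructions finish in 4 cycles. If only odd warps were present (for example `issue_trace([1, 3, 5], 1)`), the second scheduler would sit idle and the model needs 6 cycles for three warp instructions.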

I understand! Thank you very much!