I am a little curious about the SP (streaming processor) cores inside the SMs (streaming multiprocessors). In one paper about these cores, I read that each SP is an 8-wide SIMD unit. Is that true? If so, an SM would have 8 × 8 = 64 instructions running in parallel, because there are 8 SPs inside an SM. I found in another document that blocks are issued to the SMs at the granularity of warps (a warp is a set of 32 threads executing the same instruction), and that at any instant only one warp is issued to the SM for execution. So it is unclear to me why only 32 instructions run in parallel when, by my count, there should be 64.
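To make the mismatch concrete, here is a toy Python model of the two numbers I am comparing. It only encodes my assumptions: 8 SPs per SM, each SP 8 wide (the paper's claim, which is exactly what I am unsure about), and 32-thread warps formed from consecutive thread indices of a 1-D block. The names `lanes_per_sm` and `partition_into_warps` are my own; this is not how the hardware or CUDA exposes anything.

```python
from typing import List

# Assumed figures from what I read; not verified hardware specs.
SPS_PER_SM = 8          # scalar processors per SM
SIMD_WIDTH_PER_SP = 8   # the claimed "8-wide SIMD" per SP (my reading of the paper)
WARP_SIZE = 32          # threads per warp


def lanes_per_sm(sps: int = SPS_PER_SM, width: int = SIMD_WIDTH_PER_SP) -> int:
    """Parallel instructions per SM I would expect if every SP were 8 wide."""
    return sps * width


def partition_into_warps(block_size: int, warp_size: int = WARP_SIZE) -> List[range]:
    """Split a 1-D block's thread indices into warps of consecutive threads."""
    return [range(start, min(start + warp_size, block_size))
            for start in range(0, block_size, warp_size)]


if __name__ == "__main__":
    print("expected lanes per SM:", lanes_per_sm())  # 64
    print("threads per warp:", WARP_SIZE)            # 32
    # A 100-thread block needs 4 warps; the last one is only partly filled.
    for i, warp in enumerate(partition_into_warps(100)):
        print(f"warp {i}: threads {warp.start}..{warp.stop - 1}")
```

The gap between the two printed numbers (64 versus 32) is the part I cannot reconcile.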
I think my information about the SP pipeline is wrong or incomplete. If somebody can clear up my doubt on this topic, I will be much obliged.