Nil
September 28, 2009, 10:08pm
Hello everybody,
I am a little curious about the SP cores inside the SMs. I read about these cores in a paper, which says each one is an 8-wide SIMD unit. Is that true? If so, an SM would run 8 × 8 = 64 instructions in parallel, because there are 8 SPs inside an SM. I have also read in another document that blocks are issued to the SMs at the granularity of warps (a set of 32 threads executing the same instruction), and that at any instant only one warp is issued to the SM for execution. So it is unclear to me why 32 instructions run in parallel when, by my reasoning, it should be 64.
I think my information about the SP pipeline is wrong or incomplete. If somebody can clear this up for me, I would be very grateful.
Thanks,
Nil
Yes, it is.
Assume that your block has 64 threads and is processed by one multiprocessor (SM). One SM has 8 cores.
The block is split into warps of 32 threads each, so your block needs 2 warps.
Step 1: the 8 cores process the first 8 threads of the first warp.
Step 2: the 8 cores process the second 8 threads of the first warp.
Step 3: the 8 cores process the third 8 threads of the first warp.
Step 4: the 8 cores process the fourth 8 threads of the first warp.
Together these four steps execute one instruction for the whole warp. After the first warp, the second warp is processed in the same way.
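A minimal Python sketch of the scheme described in this reply, assuming (as stated) 8 cores per SM and 32-thread warps. `split_into_warps` and `run_instruction` are hypothetical names; the inner loop runs the 8 lanes one after another only to model what the hardware does in a single step.

```python
# Hypothetical model of the scheme described above: an SM with 8 SP cores,
# 32-thread warps, so one warp instruction is processed in 32 / 8 = 4 steps.
WARP_SIZE = 32
CORES_PER_SM = 8


def split_into_warps(block_size):
    """Group consecutive thread IDs of a block into warps of WARP_SIZE threads."""
    return [list(range(start, min(start + WARP_SIZE, block_size)))
            for start in range(0, block_size, WARP_SIZE)]


def run_instruction(block_size, op, a, b):
    """Apply one instruction `op` to every thread of the block.

    In each step the 8 cores apply the same `op` to 8 threads (8-wide SIMD),
    so a full warp takes 4 steps. Returns the per-thread results and a trace
    of (warp index, step index, thread IDs handled in that step).
    """
    result = [0] * block_size
    trace = []
    for w, warp in enumerate(split_into_warps(block_size)):
        for first in range(0, len(warp), CORES_PER_SM):
            lanes = warp[first:first + CORES_PER_SM]
            # On the hardware these 8 lanes run at the same time;
            # this loop only models that step sequentially.
            for tid in lanes:
                result[tid] = op(a[tid], b[tid])
            trace.append((w, first // CORES_PER_SM, lanes))
    return result, trace


if __name__ == "__main__":
    n = 64  # block of 64 threads -> 2 warps
    result, trace = run_instruction(n, lambda x, y: x + y, list(range(n)), [10] * n)
    for w, step, lanes in trace:
        print(f"warp {w}, step {step}: threads {lanes[0]}-{lanes[-1]}")
```

For the 64-thread block this prints eight lines, from `warp 0, step 0: threads 0-7` to `warp 1, step 3: threads 56-63`.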
Remember that the half-warp (1/2 warp = 16 threads) matters when using shared memory: shared-memory bank conflicts are evaluated per half-warp.
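A simplified Python sketch of why the half-warp matters, assuming compute capability 1.x shared memory: 16 banks, successive 32-bit words in successive banks, and conflicts counted separately for each half-warp. `conflict_degree` is a hypothetical helper, and treating a word read by several threads as a conflict-free broadcast is a simplification of the real 1.x rules.

```python
# Simplified model of compute capability 1.x shared memory (assumption):
# 16 banks, word i lives in bank i % 16, conflicts counted per half-warp.
from collections import defaultdict

NUM_BANKS = 16
HALF_WARP = 16


def conflict_degree(word_indices):
    """Return the worst n-way bank conflict over the half-warps of one warp.

    word_indices[t] is the 32-bit word index accessed by thread t.
    Several threads reading the same word count once (broadcast).
    """
    worst = 1
    for start in range(0, len(word_indices), HALF_WARP):
        words_per_bank = defaultdict(set)
        for word in word_indices[start:start + HALF_WARP]:
            words_per_bank[word % NUM_BANKS].add(word)
        worst = max(worst, max(len(ws) for ws in words_per_bank.values()))
    return worst


if __name__ == "__main__":
    tids = range(32)
    print(conflict_degree([t for t in tids]))       # stride 1  -> 1
    print(conflict_degree([2 * t for t in tids]))   # stride 2  -> 2
    print(conflict_degree([16 * t for t in tids]))  # stride 16 -> 16
```

With stride 1, threads 0 and 16 both hit bank 0, but they sit in different half-warps, so no conflict is counted.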
Nil
October 1, 2009, 1:24am
Hi Quoc,
Thanks!
Can you provide a little more detail about this?
According to your reply, the scheduler assigns one warp at a time to an SM. Then the instruction issue unit issues:
instruction 1 for the 1st 8 threads of warp 1
instruction 1 for the 2nd 8 threads of warp 1
instruction 1 for the 3rd 8 threads of warp 1
instruction 1 for the 4th 8 threads of warp 1
instruction 1 for the 1st 8 threads of warp 2
instruction 1 for the 2nd 8 threads of warp 2
instruction 1 for the 3rd 8 threads of warp 2
instruction 1 for the 4th 8 threads of warp 2
instruction 2 for the 1st 8 threads of warp 1
instruction 2 for the 2nd 8 threads of warp 1
instruction 2 for the 3rd 8 threads of warp 1
instruction 2 for the 4th 8 threads of warp 1
instruction 2 for the 1st 8 threads of warp 2
instruction 2 for the 2nd 8 threads of warp 2
instruction 2 for the 3rd 8 threads of warp 2
instruction 2 for the 4th 8 threads of warp 2
… and so on.
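The enumeration above can be generated with three nested loops; a minimal Python sketch, where `issue_order` and `ORDINAL` are hypothetical names. It reproduces only the order assumed in this post; the real warp scheduler picks among warps that are ready to run and is not guaranteed to follow this fixed round-robin order.

```python
# Hypothetical helper: enumerates the issue order listed above
# (1-based numbering to match the list).
ORDINAL = {1: "1st", 2: "2nd", 3: "3rd", 4: "4th"}


def issue_order(num_instructions, num_warps, warp_size=32, cores=8):
    """Yield (instruction, warp, quarter) tuples in the order listed above.

    Each tuple stands for one issue of one instruction to `cores` threads.
    """
    quarters = warp_size // cores  # 32 / 8 = 4 groups of 8 threads per warp
    for instr in range(1, num_instructions + 1):
        for warp in range(1, num_warps + 1):
            for q in range(1, quarters + 1):
                yield instr, warp, q


if __name__ == "__main__":
    for instr, warp, q in issue_order(2, 2):
        print(f"instruction {instr} for the {ORDINAL[q]} 8 threads of warp {warp}")
```

Note that every entry applies a single instruction to 8 threads at once, which is the 8-wide step the question is about.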
This suggests the SP cores are not 8-wide SIMD units, but SISD!
Am I right?
–
Nil