Inquisitive about SP cores in SMs

Nil · September 28, 2009, 10:08pm

Hello Everybody,

I am a little curious about the SP cores inside the SMs. I read about these cores in a paper, which says each one is an 8-wide SIMD unit. Is that true? If so, an SM would run 8x8 = 64 instructions in parallel, because there are 8 SPs inside an SM. In another document I found that blocks are issued to the SMs at the granularity of warps (a set of 32 threads running the same instruction), and that at any instant only one warp is issued to the SM for execution. So it is unclear to me why only 32 instructions run in parallel when it should be 64.

I think my information about the SP pipeline is wrong or incomplete. If somebody can clear up my doubt on this topic, I will be much obliged.

Thanks,
Nil

Quoc_Vinh · September 29, 2009, 3:25am

Yes, it is.

Assume that your block has 64 threads and that this block is executed by 1 multiprocessor (SM). 1 SM has 8 cores.

This block is split into two warps (32 threads per warp), so 2 warps cover all the threads of your block.

First step: the 8 cores execute the first 8 threads of the first warp.

Second step: the 8 cores execute the second 8 threads of the first warp.

Third step: the 8 cores execute the third 8 threads of the first warp.

Fourth step: the 8 cores execute the fourth 8 threads of the first warp.

After the first warp finishes, the second warp is executed.

Remember that the half-warp (1/2 warp = 16 threads) matters when using shared memory.
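The split described above can be sketched in a few lines. This is plain Python rather than CUDA, only to make the arithmetic concrete. The function names `locate_thread` and `steps_per_warp_instruction` are made up for this illustration, and the assumption that thread i of a warp lands on core i % 8 is mine; the thread does not say how threads map to cores.

```python
WARP_SIZE = 32     # threads per warp, as stated in this thread
CORES_PER_SM = 8   # SP cores per SM, as stated in this thread


def locate_thread(thread_id):
    """Return (warp, step, core) for a thread index within one block.

    Hypothetical helper: assumes threads are packed into warps in
    index order and that each group of 8 consecutive threads is
    handled in one step, one thread per core.
    """
    warp, lane = divmod(thread_id, WARP_SIZE)
    step, core = divmod(lane, CORES_PER_SM)
    return warp, step, core


def steps_per_warp_instruction():
    """Number of 8-thread steps needed to cover one 32-thread warp."""
    return WARP_SIZE // CORES_PER_SM


if __name__ == "__main__":
    print("steps per warp instruction:", steps_per_warp_instruction())
    for tid in (0, 7, 8, 31, 32, 63):
        print("thread", tid, "->", locate_thread(tid))
```

With a 64-thread block, threads 0 to 31 fall in warp 0 and threads 32 to 63 in warp 1, and each warp needs 32 / 8 = 4 steps on the 8 cores.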

Nil · October 1, 2009, 1:24am

Hi Quoc,

Thanks!

Can you provide a little more detail about this?

According to your reply, the scheduler schedules 1 warp to 1 SM. Then the instruction issue unit issues:

Instruction 1 for the 1st 8 threads of warp 1

Instruction 1 for the 2nd 8 threads of warp 1

Instruction 1 for the 3rd 8 threads of warp 1

Instruction 1 for the 4th 8 threads of warp 1

Instruction 1 for the 1st 8 threads of warp 2

Instruction 1 for the 2nd 8 threads of warp 2

Instruction 1 for the 3rd 8 threads of warp 2

Instruction 1 for the 4th 8 threads of warp 2

Instruction 2 for the 1st 8 threads of warp 1

Instruction 2 for the 2nd 8 threads of warp 1

Instruction 2 for the 3rd 8 threads of warp 1

Instruction 2 for the 4th 8 threads of warp 1

Instruction 2 for the 1st 8 threads of warp 2

Instruction 2 for the 2nd 8 threads of warp 2

Instruction 2 for the 3rd 8 threads of warp 2

Instruction 2 for the 4th 8 threads of warp 2

… continues
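The sequence listed above can be generated by a small Python sketch. It only reproduces the ordering written out in this post (instruction by instruction, warp by warp, 8 threads at a time); it is a simplified model, not a description of how the hardware scheduler actually picks warps. The name `issue_order` and its parameters are invented for the illustration.

```python
def issue_order(num_instructions, num_warps, warp_size=32, cores=8):
    """Yield (instruction, warp, group) in the order listed in the post.

    Hypothetical model: every instruction is issued for every warp
    before the next instruction starts, and each warp is covered in
    warp_size // cores groups of `cores` threads.
    """
    groups_per_warp = warp_size // cores
    for instruction in range(1, num_instructions + 1):
        for warp in range(1, num_warps + 1):
            for group in range(1, groups_per_warp + 1):
                yield instruction, warp, group


if __name__ == "__main__":
    for instruction, warp, group in issue_order(2, 2):
        print(f"instruction {instruction}, 8-thread group {group} of warp {warp}")
```

For 2 instructions and 2 warps this gives 2 x 2 x 4 = 16 entries, matching the 16 lines above.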

This suggests the SP cores are not 8-wide SIMD units but SISD ones!

Am I right?

–

Nil

Quoc_Vinh · October 1, 2009, 2:53am

Yes, I think so.
