I know that 1 SM has 8 SPs and that each SP processes 1 thread.
But the warp size is 32, not 8.
How can 32 threads be processed in parallel?
Each SP is pipelined: most math ops complete at a rate of one per clock, but with 4 clocks of latency.
The warp scheduler also issues a new instruction every 4 clocks.
So the 32 threads of a warp are effectively processed in 4 passes of 8 threads, one clock each.
But most of this is abstracted away from you; in your head you can just think of all 32 threads of the warp stepping forward simultaneously.
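To make that mental model concrete, here is a minimal CUDA sketch (a hypothetical kernel of my own, not from this thread): you write the kernel as if all 32 threads of a warp execute each instruction together, even though an 8-SP multiprocessor physically issues it over 4 clocks.

#include <cuda_runtime.h>

// Illustrative kernel: every thread scales one element. Threads are grouped
// into warps of 32 consecutive threadIdx.x values; within a warp you can
// reason as if all 32 lanes step through this code in lockstep, even though
// the 8 SPs of a multiprocessor process the warp in 4 passes (32 / 8 = 4).
__global__ void scale(float *data, float factor, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        data[tid] *= factor;
}

int main(void)
{
    const int n = 1 << 20;
    float *d = 0;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // 256 threads per block = 8 warps per block.
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}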
Thank you for your answer. :rolleyes:
And how can I get this information from somewhere other than this forum? Is there a white paper about the architecture, e.g. about the warp scheduler?
All the info needed to program is in the programming guide.
If you want deeper knowledge about the hardware and how the software maps onto it (and again, this is not needed for programming; it is all nicely abstracted away), papers like “Scalable Parallel Programming with CUDA” are nice to read.
Specifically, iara, there are references listed in the FAQ: did you read it?
Doesn’t that mean that at the end of the 4 cycles, the 4 groups of 8 threads are not really done yet (they are in the 1st, 2nd, 3rd and 4th pipeline stages, respectively)?
Yes, and the pipeline is even deeper than that: 6 warps, or 192 threads, are needed per multiprocessor to hide the full pipeline depth. It is a shame the forum's search function isn't the best, but you can find the relevant threads via Google; just search for: 6 warps site:http://forums.nvidia.com
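As a back-of-the-envelope check of that figure (my own sketch, assuming only the numbers given in this thread: 32 threads per warp and 6 warps to cover the pipeline depth):

#include <cstdio>

int main(void)
{
    const int warpSize      = 32;  // threads per warp
    const int warpsToHide   = 6;   // warps needed to cover the pipeline depth (from the post above)
    const int threadsNeeded = warpsToHide * warpSize;

    // Prints 192: with at least this many resident threads per multiprocessor,
    // the scheduler always has another ready warp to issue while earlier warps
    // are still in flight in the pipeline.
    printf("threads per multiprocessor needed: %d\n", threadsNeeded);
    return 0;
}

In practice this is why block sizes of 192 or 256 threads are a common starting point, as long as at least one block is resident per multiprocessor.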