How many threads run concurrently in one clock?

halbert · April 14, 2009, 3:21pm

I’m new to CUDA and a little confused about SMs, SPs, warps and threads.

According to the CUDA lectures and guide, an SM has 8 SPs, and each SP corresponds to a single thread. So, intuitively, a single instruction should be executed by 8 SPs; that is, an SM should run 8 threads in a single clock. But the lecture also told us that a warp has 32 threads.

In my view, an instruction needs 4 clocks to finish, and an SM has 8 SPs. So a warp has 4*8 = 32 threads for a single instruction. On any single clock, 8 threads are running and the other 24 threads are buffered. The 8 SPs execute 1/4 of the warp's instruction on each clock.

That’s how I see the relationship between SPs, threads and warps.

| 8 threads corresponding to 8 SPs | one clock tick |
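To make the arithmetic concrete, here is a minimal Python sketch of the picture described above. It is only a toy model: the constants are the figures quoted in this post, the function names are made up for illustration, and the assumption that threads are handled in index order, 8 at a time, is mine rather than something the guide states.

```python
# Toy model of one warp instruction on one SM.
# Constants are the figures quoted in the post; names are hypothetical.
SPS_PER_SM = 8   # scalar processors per multiprocessor
WARP_SIZE = 32   # threads per warp


def clocks_per_warp_instruction(warp_size=WARP_SIZE, sps=SPS_PER_SM):
    """Clocks needed to push one instruction through every thread of a warp."""
    return warp_size // sps


def threads_active_on_clock(clock, warp_size=WARP_SIZE, sps=SPS_PER_SM):
    """Thread indices within the warp served on a given clock.

    Assumption: threads are served in index order, sps at a time.
    """
    start = (clock % clocks_per_warp_instruction(warp_size, sps)) * sps
    return list(range(start, start + sps))


for clock in range(clocks_per_warp_instruction()):
    lanes = threads_active_on_clock(clock)
    print(f"clock {clock}: threads {lanes[0]}-{lanes[-1]}")
```

Under these assumptions, 8 threads are active on each of the 4 clocks while the other 24 wait.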

Is my understanding correct?

Going further, the lecture told us that an SM can execute up to 768 threads. Why is that?

Is there a buffer or some other limited resource that caps it at 768 threads? Or is it something else?

Thanks.

Halbert.XIE

MisterAnderson42 · April 14, 2009, 8:36pm

Yes, your understanding is correct.

The 768-thread limit is probably a limitation of the scheduling hardware or something similar. Note that compute 1.3 devices can actually handle 1024 threads per SM: 768 is the limit for older hardware.
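Expressed in warps, the limits quoted above look like this. This is just a quick sketch assuming 32-thread warps; the table and function names are hypothetical, and it says nothing about which hardware resource actually imposes the cap.

```python
WARP_SIZE = 32  # threads per warp

# Per-SM thread limits as quoted in this reply (hypothetical table name).
MAX_THREADS_PER_SM = {"older hardware": 768, "compute 1.3": 1024}


def max_resident_warps(max_threads, warp_size=WARP_SIZE):
    """Number of whole warps that fit within a per-SM thread limit."""
    return max_threads // warp_size


for name, limit in MAX_THREADS_PER_SM.items():
    print(f"{name}: {limit} threads = {max_resident_warps(limit)} warps")
```

So the scheduler has to track 24 warps per SM on the older parts and 32 on compute 1.3.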

halbert · April 15, 2009, 9:21am

Thanks. Can you point me to some references that explain this clearly?

MisterAnderson42 · April 15, 2009, 12:41pm

Sure. This is copied and pasted from the FAQ.

Where can I find more information on NVIDIA GPU architecture?

- J. Nickolls et al. “Scalable Parallel Programming with CUDA,” ACM Queue, vol. 6 no. 2, Mar./Apr. 2008, pp 40-53
  http://www.acmqueue.org/modules.php?name=C...age&pid=532
- E. Lindholm et al. “NVIDIA Tesla: A Unified Graphics and Computing Architecture,” IEEE Micro, vol. 28 no. 2, Mar./Apr. 2008, pp 39-55
