Execution of warps

rocksportrocker · January 7, 2009, 2:11pm

Hi,

I am trying to understand the GPU’s architecture.
How are warps executed on a multiprocessor? A warp is 32 threads wide,
but each SM has only 8 SPs, so I do not understand how a warp can be executed simultaneously
(as long as no divergence happens).

Greetings, Uwe

E.D_Riedijk · January 7, 2009, 2:40pm

It is quite simple: it takes 4 cycles to ‘execute’ a warp (4*8 = 32) :)

More precisely, it takes 4 cycles to issue an instruction for a warp; because of the pipeline depth, the instruction then takes about 20 cycles to complete. That is why it is recommended to have at least 6 warps on a multiprocessor to hide the pipeline latency.
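
To make the arithmetic concrete, here is a small Python sketch (Python is used only as a calculator here; this is not GPU code). The warp width of 32 and the 8 SPs per SM come from the question. The function names are made up for illustration, and the 24-cycle latency is an assumption that is not stated in this thread: it is included only to show which latency would correspond exactly to the 6-warp recommendation, since the figure of roughly 20 cycles is approximate.

```python
import math

WARP_SIZE = 32   # threads per warp (from the question)
SPS_PER_SM = 8   # scalar processors per multiprocessor (from the question)


def issue_cycles(warp_size=WARP_SIZE, sps=SPS_PER_SM):
    """Cycles needed to issue one instruction for a whole warp."""
    return warp_size // sps


def warps_to_hide(latency_cycles, warp_size=WARP_SIZE, sps=SPS_PER_SM):
    """Minimum number of resident warps so that, while one warp waits
    latency_cycles for its result, other warps can fill the issue slots."""
    return math.ceil(latency_cycles / issue_cycles(warp_size, sps))


print(issue_cycles())      # 4
print(warps_to_hide(20))   # 5
print(warps_to_hide(24))   # 6  (24 cycles is an assumed latency, see above)
```

With a 4-cycle issue time, a latency of about 20 cycles needs 5 warps and a latency of 24 cycles needs 6, so "at least 6 warps" is a safe rule of thumb for either figure.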

So in reality a warp does not execute simultaneously, but from a programming point of view it does: during a single instruction, one thread of a warp will not see updates written by another thread of that warp.
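
The "looks simultaneous to the programmer" point can be illustrated with a toy model in Python. This is only a sketch of the behaviour described above, not a description of the real hardware pipeline: the two-phase structure (fetch operands for all lanes, then write results) and both function names are assumptions made for illustration. Each of the 32 lanes replaces its own slot with its right-hand neighbour's value, and the lanes are processed in 4 passes of 8.

```python
WARP_SIZE = 32   # threads per warp
SPS_PER_SM = 8   # lanes processed per pass in this toy model


def warp_step(values):
    """Toy lock-step model: lane i executes values[i] = values[(i + 1) % 32].

    Operands are fetched for the whole warp (4 passes of 8 lanes) before any
    result is written, so no lane sees another lane's update.
    """
    assert len(values) == WARP_SIZE
    operands = [None] * WARP_SIZE
    for start in range(0, WARP_SIZE, SPS_PER_SM):        # 4 passes
        for lane in range(start, start + SPS_PER_SM):    # 8 lanes per pass
            operands[lane] = values[(lane + 1) % WARP_SIZE]
    return operands  # written back only after every lane has read


def naive_sequential_step(values):
    """Contrast case: lanes run one after another and update in place,
    so later lanes can observe earlier lanes' writes."""
    out = list(values)
    for lane in range(WARP_SIZE):
        out[lane] = out[(lane + 1) % WARP_SIZE]
    return out


data = list(range(WARP_SIZE))
print(warp_step(data)[-1])              # 0: lane 31 sees lane 0's old value
print(naive_sequential_step(data)[-1])  # 1: lane 31 sees lane 0's new value
```

In the lock-step model the result is a clean rotation of the input even though only 8 lanes are handled per pass; in the sequential contrast case the last lane picks up a value that another lane has already overwritten.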
