Yes, the number of SIMD processors per multiprocessor is 8 on all current CUDA cards. The warp size is likewise fixed for all current cards at 32, although you can query the warp size at runtime. Future devices may have different values.
Each GPU has a number of multiprocessors, ranging up to 16 on the top models, and each multiprocessor currently contains 8 SIMD processors.
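If you want the actual values for the card you are running on, both can be queried through the runtime API. A minimal sketch using cudaGetDeviceProperties (error checking omitted):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query the properties of device 0.
    cudaGetDeviceProperties(&prop, 0);
    // warpSize is 32 on current hardware; multiProcessorCount varies by card.
    printf("warp size: %d\n", prop.warpSize);
    printf("multiprocessors: %d\n", prop.multiProcessorCount);
    return 0;
}
```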
The warp size of 32 is easily explained if you look at the details of the instruction pipeline.
The most common instructions take 4 clock cycles. Since each multiprocessor can issue 8 threads at a time (remember the 8 SIMD processors), it takes 4 steps to issue all the threads of a warp; at that point the pipeline is finished for the first 8 threads.
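Spelled out, that is exactly where the warp size comes from:

    8 SIMD processors × 4 cycles per instruction = 32 threads = 1 warp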
You can ensure scaling by splitting your problem into as many blocks as possible. As long as you do this and keep at least 64 threads per block (== 2 warps), your algorithm should be independent of the number of multiprocessors; see the sketch below.
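A minimal sketch of such a launch configuration (the kernel and sizes are made up for illustration; error checking omitted):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical element-wise kernel, used only to show the launch shape.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // 64 threads per block == 2 warps, the minimum suggested above.
    const int threadsPerBlock = 64;
    // As many blocks as the problem allows; the hardware then spreads
    // them over however many multiprocessors the card happens to have.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d_data, n);

    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```

Because the grid contains far more blocks than any card has multiprocessors, a card with 16 multiprocessors simply works through them roughly twice as fast as one with 8.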
In my opinion, the number of SIMD processors per multiprocessor will not increase too much, because the SIMD constraints would then become a problem for scientific computing.