The warp is split across 2 cycles, 16 threads at a time. Each of the "4 processing blocks with 16 cores each" is referred to as an SMSP (SM sub-partition). Although it answers a question about instruction latency, Greg's answer here may clarify things: his "EXAMPLE 1: 1 Warp per SM Sub-partition" shows the ALU active for two consecutive cycles while processing all 32 threads.
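As a rough illustration of the arithmetic above (a minimal sketch, assuming the 16-FP32-core-per-SMSP layout described here; `coresPerSubPartition = 16` is taken from that description, not queried from the hardware):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Assumption from the discussion above: 16 FP32 cores per SM sub-partition (SMSP),
    // with 4 such sub-partitions per SM. This matches the architecture discussed here,
    // but it is not something the runtime API reports directly.
    const int coresPerSubPartition = 16;

    // A warp is prop.warpSize threads (32 on current GPUs). With only 16 FP32 lanes
    // per SMSP, one warp-wide FP32 instruction is processed over
    // warpSize / 16 = 2 consecutive cycles, as in Greg's EXAMPLE 1.
    printf("warpSize           = %d\n", prop.warpSize);
    printf("cycles per FP32 op = %d (warpSize / %d cores per SMSP)\n",
           prop.warpSize / coresPerSubPartition, coresPerSubPartition);
    printf("SMs on this device = %d\n", prop.multiProcessorCount);
    return 0;
}
```

Note that this only illustrates instruction issue/processing width; it says nothing about latency, which is what Greg's linked answer goes into.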