I want to understand something about CUDA programs: if I launch a kernel with `<<<1, N>>>`, where N is width*width, this creates 1 block of N threads.
- Does this guarantee that this one block is mapped to and executed on only one SM, or can it run across multiple SMs on the same GPU?
- Is the execution of these threads serialised or concurrent within the CUDA kernel?
I am working on the Tesla K80.
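For reference, here is a minimal sketch of the launch configuration described above; the kernel name `myKernel` and the value of `width` are placeholders, not taken from any actual code in this thread:

```cuda
#include <cuda_runtime.h>

__global__ void myKernel(float *data) {
    // With <<<1, N>>>, threadIdx.x runs from 0 to N-1 within the single block.
    int tid = threadIdx.x;
    data[tid] *= 2.0f;
}

int main() {
    const int width = 16;
    const int N = width * width;   // 256 threads in one block

    float *d_data;
    cudaMalloc(&d_data, N * sizeof(float));

    // One block of N threads. N must not exceed the per-block limit
    // (1024 threads on the Tesla K80, compute capability 3.7).
    myKernel<<<1, N>>>(d_data);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```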
Thanks a lot for the reply.
Going by this logic, if I have 10 SMs available and I create 4 thread blocks,
- Will they be allotted 4 out of the 10 SMs? Is it fixed at 4 or not?
- Suppose I want to have 32 thread blocks running on 10 SMs; how will they be scheduled?
- Will they be allotted 4 out of the 10 SMs? Is it fixed at 4 or not?
You will likely see these 4 thread blocks running on 4 SMs. But don’t make any assumptions about which SMs the block scheduler picks.
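If you want to check this empirically, one common trick is to read the `%smid` special register from inside the kernel, which reports the SM a thread is running on. A sketch (the kernel and launch parameters here are illustrative, not from your code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reads the %smid special register, which holds the ID of the SM
// the calling thread is currently executing on.
__device__ unsigned int get_smid() {
    unsigned int smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

__global__ void reportSM(unsigned int *smids) {
    // One thread per block records the block's SM ID.
    if (threadIdx.x == 0)
        smids[blockIdx.x] = get_smid();
}

int main() {
    const int numBlocks = 4;
    unsigned int *d_smids, h_smids[numBlocks];
    cudaMalloc(&d_smids, numBlocks * sizeof(unsigned int));

    reportSM<<<numBlocks, 128>>>(d_smids);
    cudaMemcpy(h_smids, d_smids, numBlocks * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    for (int b = 0; b < numBlocks; ++b)
        printf("block %d ran on SM %u\n", b, h_smids[b]);

    cudaFree(d_smids);
    return 0;
}
```

Note that the assignment can differ from run to run, which is exactly why you shouldn't rely on it.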
- Suppose I want to have 32 thread blocks running on 10 SMs; how will they be scheduled?
a) This is not documented and may vary depending on GPU architecture (generation).
b) It also depends a lot on the achievable occupancy of your kernel. One SM may be capable of executing 2, 3, 4 or more blocks simultaneously, provided the registers per thread, the shared memory requested, and the number of texture units used by each block allow for it. In that case you may see all 32 thread blocks execute concurrently.
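You can ask the runtime how many blocks of a given kernel fit on one SM using the occupancy API. A sketch, with `myKernel` and `blockSize` standing in for your actual kernel and launch configuration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void myKernel(float *data) {
    data[threadIdx.x + blockIdx.x * blockDim.x] *= 2.0f;
}

int main() {
    int blocksPerSM = 0;
    const int blockSize = 256;  // threads per block; adjust to your launch

    // Asks the runtime how many blocks of this kernel can be resident on
    // one SM at once, given its register and shared-memory usage.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, myKernel, blockSize, 0 /* dynamic shared memory */);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    printf("blocks per SM: %d, SM count: %d -> up to %d concurrent blocks\n",
           blocksPerSM, prop.multiProcessorCount,
           blocksPerSM * prop.multiProcessorCount);
    return 0;
}
```

If the reported blocks-per-SM times the SM count is 32 or more, all 32 blocks can be resident at once; otherwise the block scheduler drains them in waves as resources free up.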