Scheduling Thread Blocks

https://docs.nvidia.com/cuda/turing-tuning-guide/index.html#sm-occupancy

I’m using Quadro RTX 6000 (Turing).

In the picture above, it says the maximum number of concurrent warps per SM is 32. Does that mean the maximum number of threads per SM is 1024 (32 warps × 32 threads per warp)?

And

If the maximum number of thread blocks per SM is 16, must the sum of the threads across all 16 blocks on one SM be at most 1024, or does the 1,024-thread limit apply only to the single block currently running?

Yes, and this information is already available to you in the programming guide, in table 15.

It means that you can have a maximum of 16 threadblocks currently executing in an SM. It does not abrogate other hardware limits. You are also limited (on that device) to a maximum of 1024 threads per SM. Therefore, if you actually wanted to witness 16 threadblocks resident on a single SM, those threadblocks could have no more than 64 threads each. If you launched a kernel with 128 threads per block, you could witness at most 8 blocks on a single SM, even though the hardware limit for that value is 16.

This is the nature of occupancy. All relevant hardware limits must be satisfied.
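The arithmetic behind the answer above can be sketched in a few lines. This is a simplified illustration using only the two Turing (compute capability 7.5) limits discussed in this thread; a real occupancy calculation would also account for register and shared-memory limits, and the function name here is just for illustration:

```python
# Turing (cc 7.5) limits from the discussion above. Other limits
# (registers, shared memory) are deliberately ignored in this sketch.
MAX_BLOCKS_PER_SM = 16
MAX_THREADS_PER_SM = 1024

def max_resident_blocks(threads_per_block):
    """Upper bound on blocks resident on one SM, considering only
    the block-count limit and the thread-count limit. Every hardware
    limit must be satisfied simultaneously, so we take the minimum."""
    return min(MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block)

print(max_resident_blocks(64))   # 16 -- blocks of 64 threads can fill all 16 slots
print(max_resident_blocks(128))  # 8  -- the thread limit caps residency below 16
```

The same "take the minimum over all limits" pattern applies when the register or shared-memory footprint of a block becomes the binding constraint instead.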

Can blocks assigned to one SM be from different kernels?

Or can an SM only have blocks of the same kernel?

They can be from different kernels.

Then, the blocks execute as warps.

Can warps of different blocks (from different kernels) run on one SM at the same time?

Yes. This is all assuming those kernels are issued from the same application/process, or that MPS is being used.