Question about the number of SMs used by a program

xiaodongyee · April 9, 2018, 3:17pm

Dear all,
What strategy does the GPU scheduler use to allocate SMs to my program? Is there an API I can use to specify the number of SMs my program should use?

Robert_Crovella · April 9, 2018, 3:58pm

Other than via CUDA stream priorities, you have no control over the block scheduler in a GPU.

The heuristics of block scheduling are not published.

The GPU block scheduler will generally attempt to deliver blocks to SMs in such a way as to maximize throughput of your kernel. This generally means delivering blocks evenly to all available SMs.

You should strive for full occupancy of the GPU. As a target minimum, this means creating kernels whose total thread count is at least 2048 * (number of SMs in your GPU), or more.

xiaodongyee · April 9, 2018, 4:07pm

Dear txbob,
Thanks for your quick reply.
By the way, could you please explain why it is 2048 * (number of SMs)? Does this mean the number of threads per block is 2048?

Thanks again for your help.

BulatZiganshin · April 9, 2018, 5:16pm

OpenCL allows dividing a GPU into sub-regions. But if your goal is to fill as much of the GPU as possible, that is hardly of interest to you.

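One SM can hold up to 2048 resident threads at a time. That is a per-SM limit, not a block size: a single block is limited to 1024 threads, so the 2048 threads are spread over several blocks. Additionally, taking the tail effect into account, the optimal number of threads is 20K or more per SM.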
