Ensuring blocks per SM

jayshenoy · February 19, 2012, 6:17am

Hello,

I’m trying to launch multiple blocks on a single SM.

Device=GTX 470 CC=2.0

According to the CUDA Occupancy Calculator, I can launch 6 blocks per SM (registers/thread = 5, shared memory = 22 B, threads/block = 256). But when I launch 6 blocks with 256 threads/block, I measure 355 GFLOPs. For the GTX 470, the peak MAD throughput per SM is ~70 GFLOPs, which means other SMs must be involved in the execution.
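As a rough check of that reasoning, the measured throughput can be divided by the per-SM peak to get a lower bound on the number of SMs that must have been busy. This uses only the two figures quoted above (355 GFLOPs measured, ~70 GFLOPs peak per SM); the symbol for the SM count is introduced here for illustration.

```latex
\[
N_{\mathrm{SM}} \;\ge\; \left\lceil \frac{355\ \text{GFLOPs}}{70\ \text{GFLOPs per SM}} \right\rceil
= \lceil 5.07 \rceil = 6
\]
```

So at least 6 of the GTX 470's 14 SMs were in use, which is consistent with each of the 6 blocks landing on a different SM.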

How do I control the distribution of blocks to SMs, and is my speculation right?

seibert · February 19, 2012, 1:34pm

If the scheduler was putting all 6 blocks on the same SM, it would not be doing its job properly. (Under normal circumstances, you want blocks distributed over all the SMs. Filling up one SM before moving to the next would tend to underutilize the device.) CUDA does not provide any simple interface to control block scheduling.
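One way to see where blocks actually land is to have each block record the ID of the SM it runs on. The sketch below reads the PTX special register `%smid` through inline assembly; the names `get_smid`, `record_smid`, and `count_sms_used` are made up for this example, and the mapping it prints is only an observation for one launch, not a documented guarantee.

```cuda
#include <cstdio>
#include <set>
#include <vector>
#include <cuda_runtime.h>

// Read the PTX special register %smid: the ID of the SM running this thread.
__device__ unsigned int get_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// The first thread of each block records which SM the block landed on.
__global__ void record_smid(unsigned int *block_to_sm) {
    if (threadIdx.x == 0) {
        block_to_sm[blockIdx.x] = get_smid();
    }
}

// Hypothetical helper: launches num_blocks blocks, prints the block-to-SM
// mapping, and returns the number of distinct SMs seen (-1 on a CUDA error).
int count_sms_used(int num_blocks, int threads_per_block) {
    unsigned int *d_map = nullptr;
    size_t bytes = num_blocks * sizeof(unsigned int);
    if (cudaMalloc(&d_map, bytes) != cudaSuccess) return -1;

    record_smid<<<num_blocks, threads_per_block>>>(d_map);
    cudaError_t err = cudaGetLastError();

    std::vector<unsigned int> h_map(num_blocks);
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(h_map.data(), d_map, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_map);
    if (err != cudaSuccess) return -1;

    for (int b = 0; b < num_blocks; ++b) {
        printf("block %d -> SM %u\n", b, h_map[b]);
    }
    std::set<unsigned int> distinct(h_map.begin(), h_map.end());
    return static_cast<int>(distinct.size());
}
```

Calling `count_sms_used(6, 256)` from `main` mirrors the launch in the question. Because this kernel does almost no work, a block may finish before the next one is placed, so the number of distinct SMs can vary between runs.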

It might be possible to force the configuration you want by launching a lot of blocks (enough to fill the entire device), then having each thread check the %smid register using some inline PTX. The thread can then decide to exit if the %smid register does not equal the ID number of the target SM. There is the possibility of a race condition between your threads and the block scheduler, so this still might not work.
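A minimal sketch of that idea follows, assuming an oversubscribed launch and a target SM ID chosen by the caller. The names `work_on_one_sm` and `blocks_run_on_sm` are hypothetical, and the counter exists only to show how many blocks happened to stay on the target SM; real work would go where the comment indicates.

```cuda
#include <cuda_runtime.h>

// Read the PTX special register %smid: the ID of the SM running this thread.
__device__ unsigned int get_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Blocks that are not on target_sm exit immediately. All threads of a block
// run on the same SM, so the whole block takes the same branch.
__global__ void work_on_one_sm(unsigned int target_sm,
                               unsigned int *blocks_on_target) {
    if (get_smid() != target_sm) return;

    if (threadIdx.x == 0) {
        atomicAdd(blocks_on_target, 1u);  // count blocks that stayed
    }
    // ... the actual per-block work would go here ...
}

// Hypothetical helper: launches num_blocks blocks and returns how many of
// them ran on target_sm (-1 on a CUDA error).
int blocks_run_on_sm(unsigned int target_sm, int num_blocks,
                     int threads_per_block) {
    unsigned int *d_count = nullptr;
    if (cudaMalloc(&d_count, sizeof(unsigned int)) != cudaSuccess) return -1;

    cudaError_t err = cudaMemset(d_count, 0, sizeof(unsigned int));
    if (err == cudaSuccess) {
        work_on_one_sm<<<num_blocks, threads_per_block>>>(target_sm, d_count);
        err = cudaGetLastError();
    }

    unsigned int h_count = 0;
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(&h_count, d_count, sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_count);
    return (err == cudaSuccess) ? static_cast<int>(h_count) : -1;
}
```

For example, `blocks_run_on_sm(0, 1024, 256)` launches far more blocks than the device can hold at once and reports how many ended up on SM 0. The count is not predictable: blocks that exit early free their slots, and the scheduler may refill those slots on any SM, which is the race described above.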

jayshenoy · February 20, 2012, 1:55pm

Yeah, you are right. Where can I learn more about the block scheduler? I didn’t find that in the guide. Thanks a lot!

DrAnderson42 · February 20, 2012, 3:27pm

The block scheduler is not officially documented. You can search the forums and find various posts where microbenchmarks are used to guess how the scheduler may work.

seibert · February 20, 2012, 4:09pm

The documentation on the block scheduler seems to basically be: “Launch a lot of blocks. Trust us on the rest.”
