Overhead for an SM to switch to another set of blocks?

skyblues · August 27, 2008, 2:44pm

I’ve read in the manual that the runtime will try to schedule as many thread blocks as possible onto one SM (while staying within the limits on shared memory, registers, max 7** threads per SM, …). This is also called occupancy, right?
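For reference, the number of blocks that can be resident on one SM is the minimum allowed by each per-SM limit, and occupancy is usually quoted as active warps divided by the maximum number of resident warps. The Python sketch below is a rough model, not a CUDA API: it assumes compute capability 1.0 limits (768 threads, 8 blocks, 8192 registers and 16 KB of shared memory per SM, as on the 8800GTX) and ignores register and shared-memory allocation granularity. The function names are hypothetical.

```python
# Assumed per-SM limits for a compute capability 1.0 part such as the 8800GTX.
# Real hardware also rounds register and shared-memory allocations up to a
# granularity, which this sketch ignores.
MAX_THREADS_PER_SM = 768
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 8192
SHARED_MEM_PER_SM = 16 * 1024  # bytes
WARP_SIZE = 32


def resident_blocks(threads_per_block, regs_per_thread, smem_per_block):
    """Blocks that fit on one SM at once: the minimum over every limit."""
    limits = [MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block]
    if regs_per_thread > 0:
        limits.append(REGISTERS_PER_SM // (regs_per_thread * threads_per_block))
    if smem_per_block > 0:
        limits.append(SHARED_MEM_PER_SM // smem_per_block)
    return min(limits)


def occupancy(threads_per_block, regs_per_thread, smem_per_block):
    """Active warps on the SM divided by the maximum number of resident warps."""
    blocks = resident_blocks(threads_per_block, regs_per_thread, smem_per_block)
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    return blocks * warps_per_block / (MAX_THREADS_PER_SM // WARP_SIZE)


if __name__ == "__main__":
    # 96-thread blocks, 10 registers per thread, 2 KB shared memory per block
    print(resident_blocks(96, 10, 2048))  # 8
    print(occupancy(96, 10, 2048))        # 1.0
```

With 96-thread blocks, 10 registers per thread and 2 KB of shared memory per block, every limit allows exactly 8 blocks, which matches the 8-blocks-per-SM figure used later in this thread and gives full occupancy under these assumed limits.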

What is the overhead, in cycles, for that SM to start executing another set of blocks after it finishes the previous set? And what triggers the SM to switch to another set of blocks?

Thanks

E.D_Riedijk · August 27, 2008, 3:46pm

As I understand it, as soon as one of the blocks finishes, another block is brought in. I don’t know about the overhead; there is a post somewhere with a small benchmark that measures the overhead of blocks that do no work at all (they exit immediately). I believe it was a post by Sarnath or MisterAnderson42.
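The refill behaviour described here (a new block is brought in as soon as a resident one finishes) can be modelled with a priority queue of free block slots. This is a hypothetical Python model, not a description of the actual hardware dispatcher: it assumes blocks are dispatched in launch order to whichever slot frees up first, and `switch_overhead` is an assumed fixed cost charged each time a slot is reused, so a nonzero value shows how a per-switch cost would add to the total run time.

```python
import heapq


def simulate(durations, num_sms, blocks_per_sm, switch_overhead=0.0):
    """Greedy refill model: each SM has `blocks_per_sm` slots, and whenever a
    slot frees up, the next pending block (in launch order) is placed in it.

    Returns (finish time of the whole grid, number of blocks each SM ran).
    `switch_overhead` is a hypothetical fixed cost added when a slot is reused.
    """
    # One heap entry per slot: (time the slot becomes free, SM id, uses so far)
    slots = [(0.0, sm, 0) for sm in range(num_sms) for _ in range(blocks_per_sm)]
    heapq.heapify(slots)
    per_sm = [0] * num_sms
    finish = 0.0
    for d in durations:
        free_at, sm, uses = heapq.heappop(slots)  # earliest free slot
        start = free_at + (switch_overhead if uses > 0 else 0.0)
        end = start + d
        per_sm[sm] += 1
        finish = max(finish, end)
        heapq.heappush(slots, (end, sm, uses + 1))
    return finish, per_sm


if __name__ == "__main__":
    print(simulate([1.0] * 500, 16, 8)[0])       # 4.0
    print(simulate([1.0] * 500, 16, 8, 0.1)[0])  # about 4.3
```

With 500 equal-length blocks on 16 SMs holding 8 blocks each, the grid finishes at time 4.0; a `switch_overhead` of 0.1 pushes that to about 4.3, because the slots that run the last blocks are each reused three times.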

MisterAnderson42 · August 27, 2008, 8:36pm

The overhead is essentially zero.

skyblues · August 27, 2008, 9:15pm

Suppose there are a total of 500 blocks to execute, and let’s say that 8 blocks fit on one SM.

On an 8800GTX, which has 16 SMs, 8 * 16 = 128 blocks will be running on the 16 SMs at once. I understand that there is no overhead for a particular SM to switch between blocks (within those 8 blocks).

My question is: when a particular SM finishes the 8 blocks initially assigned to it, what is the overhead of switching to another set of blocks from the remaining 372?
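The numbers in this question work out as follows, under the simplifying assumption (not stated in the post) that every block takes the same time, so blocks run in whole "waves" of 128. The helper name `block_waves` is hypothetical.

```python
import math


def block_waves(total_blocks, num_sms, blocks_per_sm):
    """Split a grid into waves of concurrently resident blocks.

    Assumes every block takes the same time, so all resident blocks finish
    together and the next wave starts in their place.
    """
    if total_blocks <= 0:
        return num_sms * blocks_per_sm, 0, 0, 0
    resident = num_sms * blocks_per_sm                  # blocks running at once
    pending = max(total_blocks - resident, 0)           # blocks waiting for a slot
    waves = math.ceil(total_blocks / resident)          # rounds needed
    last_wave = total_blocks - (waves - 1) * resident   # blocks in the final round
    return resident, pending, waves, last_wave


print(block_waves(500, 16, 8))  # (128, 372, 4, 116)
```

In this equal-time model the remaining 372 blocks run as three more waves, the last one only partly filled.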

Is it still zero overhead?

Thanks

MisterAnderson42 · August 28, 2008, 2:36am

Yes. If you think about it, the only initialization that really needs to be done is setting the threadIdx/blockIdx values, which takes essentially no time, especially if there is special register-initialization hardware for this purpose.
