Question about warp reuse.

nonsense · August 29, 2009, 5:39pm

Hi,

I’ve read through the programming guide and the best practices guide, but couldn’t find the exact rules about warp reuse. It’s my understanding that when a warp is waiting on a sync, other warps can be scheduled in its place during the wait. My question is this: which warps are candidates? Is it only other warps within the same block, or can warps from a separate block be processed in that space?

TIA for any help.

nonsense · September 5, 2009, 3:59am

I hate to double-post, but it seems to me like this should be an easy question. Should I have posted in the programming forum? I did some more research, but nothing I found is perfectly clear on the subject.

seibert · September 5, 2009, 4:56am

The scheduler on the multiprocessor will time-slice among all active warps, regardless of which blocks they come from. This is why you can hide memory latency either by having large blocks, or by having multiple smaller blocks active on the same multiprocessor.

However, once a block starts, it must run to completion, so the scheduler cannot swap other blocks onto the multiprocessor to cover for idle warps. The occupancy calculator spreadsheet can help you figure out how many simultaneous blocks can fit onto a multiprocessor for your kernel and its resource requirements.
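To make the resource arithmetic concrete, here is a rough C++ sketch of the kind of calculation the occupancy spreadsheet performs. The limits in `SmLimits` are assumptions for illustration (roughly a compute capability 1.3 part), the struct and function names are hypothetical, and the sketch ignores the register and shared memory allocation granularity that the real calculator accounts for.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor limits. ASSUMED values for illustration only
// (roughly compute capability 1.3); check the programming guide for
// the figures that apply to your device.
struct SmLimits {
    int maxThreads  = 1024;   // resident threads
    int maxBlocks   = 8;      // resident blocks
    int registers   = 16384;  // 32-bit registers
    int sharedBytes = 16384;  // bytes of shared memory
};

// Hypothetical description of what one block of a kernel needs.
struct KernelUse {
    int threadsPerBlock;
    int regsPerThread;
    int sharedBytesPerBlock;
};

// Number of whole blocks that can be resident on one multiprocessor.
// The tightest of the four limits wins.
int residentBlocks(const KernelUse& k, const SmLimits& sm = SmLimits{}) {
    assert(k.threadsPerBlock > 0);
    int byThreads = sm.maxThreads / k.threadsPerBlock;
    int byRegs = k.regsPerThread > 0
        ? sm.registers / (k.regsPerThread * k.threadsPerBlock)
        : sm.maxBlocks;
    int byShared = k.sharedBytesPerBlock > 0
        ? sm.sharedBytes / k.sharedBytesPerBlock
        : sm.maxBlocks;
    return std::min({sm.maxBlocks, byThreads, byRegs, byShared});
}

// Number of warps the scheduler can switch between to hide latency.
int residentWarps(const KernelUse& k, const SmLimits& sm = SmLimits{}) {
    const int warpSize = 32;
    int warpsPerBlock = (k.threadsPerBlock + warpSize - 1) / warpSize;
    return residentBlocks(k, sm) * warpsPerBlock;
}
```

Under these assumed limits, `residentBlocks(KernelUse{256, 16, 4096})` gives 4 blocks (32 warps), while `KernelUse{64, 16, 1024}` is capped at 8 blocks by the block limit and so has only 16 warps available. That is the trade-off between large blocks and many small blocks described above.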

(Also, I think your question is appropriate for this forum. You probably didn’t get an answer because the post volume has grown quite a bit here, and many readers can’t read every post anymore. Sometimes you just get unlucky. :) )

nonsense · September 5, 2009, 8:44am

Thanks for the reply. Can I take this to mean that a whole block must be activated at once – i.e. no starting just a few warps at a time as space becomes available? (In my design some warps would ‘terminate’ much sooner than others, but I suppose they still have to sync up with the rest of the block at the end. Is this still true even if they do no output? I assume so.) From what you tell me I think my interpretations are right, but I just need to make sure because otherwise I think I may be able to get much better performance.

Thanks again.

seibert · September 5, 2009, 2:50pm

Yes, blocks are scheduled to run on multiprocessors in their entirety. Even if threads or warps terminate early, another block is not scheduled until the entire block terminates.
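A toy C++ model may help show why early-terminating warps do not buy anything by themselves under this rule. It is only a sketch: the names are hypothetical, time is in arbitrary units, and it assumes a block's run time is simply that of its slowest warp, ignoring contention between warps.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Run time of every warp in one block (arbitrary units, toy model).
using Block = std::vector<int>;

// A block occupies its slot until its slowest warp has finished.
int blockDuration(const Block& warps) {
    assert(!warps.empty());
    return *std::max_element(warps.begin(), warps.end());
}

// Time to finish all blocks on one multiprocessor when `slots` blocks
// can be resident at once and a new block starts only after a whole
// resident block has retired.
int totalTime(const std::vector<Block>& blocks, int slots) {
    assert(slots > 0);
    // Min-heap of the times at which each block slot becomes free.
    std::priority_queue<int, std::vector<int>, std::greater<int>> freeAt;
    for (int i = 0; i < slots; ++i) freeAt.push(0);

    int finish = 0;
    for (const Block& b : blocks) {
        int start = freeAt.top();  // earliest free slot
        freeAt.pop();
        int end = start + blockDuration(b);
        freeAt.push(end);
        finish = std::max(finish, end);
    }
    return finish;
}
```

In this model, two blocks with warp times `{10, 1, 1, 1}` on a single slot take 20 units, exactly the same as two blocks with times `{10, 10, 10, 10}`. The short warps free up issue cycles for other resident warps, but the block slot itself is released only when the longest warp is done.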
