A question about warps and threadblock

AndyWu · April 13, 2013, 10:46pm

I learned this from a set of presentation slides: “Prefer to have enough threads per block to provide hardware with many warps to switch between - this is the way the GPU hides memory access latency.”

I also learned: “all the threads in the same thread block are supposed to execute concurrently.” If that is the case, when does the switch happen? And how is memory access latency hiding achieved?

I am confused :) Thanks in advance for any help!

-AW

vvolkov · April 14, 2013, 12:36am

A switch happens every cycle. You can’t switch to a warp that is stalled, however, so having more concurrent warps helps.

Latency hiding is achieved by doing other work while waiting for the data to arrive. The same applies to hiding arithmetic latency.
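
To make the effect concrete, here is a toy Python model of a single issue slot. It is only a sketch and not how real hardware is implemented: the function name `issue_utilization`, the fixed 16-cycle latency, and the rule that a warp stalls after every instruction are all assumptions chosen for illustration.

```python
def issue_utilization(num_warps, latency, cycles=10_000):
    """Toy model (assumption, not real hardware): one issue slot per cycle.

    After a warp issues an instruction it is stalled for `latency` cycles
    before it can issue again. Returns the fraction of cycles in which
    some warp issued an instruction.
    """
    ready_at = [0] * num_warps  # cycle at which each warp may issue next
    issued = 0
    for cycle in range(cycles):
        # Select any warp that is not stalled; a stalled warp cannot be chosen.
        for w in range(num_warps):
            if ready_at[w] <= cycle:
                ready_at[w] = cycle + latency
                issued += 1
                break
    return issued / cycles


for n in (1, 4, 16, 64):
    print(n, issue_utilization(n, latency=16))
```

In this model, with a 16-cycle latency, one warp keeps the issue slot busy 6.25% of the time, four warps 25%, and 16 or more warps 100%. Once there are enough ready warps to cover the latency, adding more does not help further.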

AndyWu · April 14, 2013, 2:26am

Thanks vvolkov!

So you mean that all the threads in a threadblock are NOT running concurrently ON THE HARDWARE IN REAL? Every cycle, a warp will pick up a group of threads (32 threads) from the threadblock to execute?

vvolkov · April 14, 2013, 6:16am

What do you mean “in real”? The active threads are all concurrent, e.g. if you have 64 active warps on a multiprocessor (=2048 threads), the hardware can issue an instruction from any of them, without any context switching. (So, the “switch” cited above doesn’t really happen. Nothing is switched, i.e. moved or replaced - only selected.) This is just like hyperthreads on a CPU, but at a larger scale. On a CPU you have 2 hyperthreads per core, here you have 64 warps per multiprocessor.

Note that executing an instruction takes time. You keep sending a new instruction into the execution pipeline every cycle, but it takes many cycles until the result comes back. As a result, you get tons and tons of instructions in progress - many more than the number of instructions issued or completed every cycle (which is bounded by 8 per multiprocessor on Kepler).
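
The arithmetic behind these numbers can be sketched in a few lines of Python. The relation used for instructions in progress (issue rate times latency, i.e. Little's law) is a standard estimate rather than something stated in this thread, and the 400-cycle latency is a purely hypothetical figure for illustration, not a measured value for any GPU. The helper names are made up for this sketch.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def resident_threads(active_warps):
    """Threads resident on a multiprocessor for a given number of active warps."""
    return active_warps * WARP_SIZE


def instructions_in_flight(issue_per_cycle, latency_cycles):
    """Little's law estimate: work in progress = throughput * latency."""
    return issue_per_cycle * latency_cycles


print(resident_threads(64))            # 64 warps * 32 threads = 2048
# Hypothetical latency of 400 cycles, with the 8-per-cycle bound quoted above.
print(instructions_in_flight(8, 400))  # 8 * 400 = 3200
```

Under these assumed numbers, up to 3200 instructions could be in progress at once even though at most 8 are issued per cycle.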

AndyWu · April 14, 2013, 3:02pm

Thanks again, vvolkov. The execution of a threadblock is preemptive, so memory latency hiding is achieved through scheduling different threadblocks :)

vvolkov · April 14, 2013, 5:57pm

Sort of. It is more about scheduling different warps than different threadblocks, though.
