Warps and Occupancy


I have always thought that the warp scheduler executes one warp at a time, choosing whichever warp is ready, and that this warp can come from any of the thread blocks on the multiprocessor. However, one of the Nvidia webinar slides states that “Occupancy = Number of warps running concurrently on a multiprocessor divided by maximum number of warps that can run concurrently”. So can more than one warp run at a time? How does this work?

Thank you.

It depends on how you define running concurrently. So far, compute capability N.x devices issue instructions for N warps in parallel, and each instruction then remains in the pipeline for about 16…24 cycles. The Nvidia slide you are referring to, however, evidently counts all active warps on an SM as running, regardless of whether they issue an instruction in a particular cycle.
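To make the slide's definition concrete, here is a minimal sketch of the arithmetic, counting every resident (active) warp as "running". The function names are made up for illustration, and the limit of 48 warps per SM is an assumption that applies to compute capability 2.x; other devices have different limits, so check the figure for your own card.

```python
import math

WARP_SIZE = 32  # threads per warp on all CUDA devices discussed in this thread


def warps_per_block(threads_per_block, warp_size=WARP_SIZE):
    """Warps needed for one block; a partially filled warp still counts as one."""
    return math.ceil(threads_per_block / warp_size)


def occupancy(resident_blocks, threads_per_block, max_warps_per_sm):
    """Active warps on one SM divided by the maximum the SM can hold.

    resident_blocks is how many blocks actually fit on the SM at once
    (limited by registers, shared memory, and so on). It is an input
    here, not something this sketch computes.
    """
    active_warps = resident_blocks * warps_per_block(threads_per_block)
    return active_warps / max_warps_per_sm


# Hypothetical launch: 4 resident blocks of 256 threads each,
# on an SM assumed to hold at most 48 warps (compute capability 2.x).
print(occupancy(resident_blocks=4, threads_per_block=256, max_warps_per_sm=48))
```

With these numbers there are 32 active warps out of a possible 48, so occupancy is two thirds, even though only a couple of those warps issue an instruction in any given cycle.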

I thought only devices of compute capability 2.1 have dual warp schedulers, while devices of compute capability 2.0 and below have only a single warp scheduler?

So if we define “running” concurrently as issuing instructions and not simply waiting in the pipeline, then only one warp runs at a time on compute capability 2.0?

Compute 2.0 devices are dual-issue designs. Instructions from two warps are issued together (each warp on its own bank of 16 cores) and retired over two clock cycles. Compute 2.1 takes this further by taking a different instruction from one of those warps and issuing it on a third bank of 16 cores, also retired over two clock cycles. So compute 2.1 has something close to out-of-order execution on top of the basic dual-issue design of 2.0 cards.
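The two-clock figure follows from spreading a 32-thread warp over a bank of 16 cores. A rough sketch of that arithmetic is below; the function name is hypothetical, and it deliberately ignores pipeline latency and the special function units.

```python
def cycles_per_warp_instruction(lanes, warp_size=32):
    """Clock cycles needed to push one warp-wide instruction through `lanes` cores."""
    # Ceiling division: a partially filled final pass still costs a full cycle.
    return -(-warp_size // lanes)


# One warp on a bank of 16 cores, as described for compute 2.x above.
print(cycles_per_warp_instruction(16))
```

For a bank of 16 cores this gives 2 cycles per warp instruction, which matches the description above of each instruction being retired over two clocks.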

Note that 2.0 and even 1.x GPUs can also issue a second instruction from the same warp in parallel. 1.x GPUs used this to sometimes issue a mul to the special function unit. On 2.0 devices this capability seems somewhat underused, as only 2.1 devices added the third set of cores to re-enable more than one arithmetic operation per cycle per thread. Still, there should be some instructions (moves?) that can be dual-issued on all GPUs, though I haven’t tried to identify them.