Resident warp vs active warp

lynch51 · January 30, 2014, 12:17pm

Hi,

I’m currently trying to understand the life cycle of threads, warps and blocks.

A warp can be active or inactive. But what is a resident warp?

Can an active block have inactive warps or threads?

I’m a little confused about these two words (active and resident).

Can someone help me?

Many Thanks

PS: English is not my mother tongue…

Greg · January 31, 2014, 5:34am

The terms usually used by the profiler are:

active_warp - A warp is active if it has been allocated to an SM and all warp level resources (registers) have been allocated.

eligible_warp - An active warp is eligible if it can issue an instruction.

stalled_warp - An active warp is stalled if it is not able to issue an instruction due to a resource or data dependency.

I’m not sure what literature uses the term “resident”. A resident warp would be the same as an active warp.

Threads can be in different states.

Active Thread - A thread is active if its active bit is set in the warp active_mask.

Inactive Thread - A thread is inactive if its active bit is not set in the active_mask. This can happen if the warp takes a divergent control path.

Exited Thread - A thread is inactive and exited if the thread has executed an EXIT instruction. Exited threads cannot become active again.
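The active/inactive distinction above can be observed from inside a kernel. What follows is only a minimal sketch, assuming a single warp of 32 threads; the kernel name `record_active_mask` and the helper `run_demo` are made up for illustration. `__activemask()` is the CUDA intrinsic that returns the mask of threads currently active in the calling warp. On GPUs with independent thread scheduling the lanes that take a branch are not guaranteed to execute it together, so the exact mask printed can differ between devices.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical demo kernel: lanes 0..7 take a divergent branch and store
// the active mask they observe there. Lanes 8..31 skip the branch, so they
// are inactive while it executes and their slots keep the initial zero.
__global__ void record_active_mask(unsigned *masks)
{
    unsigned lane = threadIdx.x % warpSize;
    if (lane < 8) {
        masks[threadIdx.x] = __activemask();
    }
}

// Launches one warp and returns the 32 recorded masks (empty on error).
std::vector<unsigned> run_demo()
{
    const int n = 32;  // one warp (assumption: warp size is 32)
    std::vector<unsigned> h_masks(n, 0u);
    unsigned *d_masks = nullptr;
    if (cudaMalloc(&d_masks, n * sizeof(unsigned)) != cudaSuccess) {
        return {};
    }
    cudaMemset(d_masks, 0, n * sizeof(unsigned));
    record_active_mask<<<1, n>>>(d_masks);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_masks.data(), d_masks,
                                 n * sizeof(unsigned),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_masks);
    if (err != cudaSuccess) {
        return {};
    }
    return h_masks;
}

int main()
{
    std::vector<unsigned> masks = run_demo();
    if (masks.empty()) {
        std::printf("CUDA error (no device?)\n");
        return 1;
    }
    std::printf("lane 0 saw active mask 0x%08x inside the branch\n", masks[0]);
    std::printf("lane 31 slot is 0x%08x (never entered the branch)\n", masks[31]);
    return 0;
}
```

This builds with `nvcc` as a single `.cu` file. The only property the sketch relies on is that a lane calling `__activemask()` always finds its own bit set.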

Skybuck · January 31, 2014, 9:42am

In short:

“Resident threads” is the number of threads the GPU can load into its on-chip memory.

Longer answer:

The GPU has a limited number of cores available. It cannot execute millions of threads at the same time. The GPU can only execute as many threads at the same time as it has cores available.

However, some of these threads may stall for different reasons. Therefore the GPU uses a little trick. It has some additional memory which is used to hold additional threads on the GPU. These threads are not yet executing, but I suppose they are initialized so that they can be executed at a moment’s notice.

This is what is referred to as resident threads… think of these as “on chip threads”. Like a CPU may store thread contexts on the stack in some cache somewhere, I suppose.

So the GPU does not have to load threads from main memory or something… but it can quickly switch to these resident threads and execute those… a sort of hardware thread context switching.

It can then later return to the stalled threads and execute those once they are unblocked.

If all resident threads stall and can never be unblocked, the GPU will ultimately deadlock.

So think of the GPU as a batch-based processor. The kernel’s threads must all exit if the GPU is not to deadlock. No GPU thread must wait on the results of another thread or it may deadlock.

For example, threads 1 to 10000 must not wait on the result of thread 1000000.

Because this would fill the GPU with threads 1 to 10000, or whatever its maximum number of resident threads is… and then it will never execute thread 1000000.
Thus threads 1 to 10000 will be waiting forever ;)
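The limit described above can be estimated with the runtime’s occupancy API. This is a sketch only: the kernel `dummy_kernel`, the block size of 256 and the grid size of 100000 are assumptions made for illustration, and the kernel is queried but never launched. `cudaOccupancyMaxActiveBlocksPerMultiprocessor` reports how many blocks of a given kernel can be resident on one SM; multiplying by the SM count gives an upper bound on the blocks that can be on the chip at the same time. If the grid is larger than that bound, a resident block that spins waiting for a block beyond it may never see that block start.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only as the subject of the occupancy query.
__global__ void dummy_kernel(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = 2.0f * data[i];
}

// Upper bound on the number of blocks resident on the whole GPU at once.
int max_coresident_blocks(int blocks_per_sm, int sm_count)
{
    return blocks_per_sm * sm_count;
}

int main()
{
    const int block_size = 256;     // assumed launch configuration
    const int grid_size  = 100000;  // assumed, deliberately large

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }

    int blocks_per_sm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocks_per_sm, dummy_kernel, block_size, 0) != cudaSuccess) {
        std::printf("occupancy query failed\n");
        return 1;
    }

    int limit = max_coresident_blocks(blocks_per_sm, prop.multiProcessorCount);
    std::printf("resident blocks per SM: %d, SMs: %d, bound: %d blocks\n",
                blocks_per_sm, prop.multiProcessorCount, limit);
    if (grid_size > limit) {
        std::printf("a grid of %d blocks cannot be fully resident; "
                    "blocks must not wait on blocks that have not started\n",
                    grid_size);
    }
    return 0;
}
```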

lynch51 · February 4, 2014, 7:50am

Thank you guys!

It’s clearer now.

For the resident warp and block: I found these terms in the official CUDA documentation: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications
For me a resident thread is a thread that has been allocated. These threads can be active or inactive. A resident warp is the same as an active warp and a resident block is the same as an active block.

Te1 · January 12, 2017, 2:26pm

I wonder how many blocks/warps are needed to hide the latency of memory accesses? Is it a function of the maximum number of resident blocks/warps?

Greg · January 20, 2017, 3:22pm

If you are using the Nsight VSE profiler you can determine if you have sufficient warps to hide latency by looking at the Issue Efficiency experiment. See http://docs.nvidia.com/gameworks/index.html#developertools/desktop/nsight/analysis/report/cudaexperiments/kernellevel/issueefficiency.htm.

If the kernel is fully hiding latency then every warp scheduler should be able to pick an eligible warp each cycle and issue 1 or 2 instructions. The Warp Issue Efficiency chart shows the percentage of active cycles in which the warp scheduler was not able to issue an instruction. If “No Eligible” is high there are two options.

Increase occupancy. The Warps Per SM chart shows the theoretical occupancy and achieved occupancy in warps (vs. percentage). If the theoretical value is low (32) then you can go to the Achieved Occupancy experiment and determine the tradeoffs of increasing occupancy. If the theoretical occupancy is high but the achieved occupancy is low then either the launch dimensions (GridDim) do not fill the machine or there is a tail effect in blocks (a portion of the warps exit early in each block). If the theoretical occupancy is high and the achieved occupancy is high then you have to resolve stalls. See the Issue Stall Reasons chart.

Resolve stalls. The Issue Stall Reasons chart shows the percentage of time active warps were stalled. Removing the primary reason will improve the number of eligible warps.
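Theoretical occupancy, as used above, can also be computed outside the profiler. The following is a sketch under assumptions: the kernel `scale` and the helpers `warps_per_block` and `occupancy_percent` are made up for illustration, and no dynamic shared memory is requested. It sweeps a few block sizes and reports resident warps per SM both as a count and as a percentage of the device maximum. It says nothing about achieved occupancy, which still has to be measured.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only as the subject of the occupancy query.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Number of warps needed for one block (a partial warp still occupies a slot).
int warps_per_block(int block_size, int warp_size)
{
    return (block_size + warp_size - 1) / warp_size;
}

// Resident warps expressed as a percentage of the per-SM maximum.
double occupancy_percent(int resident_warps, int max_warps)
{
    return 100.0 * resident_warps / max_warps;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    const int max_warps = prop.maxThreadsPerMultiProcessor / prop.warpSize;

    const int block_sizes[] = {64, 128, 256, 512, 1024};
    for (int block_size : block_sizes) {
        int blocks_per_sm = 0;
        if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
                &blocks_per_sm, scale, block_size, 0) != cudaSuccess) {
            continue;  // skip block sizes the query rejects
        }
        int resident = blocks_per_sm * warps_per_block(block_size, prop.warpSize);
        std::printf("block size %4d: %2d of %2d warps per SM (%.1f%%)\n",
                    block_size, resident, max_warps,
                    occupancy_percent(resident, max_warps));
    }
    return 0;
}
```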
