What does the Allocation Stall mean in Nsight Compute warp state statistics?

Hi,

I’m reading the Kernel Profiling Guide, and I find the description of the Allocation Stall hard to understand:

Warp was stalled waiting for a branch to resolve, waiting for all memory operations to retire, or waiting to be allocated to the micro-scheduler.

Could anyone help explain the three reasons for an allocation stall?

  • waiting for a branch to resolve
  • waiting for all memory operations to retire
  • waiting to be allocated to the micro-scheduler

What do “a branch to resolve” and “allocated to the micro-scheduler” mean?

Thanks

In general, allocation stalls occur because the warp scheduler cannot allocate the warp to run for some reason. There are numerous reasons this can occur, but I will try to give some basic details here:

  • Waiting for a branch to resolve - some information that depends on a previous branch, such as the branch target or the next instruction, can only be determined after the branch is fully evaluated. The warp scheduler needs more time to complete this, and until then the warp cannot continue (it is stalled).
  • Waiting for all memory operations to retire - this usually occurs at the end of a kernel, where there are no more instructions to execute but memory operations are still in flight, so the kernel cannot complete. The warp is stalled until those operations complete.
  • Waiting to be allocated to the micro-scheduler - some resources needed for the warp (register file, shared memory, etc.) are not yet ready, and the warp is stalled waiting for them.
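
To make the first two cases more concrete, here is a minimal CUDA sketch of the code patterns described above. It is only an illustration and is not taken from the Kernel Profiling Guide: the names `stall_demo` and `run_stall_demo` are made up for this example. The loop's trip count depends on loaded data, so the loop branch can only be resolved after the comparison is evaluated, and the global store is the last thing each thread does before it exits. Whether samples actually show up under this stall reason depends on the GPU architecture, the compiler (which may unroll or predicate the branch), and the Nsight Compute version.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical demo kernel, for illustration only.
// The loop trip count comes from memory, so the loop branch cannot be
// resolved until the comparison on the loaded value has been evaluated.
// The global store is issued right before the thread exits, so it may
// still be in flight when there are no more instructions to run.
__global__ void stall_demo(const int* in, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int steps = in[i];              // data-dependent trip count
    int acc = 0;
    for (int k = 0; k < steps; ++k) {
        acc += k ^ i;               // cheap work inside the branchy loop
    }
    out[i] = acc;                   // last instruction before exit
}

// Runs the kernel on n elements and returns the number of mismatches
// against a CPU reference (0 means the GPU result is correct, -1 means
// a CUDA call failed).
int run_stall_demo(int n)
{
    std::vector<int> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = i % 64;

    int* d_in = nullptr;
    int* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    stall_demo<<<blocks, threads>>>(d_in, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    // Compare against the same computation done on the CPU.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        int ref = 0;
        for (int k = 0; k < h_in[i]; ++k) ref += k ^ i;
        if (h_out[i] != ref) ++mismatches;
    }
    return mismatches;
}
```

To look at the stall breakdown, you could call `run_stall_demo(1 << 20)` from a `main`, build with `nvcc -lineinfo`, and collect the warp state section with `ncu --section WarpStateStats` on the resulting binary. Treat the result as something to explore rather than a guaranteed reproduction of the stall.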