I wonder whether a thread waiting in __syncthreads() consumes GPU resources. Is it some kind of conditional sleep, or a loop that keeps checking some flag?
Actually, the question is broader: if I have one "active" thread in a block (let's say its size is 512 threads) and the other 511 are sitting in __syncthreads(), will the SM (let's say it can only process 512 threads) execute other blocks, or will it keep executing this one active thread while its resources stay busy with 511 inactive threads? (Yes, I know about warps, but I don't know how the SM chooses which block/warp to execute on each cycle.)
Threads/warps from any threadblock that is launched on an SM consume resources (registers, perhaps shared memory, etc.) on that SM. Warps that are fully idle (all 32 threads) because they are waiting at a __syncthreads() barrier do not consume "execution slots". Therefore, assuming there are other warps that are ready to execute, the SM can still schedule those warps to do useful work.
These threads/warps will still continue to “count” towards the maximum number of threads per SM.
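To make the question's scenario concrete, here is a minimal sketch, assuming a CUDA-capable device and the CUDA runtime API; the names oneActiveThread, runOneActiveThread and residentBlocksPerSM are hypothetical, made up for illustration. Thread 0 runs a serial loop while the other 511 threads of a 512-thread block wait at __syncthreads(). Warps 1 to 15 have all 32 lanes parked at the barrier, so the scheduler issues them nothing; warp 0 keeps issuing with only lane 0 active. The occupancy query still counts the whole block, because residency depends on the block's thread count, registers and shared memory, not on how many of its threads happen to be waiting.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: only thread 0 of each block does work; every other
// thread goes straight to the barrier and waits there.
__global__ void oneActiveThread(unsigned long long* out, int iterations)
{
    __shared__ unsigned long long result;

    if (threadIdx.x == 0) {
        // Serial work done by a single lane of warp 0.
        unsigned long long acc = 0;
        for (int i = 0; i < iterations; ++i) {
            acc += i;
        }
        result = acc;
    }

    // For a 512-thread block, warps 1..15 wait here with all 32 lanes,
    // so they are not eligible for issue until thread 0 arrives.
    __syncthreads();

    // After the barrier every thread can safely read the shared value.
    if (threadIdx.x == blockDim.x - 1) {
        out[blockIdx.x] = result;
    }
}

// Hypothetical helper: how many blocks of blockSize threads of this kernel
// can be resident on one SM. Waiting threads count toward this like busy ones.
int residentBlocksPerSM(int blockSize)
{
    int numBlocks = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&numBlocks, oneActiveThread,
                                                  blockSize, 0);
    return numBlocks;
}

// Hypothetical helper: launch numBlocks blocks of 512 threads and return the
// value written by block 0, i.e. 0 + 1 + ... + (iterations - 1).
unsigned long long runOneActiveThread(int numBlocks, int iterations)
{
    unsigned long long* d_out = nullptr;
    unsigned long long h_out = 0;
    cudaMalloc(&d_out, numBlocks * sizeof(unsigned long long));
    oneActiveThread<<<numBlocks, 512>>>(d_out, iterations);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_out, d_out, sizeof(unsigned long long), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

If several such blocks are resident on the same SM, the scheduler can pick ready warps (for example warp 0 of another block) while the parked warps sit at the barrier. The thread limit alone would allow at most maxThreadsPerMultiProcessor / 512 of these blocks per SM, regardless of how many of their threads are idle.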
Sorry, something does not compute here. Warps do not consume "execution slots" (so the SM can execute other warps), but they still take up space on the SM (they "count" towards the maximum number of threads per SM)?
What txbob means is that warps waiting at a synchronization barrier still use registers and shared memory (if your kernel does use any) but they will not be issued any instructions.
The SM schedulers can only issue a certain number of instructions per clock cycle, and the SM has limited register resources. Waiting warps still take up space on the SM (registers, shared memory, thread slots), but the schedulers do not issue instructions to them, so they do not consume execution resources (ALUs, SFUs, etc.).
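The resource side can be sketched as a rough per-SM budget: a resident block holds its thread slots, registers and shared memory for as long as it is resident, whether its warps are executing or parked at a barrier. This is a simplified estimate, assuming the CUDA runtime API; it ignores allocation granularity and other per-block limits, so cudaOccupancyMaxActiveBlocksPerMultiprocessor remains the authoritative number. The kernel waitingKernel and the helper estimateResidentBlocks are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <climits>

// Hypothetical kernel: each block stages data in shared memory and waits at a
// barrier before reading a neighbour's element. Assumes blockDim.x <= 512.
__global__ void waitingKernel(float* data)
{
    __shared__ float tile[512];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = data[i];
    __syncthreads();  // waiting warps still hold their registers and this tile
    data[i] = tile[(threadIdx.x + 1) % blockDim.x];
}

// Rough upper bound on resident blocks per SM: the minimum over three
// resources a block holds while resident. Ignores allocation granularity.
int estimateResidentBlocks(int blockSize)
{
    int device = 0;
    cudaGetDevice(&device);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, waitingKernel);

    // Thread slots: waiting threads occupy them just like running ones.
    int byThreads = prop.maxThreadsPerMultiProcessor / blockSize;

    // Registers: allocated per thread for the lifetime of the block.
    int regsPerBlock = std::max(attr.numRegs, 1) * blockSize;
    int byRegs = prop.regsPerMultiprocessor / regsPerBlock;

    // Static shared memory: allocated per block for its lifetime.
    int bySmem = attr.sharedSizeBytes > 0
                     ? static_cast<int>(prop.sharedMemPerMultiprocessor / attr.sharedSizeBytes)
                     : INT_MAX;

    return std::min({byThreads, byRegs, bySmem});
}
```

None of these three terms depends on whether a warp is currently stalled at __syncthreads(), which is the point made above: a waiting warp costs space on the SM but no issue slots.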
Late thanks. It is clear to me now.