Question about CTA/warp lifecycle

Is my understanding correct that for both CTA and warp, once they become active/resident in the corresponding SM (for CTA) or SMSP (for warp), they will never become “inactive” until the CTA/warp completes?

In particular,

CTA-to-SM

  • When a thread block (CTA) is launched, it is permanently assigned to one specific SM.

  • An assigned CTA will keep occupying one of the max-CTAs-per-SM slots (e.g., 32 for H100) until ALL warps of the CTA have completed execution; only then will the slot be freed up for the next “unassigned” CTA.

Warp-to-SMSP

  • Within an SM, each warp is distributed to one of the 4 SMSPs and stays assigned to that SMSP for the lifetime of the warp. An assigned warp can never be “load-balanced” or “stolen” to another SMSP, even within the same SM.

  • An active (aka resident) warp will keep occupying one of the max-resident-warps-per-SMSP slots (e.g., 64/4 = 16 for H100) until the warp has completed; only then will the “slot” be freed up for the next inactive warp.

I think your assertions are generally a good mental model. A CTA should be thought of as permanently assigned to an SM (until it retires) for most considerations. Pre-emption does provide a mechanism by which a CTA could “move” from one SM to another. For this reason, the programming guide states that the smid special register value is not guaranteed to be the same for the lifetime of a threadblock.
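For anyone who wants to look at this on their own hardware, here is a minimal sketch that samples the `%smid` special register at the start and at the end of each block. The kernel name `sample_smid`, the helper `count_moved`, the launch dimensions, and the spin length are all made up for illustration. In an ordinary run I would expect the two samples to match for every block; a match does not prove that a block can never move, it only shows that nothing moved it during that run.

```cuda
#include <cstddef>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Read the %smid special register through inline PTX.
__device__ unsigned read_smid() {
    unsigned id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Hypothetical kernel: thread 0 of each block records %smid on entry,
// the block spins for roughly spin_cycles clock ticks, then thread 0
// records %smid again.
__global__ void sample_smid(unsigned* first, unsigned* last, long long spin_cycles) {
    if (threadIdx.x == 0) first[blockIdx.x] = read_smid();
    long long t0 = clock64();
    while (clock64() - t0 < spin_cycles) { /* busy wait, approximate only */ }
    if (threadIdx.x == 0) last[blockIdx.x] = read_smid();
}

// Host helper: count blocks whose first and last samples differ.
int count_moved(const std::vector<unsigned>& first, const std::vector<unsigned>& last) {
    int moved = 0;
    for (std::size_t i = 0; i < first.size() && i < last.size(); ++i)
        if (first[i] != last[i]) ++moved;
    return moved;
}

int main() {
    const int blocks = 256, threads = 128;      // arbitrary illustrative sizes
    const std::size_t bytes = blocks * sizeof(unsigned);
    unsigned *d_first = nullptr, *d_last = nullptr;
    if (cudaMalloc(&d_first, bytes) != cudaSuccess ||
        cudaMalloc(&d_last, bytes) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    sample_smid<<<blocks, threads>>>(d_first, d_last, 1000000LL);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        std::printf("kernel failed\n");
        return 1;
    }
    std::vector<unsigned> first(blocks), last(blocks);
    cudaMemcpy(first.data(), d_first, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(last.data(), d_last, bytes, cudaMemcpyDeviceToHost);
    std::printf("blocks whose smid changed: %d of %d\n",
                count_moved(first, last), blocks);
    cudaFree(d_first);
    cudaFree(d_last);
    return 0;
}
```

Running it under a debugger or alongside another GPU process might be a way to provoke preemption, but that is a guess on my part and not something the documentation promises.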

Yes, a CTA keeps using its slot until it fully retires.
Yes, a warp keeps using its slot until it fully retires.
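The slot counts mentioned in the question can be read back from the runtime. This is a rough sketch: `cudaGetDeviceProperties` and the `cudaDeviceProp` fields are real, but the helper `resident_warps_per_smsp` is a made-up name, and the figure of 4 SMSPs per SM is taken from the question as an assumption because the runtime does not report it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: resident-warp slots per SMSP, assuming the SM's
// warp slots are split evenly across its SMSPs.
int resident_warps_per_smsp(int max_threads_per_sm, int warp_size, int smsp_per_sm) {
    return (max_threads_per_sm / warp_size) / smsp_per_sm;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    // Assumption: 4 SMSPs per SM. This is not queryable through cudaDeviceProp.
    const int kAssumedSmspPerSm = 4;
    std::printf("device:                 %s\n", prop.name);
    std::printf("max CTAs per SM:        %d\n", prop.maxBlocksPerMultiProcessor);
    std::printf("max warps per SM:       %d\n",
                prop.maxThreadsPerMultiProcessor / prop.warpSize);
    std::printf("max warps per SMSP (*): %d\n",
                resident_warps_per_smsp(prop.maxThreadsPerMultiProcessor,
                                        prop.warpSize, kAssumedSmspPerSm));
    std::printf("(*) assumes %d SMSPs per SM\n", kAssumedSmspPerSm);
    return 0;
}
```

With 2048 threads per SM and a warp size of 32, the helper gives 64/4 = 16, which matches the H100 figure quoted in the question.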

I have experimentally convinced myself in the past that when a warp retires, even if its owning threadblock has not yet retired, in some cases the resources used by that warp (e.g. registers) can become available for a new CTA to be deposited on that SM.


‘used to state’ → the warp would be restored to the same SM now?

When does preemption happen? During debug, operating system task switching, operating system hibernation?


Thank you @Robert_Crovella for the very helpful info!

I am also curious about the preemption and when that would happen, thanks!

I didn’t say that. “used to state” means that in the past, the programming guide said a particular thing (and I linked to it, to show precisely what I am referring to), and now, it does not seem to say that thing (at least, I could not find it.) (see EDIT below).

I don’t have any further information. It was never well-specified to begin with. Furthermore, the programming guide has gone through a substantial rewrite recently - you don’t need to take my word for it, in my view it is self-evident.

As far as I know, it’s not specified anywhere. I would guess that debugging may use preemption. I would also guess that (“modern” time-sliced) context-switching involves preemption. In the past, I was fairly convinced that certain CDP 1.0 guarantees would require pre-emption in some cases, but that was just guesswork. And with CDP 2.0, I’m not sure if any mechanisms might use preemption. I don’t have any authoritative info about when preemption may be used. AFAIK it is not specified in any sort of exhaustive fashion anywhere.

EDIT:
I did locate it in the “new” programming guide, here. So no real change as far as inclusion goes. Note the text there:

The device runtime may reschedule thread blocks onto different SMs in order to more efficiently manage resources.

