Whether Ampere Warp Scheduler is Static or Dynamic

This question stems from these two sources:

  1. GPU architecture and warp scheduling - CUDA / CUDA Programming and Performance - NVIDIA Developer Forums, where the moderator is hesitant to disclose details on the warp scheduler.

  2. 1804.06826 (arxiv.org), where the Volta warp scheduler is reverse-engineered to derive a static warp scheduling policy.

My question is (if the information is public): do the 4 warp schedulers inside an Ampere SM each have an independent pool of warps to select from and issue (static)? Or do all four of them operate on a single shared pool of warps mapped to that SM (dynamic)?

I am trying to program a CUDA kernel with a fixed warp-to-SM-subpartition (SMSP) mapping, so that I can be sure warps with certain warp IDs will map to certain SMSPs.

Later in that thread you link, documentation support for static assignment of warps to schedulers is pointed out (I just didn’t have the doc reference handy at that point, so I did not know if/where it was documented.)

Specifically here (for Ampere, there are similar statements e.g. in the Volta section):

An SM statically distributes its warps among its schedulers.

I am generally hesitant to disclose “details” on nearly any subject unless I have some confidence that it is documented somewhere (or can be inferred based on testing that a reasonably competent programmer could arrive at.)

However, later in that thread, when presented with the docs, I said:

I certainly would like to retract my statement about warp assignment.

Thank you for the prompt answer.

If my understanding is correct,

  1. yes, the warp scheduler for Ampere is static
  2. but exact details of the scheduling scheme are undisclosed.

That’s the first time in this thread that I was aware of any interest other than whether the assignment of warps was static or dynamic. I don’t think any statements were made about the scheduling scheme, so not sure what understanding about the scheduling scheme could be sensibly drawn from this thread.

Moving on, what, do you suppose, is undisclosed?

If the warp scheduler has 1 or more assigned warps that are not stalled, it will choose one of those, at each scheduling opportunity. That seems uncontroversial, although I would not be able to put my finger on chapter/verse of documentation. As I mentioned in another thread, I view much of this low-level behavior as an implementation detail for the CUDA programmer, so it is probably intentionally unspecified.

The only thing undisclosed that I can think of is which warp the scheduler will choose when it has several unstalled warps. I don’t know that that is specified anywhere. For the purposes of the CUDA programmer (and I believe this is fairly sufficient), we could imagine that the choice is made “at random”. It would take ninja-level programming, to be sure, for anything other than that description to be useful.

Naturally these things I write, like everything I write or say, are my opinion.

Later: Or perhaps you mean that the assignment policy is undisclosed? i.e. how are warps assigned to the schedulers?

I would agree that is generally undisclosed, AFAIK, although I would assume that the assignment is done to approximately “evenly” distribute warps.
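
For anyone who wants to probe this empirically, here is a minimal sketch that records, for each warp, the SM it landed on and its hardware warp slot. It reads the PTX special registers `%smid` and `%warpid` through inline assembly. I am not aware of a documented register that exposes the sub-partition index, so the `assumed_smsp` column is only a guess: it assumes four schedulers per SM and a round-robin mapping by block-local warp index, which is an assumption for illustration and not something NVIDIA documents for Ampere. The names `WarpRecord`, `record_warps`, `assumed_smsp` and `kAssumedSchedulersPerSM` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// ASSUMPTION (not from NVIDIA documentation): four schedulers per SM and a
// round-robin mapping of block-local warp index to scheduler.
constexpr unsigned kAssumedSchedulersPerSM = 4;

__host__ __device__ inline unsigned assumed_smsp(unsigned warp_in_block) {
    return warp_in_block % kAssumedSchedulersPerSM;
}

// Hypothetical record type, one entry per warp.
struct WarpRecord {
    unsigned block;          // blockIdx.x
    unsigned warp_in_block;  // threadIdx.x / warpSize
    unsigned smid;           // PTX %smid: SM the warp is running on
    unsigned hw_warpid;      // PTX %warpid: hardware warp slot (volatile)
    unsigned assumed_smsp;   // guess only, see assumption above
};

__device__ inline unsigned read_smid() {
    unsigned v;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(v));
    return v;
}

__device__ inline unsigned read_warpid() {
    unsigned v;
    asm volatile("mov.u32 %0, %%warpid;" : "=r"(v));
    return v;
}

__global__ void record_warps(WarpRecord* out) {
    unsigned warps_per_block = (blockDim.x + warpSize - 1) / warpSize;
    unsigned warp_in_block = threadIdx.x / warpSize;
    // Lane 0 of each warp writes that warp's record.
    if (threadIdx.x % warpSize == 0) {
        WarpRecord r;
        r.block = blockIdx.x;
        r.warp_in_block = warp_in_block;
        r.smid = read_smid();
        r.hw_warpid = read_warpid();
        r.assumed_smsp = assumed_smsp(warp_in_block);
        out[blockIdx.x * warps_per_block + warp_in_block] = r;
    }
}

int main() {
    constexpr int kBlocks = 2, kThreads = 256;
    constexpr int kWarps = kBlocks * (kThreads / 32);

    WarpRecord* d_out = nullptr;
    if (cudaMalloc(&d_out, kWarps * sizeof(WarpRecord)) != cudaSuccess) return 1;

    record_warps<<<kBlocks, kThreads>>>(d_out);

    WarpRecord h_out[kWarps];
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < kWarps; ++i) {
        std::printf("block %u warp %u -> smid %u, hw warpid %u, assumed smsp %u\n",
                    h_out[i].block, h_out[i].warp_in_block, h_out[i].smid,
                    h_out[i].hw_warpid, h_out[i].assumed_smsp);
    }
    return 0;
}
```

Keep in mind that the PTX documentation describes `%warpid` as volatile: it reports where the warp is at the moment it is read and may change during execution. So the output is a snapshot to compare against your hypothesis, not a guarantee you can build a fixed mapping on.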
