How does dynamic reassignment of register capacity among warp-groups on Hopper cards work?

The CUDA C++ Programming Guide mentions dynamic reassignment of register capacity among warp-groups to support even larger matrices. I didn't see any special instructions for this in the PTX guide and am looking for more information on how it works. I have access to an HGX machine and will be optimizing ML models on it in the future, so I need to familiarize myself with this topic.

Any help would be appreciated.

You would use the PTX instruction setmaxnreg, which is documented in PTX ISA 8.5.
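To give an idea of what that looks like in practice, here is a minimal sketch of a warp-specialized kernel in which one warp-group gives up registers and the other two claim more. The helper names (`reg_alloc`, `reg_dealloc`), the kernel name, and the register counts 40 and 232 are my own illustrative choices, not anything from the guide. I am assuming a build with `nvcc -arch=sm_90a` and a launch of 384 threads (three warp-groups of 128 threads), with `__launch_bounds__` supplying the per-thread register limit that the instruction needs in order to take effect.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpGroupSize = 128;  // 4 warps of 32 threads
constexpr int kNumWarpGroups = 3;    // assumption: 1 producer + 2 consumers

// Hypothetical helper: raise the per-thread register limit of the calling warp-group.
// The count must be a compile-time constant, a multiple of 8, in the range 24..256.
template <unsigned RegCount>
__device__ __forceinline__ void reg_alloc() {
    static_assert(RegCount >= 24 && RegCount <= 256 && RegCount % 8 == 0,
                  "setmaxnreg needs a multiple of 8 in [24, 256]");
#if defined(__CUDA_ARCH_FEAT_SM90_ALL)  // only defined when targeting sm_90a
    asm volatile("setmaxnreg.inc.sync.aligned.u32 %0;\n" : : "n"(RegCount));
#endif
}

// Hypothetical helper: lower the limit and return the freed registers to the CTA pool.
template <unsigned RegCount>
__device__ __forceinline__ void reg_dealloc() {
    static_assert(RegCount >= 24 && RegCount <= 256 && RegCount % 8 == 0,
                  "setmaxnreg needs a multiple of 8 in [24, 256]");
#if defined(__CUDA_ARCH_FEAT_SM90_ALL)
    asm volatile("setmaxnreg.dec.sync.aligned.u32 %0;\n" : : "n"(RegCount));
#endif
}

__global__ void __launch_bounds__(kWarpGroupSize * kNumWarpGroups, 1)
specialized_kernel(float* out) {
    // The branch is uniform per warp-group, so every warp in a warp-group
    // executes the same setmaxnreg instruction, as the instruction requires.
    const int warp_group = threadIdx.x / kWarpGroupSize;
    if (warp_group == 0) {
        reg_dealloc<40>();   // producer: only needs a few registers to issue loads
        // ... data movement would go here ...
    } else {
        reg_alloc<232>();    // consumers: waits until the pool has enough registers
        // ... register-heavy math would go here; placeholder store instead ...
        out[blockIdx.x * blockDim.x + threadIdx.x] = 1.0f;
    }
}

int main() {
    const int threads = kWarpGroupSize * kNumWarpGroups;
    float* out = nullptr;
    cudaError_t err = cudaMalloc(&out, threads * sizeof(float));
    if (err == cudaSuccess) {
        specialized_kernel<<<1, threads>>>(out);
        err = cudaDeviceSynchronize();
    }
    printf("status: %s\n", cudaGetErrorString(err));
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

The budget works out as 128 × 40 + 256 × 232 = 64,512 registers, which fits in the 65,536-register file of one SM. The increase only completes once the decreasing warp-group has released enough registers back to the pool, so the two calls act as a pair. Real kernels usually reach this through a library wrapper rather than hand-written inline PTX, so treat the above as a sketch for understanding the mechanism.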

See the explanation by Greg in this post: