Limit number of (or allocate) SMs on a per-stream basis

SMs can be partitioned using MPS resource provisioning, but that requires the work to be split across separate processes. Within a single process, the only methods are the ones already mentioned, the primary one being stream priorities. (Another effective method is probably to use 2 or more GPUs.) I have made suggestions about how to use stream priorities to give best progress to the high-priority stream here. I don’t have any further suggestions, and it’s quite possible these suggestions don’t address every case.
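For anyone who wants to try the stream priority approach, here is a minimal sketch of the single-process case. It is only an illustration, not a tuned recipe: the `spin` kernel, the problem sizes, and the iteration counts are all made up for the example. The only CUDA calls it relies on for the priority part are `cudaDeviceGetStreamPriorityRange` and `cudaStreamCreateWithPriority`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical stand-in for real work: each thread does some arithmetic.
__global__ void spin(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

int main()
{
    // Query the supported range. Numerically lower values mean higher
    // priority, so "greatest" is the smaller (often negative) number.
    int least = 0, greatest = 0;
    CHECK(cudaDeviceGetStreamPriorityRange(&least, &greatest));
    printf("priority range: least=%d greatest=%d\n", least, greatest);

    cudaStream_t low, high;
    CHECK(cudaStreamCreateWithPriority(&low, cudaStreamNonBlocking, least));
    CHECK(cudaStreamCreateWithPriority(&high, cudaStreamNonBlocking, greatest));

    // Illustrative sizes only: a large background job and a small urgent one.
    const int n_low = 1 << 22, n_high = 1 << 14, threads = 256;
    float *d_low = nullptr, *d_high = nullptr;
    CHECK(cudaMalloc(&d_low, n_low * sizeof(float)));
    CHECK(cudaMalloc(&d_high, n_high * sizeof(float)));
    CHECK(cudaMemsetAsync(d_low, 0, n_low * sizeof(float), low));
    CHECK(cudaMemsetAsync(d_high, 0, n_high * sizeof(float), high));

    // Background kernel: many blocks, each fairly short, so that SM
    // resources are released often while it runs.
    spin<<<(n_low + threads - 1) / threads, threads, 0, low>>>(d_low, n_low, 10000);

    // Urgent kernel: its pending blocks should be preferred by the block
    // scheduler as resources free up. Blocks of the low-priority kernel
    // that are already running are not preempted.
    spin<<<(n_high + threads - 1) / threads, threads, 0, high>>>(d_high, n_high, 1000);

    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(d_low));
    CHECK(cudaFree(d_high));
    CHECK(cudaStreamDestroy(low));
    CHECK(cudaStreamDestroy(high));
    return 0;
}
```

Note that this does not reserve or cap SMs for either stream. As far as I know, priority only influences which pending blocks get scheduled when resources become available, so the high-priority kernel makes progress only as quickly as the low-priority kernel's blocks retire. That is why the background kernel here is written with many short blocks rather than a few long-running ones.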