Increase number of clients/contexts/processes and reduce context memory footprint in MPS

Since Volta, MPS has supported up to 48 clients/contexts (CPU processes). Does it make sense to increase this number? I am not sure whether the number of cores on current CPUs would be a bottleneck for supporting more clients, though I suspect it might be. The bus bandwidth or the MPS coordinator could also be bottlenecks. For my workloads, I do get better performance as I increase the number of clients, so I am very interested in support for more (ideally many more) than 48 clients. On a similar note, can the memory footprint of the clients’ contexts be reduced? Maybe through deduplication and/or copy-on-write?

I filed a bug since this is a request, but I am asking here in case anyone can answer at least part of it quickly.