I can use MPS and the Stream-Ordered Async Memory Allocator independently on my machine with CUDA 11.7, but I cannot combine the two. Specifically, the cudaDevAttrMemoryPoolsSupported device attribute reports 0 after I start MPS (Multi-Process Service). If there is a logical issue preventing this from being supported, could I please know why? I’ve looked through the MPS and Async Allocator documentation but have not found anything saying these two cannot work together. It seems like they could work very well together, with MPS potentially having control over all resources used by the client processes. Maybe an additional layer would need to be added to coordinate the processes’ memory utilization?
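For reference, here is a minimal sketch of the check I’m running (device index 0 is an assumption; adjust for your setup):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;          // assumed device index
    int poolsSupported = 0;
    // Ask the runtime whether the stream-ordered allocator / memory pools are usable.
    cudaError_t err = cudaDeviceGetAttribute(
        &poolsSupported, cudaDevAttrMemoryPoolsSupported, device);
    if (err != cudaSuccess) {
        std::printf("cudaDeviceGetAttribute failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // On my 1080 Ti this prints 1 without MPS and 0 once the MPS control daemon is running.
    std::printf("cudaDevAttrMemoryPoolsSupported = %d\n", poolsSupported);
    return 0;
}
```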
I see two possible avenues:
1. Each process/context uses its own memory pool(s).
2. MPS uses shared memory pool(s) among the processes.
I think the latter could require code changes to the applications (i.e., applications that explicitly use memory pools), but it seems doable.
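To illustrate what I mean by applications that explicitly use memory pools, here is a rough, hypothetical sketch of avenue 1, where a process creates and allocates from its own private pool (nothing MPS-specific; names and sizes are arbitrary):

```cpp
#include <cuda_runtime.h>

int main() {
    int device = 0;                               // assumed device index
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Avenue 1: each process creates its own explicit memory pool.
    cudaMemPoolProps props = {};
    props.allocType     = cudaMemAllocationTypePinned;
    props.handleTypes   = cudaMemHandleTypeNone;   // not shared with other processes
    props.location.type = cudaMemLocationTypeDevice;
    props.location.id   = device;

    cudaMemPool_t pool;
    cudaMemPoolCreate(&pool, &props);

    // Allocate and free in stream order from this process-private pool.
    void* d_buf = nullptr;
    cudaMallocFromPoolAsync(&d_buf, 1 << 20, pool, stream);
    cudaFreeAsync(d_buf, stream);
    cudaStreamSynchronize(stream);

    cudaMemPoolDestroy(pool);
    cudaStreamDestroy(stream);
    return 0;
}
```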
One thing to note is that I am on a pre-Volta GPU (1080 Ti). My understanding is that Volta-and-later MPS fully isolates resources for the processes/clients/contexts. Can’t MPS work like a hypervisor that coordinates memory usage between the processes/clients/contexts while still providing isolation? Sorry if I’m missing something. I would really appreciate hearing from someone with more knowledge of both features!
FWIW, I ran a test on a V100, and cudaDevAttrMemoryPoolsSupported shows as true even under MPS (CUDA 11.4). So my guess is the limitation has something to do with pre-Volta MPS. I don’t have any further information.
If you’d like to see a change to CUDA or CUDA docs, you can always file a bug.
Thanks for the quick reply @Robert_Crovella! I filed a bug a couple of weeks ago, though it sounds like the functionality already exists if I upgrade my GPU, which I was already planning on doing.
I’ll check whether I see the same on a V100 soon.
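When I do, something like this minimal smoke test should confirm whether the stream-ordered allocator works under an MPS client (a sketch using the default allocation path; the size is arbitrary):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0, supported = 0;
    cudaDeviceGetAttribute(&supported, cudaDevAttrMemoryPoolsSupported, device);
    std::printf("memory pools supported: %d\n", supported);
    if (!supported) return 0;

    // Exercise the stream-ordered allocator backed by the device's default pool.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    void* d_ptr = nullptr;
    cudaMallocAsync(&d_ptr, 1 << 20, stream);
    cudaMemsetAsync(d_ptr, 0, 1 << 20, stream);
    cudaFreeAsync(d_ptr, stream);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
```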
Just checking: consumer-grade GPUs from the Volta architecture onward (i.e., GTX/RTX 30- and 40-series; Turing, Ampere, Ada) will also support the same functionality (MPS + memory pools) as enterprise-grade GPUs (e.g., the V100), right?
I personally have never witnessed any difference in MPS behavior between consumer and commercial/datacenter GPUs. However, most of my testing is on datacenter GPUs. This particular wrinkle isn’t published anywhere that I know of.
I can’t tell you what to expect. On datacenter GPUs, Volta and beyond, the indication I have is that you can use memory pools with MPS. You can file a bug for doc improvements.