Best way of sharing a GPU among multiple ranks.

I am working on porting a portion of a bioinformatics application to GPUs. I am running this application on a large-scale supercomputer, so I am dealing with multiple ranks (processes) and multiple GPUs per node. I have a scenario where several ranks launch kernels on the same GPU. This works fine as long as the combined global memory usage of all the ranks stays below the GPU's capacity, but once that limit is exceeded the application crashes with a GPU out-of-memory error. My question is: what is the best way of handling this issue? Is there something that can queue memory allocation calls and kernel launches? I understand that NVIDIA's MPS handles kernel launches by queuing them until resources become available, but is it possible to do the same for cudaMalloc calls?
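For context, each rank does roughly the following on the shared GPU (the buffer size here is just illustrative, not the real workload). This is where the crash happens: once the other ranks have claimed most of the GPU's memory, cudaMalloc fails immediately rather than waiting for memory to free up.

```cpp
// Illustrative sketch of the per-rank allocation pattern (sizes are made up).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t bytes = 8ULL << 30;   // e.g. 8 GiB per rank, for illustration only
    void* d_buf = nullptr;

    cudaError_t err = cudaMalloc(&d_buf, bytes);
    if (err != cudaSuccess) {
        // With several ranks on one GPU, this is the point where the
        // application currently dies with an out-of-memory error.
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... launch kernels that use d_buf ...

    cudaFree(d_buf);
    return 0;
}
```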

Thanks.

There is nothing that resolves this scenario for cudaMalloc calls.

You will need to assign ranks to GPUs manually, in a way that respects each GPU's memory limit and ensures that no more ranks use a GPU than its memory can support.
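A minimal sketch of that kind of mapping is below, assuming MPI and a known (here hypothetical) per-rank memory requirement BYTES_PER_RANK. It splits the ranks per node, computes how many ranks one GPU can hold from its total global memory, and packs ranks onto GPUs up to that cap; ranks beyond the node's capacity abort, though they could just as well be left idle or given CPU-only work.

```cpp
// Sketch: map node-local MPI ranks to GPUs without overcommitting GPU memory.
// BYTES_PER_RANK is an assumed estimate of each rank's GPU memory footprint.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Ranks local to this node, so the GPU mapping is done per node.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int local_rank = 0;
    MPI_Comm_rank(node_comm, &local_rank);

    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);

    // Hypothetical per-rank memory requirement; adjust to the real workload.
    const size_t BYTES_PER_RANK = 4ULL << 30;   // 4 GiB

    // How many ranks a single GPU can support, from its total global memory.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int ranks_per_gpu = (int)(prop.totalGlobalMem / BYTES_PER_RANK);
    if (ranks_per_gpu < 1) ranks_per_gpu = 1;

    // Refuse to run ranks that would overcommit the GPUs on this node.
    if (local_rank >= num_gpus * ranks_per_gpu) {
        fprintf(stderr, "Rank %d: no GPU memory capacity left on this node\n",
                world_rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    // Pack ranks onto GPUs: local ranks 0..ranks_per_gpu-1 -> GPU 0, and so on.
    int device = local_rank / ranks_per_gpu;
    cudaSetDevice(device);
    printf("Rank %d (local %d) -> GPU %d\n", world_rank, local_rank, device);

    MPI_Finalize();
    return 0;
}
```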

Thank you. I will try something like that.