Is it possible to virtualize a GPU into smaller vGPUs (similar to what companies like VMware do for CPUs) so that multiple applications can run on it at the same time?
Is MPS the only solution for this? MPS has the limitation that it allows only one cudaMemcpy() (in one direction) at a time, right?
VMware has supported GPU virtualization for a number of years. Check with them which specific hardware they support. Here is a recent blog entry that may be relevant:
Machine Learning using Virtualized GPUs on VMware vSphere