Hi, I am fairly new to CUDA and was wondering if I could check my understanding of using MIG-partitioned GPUs in an interactive multi-user system.
We recently deployed two new servers, each with two A100 GPUs. These are to replace existing machines with 4xK80 GPUs. Benchmarking has suggested that when an A100 is fully partitioned into 7 MIG instances, each instance is comparable to a K80, so we’re hoping for a significant increase in throughput.
The K80s were used interactively on a multi-user server, and we put them in ‘exclusive process’ mode. This meant that if one user started a CUDA process requesting a single GPU, that process took over the GPU, and if a different process subsequently requested a GPU, it got a different one (assuming no more than 4 processes were using CUDA at once). So we never had to worry about users allocating GPUs to their processes.
With MIG, my understanding is that you cannot put the GPUs into exclusive process mode, and the default behaviour is that any process requesting a GPU is simply allocated the first MIG instance of the first GPU. Obviously this isn’t going to work for multiple users! The only solution seems to be for users to manually ‘check’ which MIG instances are in use and then use CUDA_VISIBLE_DEVICES to select a free one. This seems a bit awkward, and we may also have problems with some third-party software that runs multiple CUDA processes in parallel - without modification, I think it will not be able to distribute the processing over multiple MIG instances.
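For the third-party case, the workaround would presumably have to live wherever the workers are launched: each worker process gets its own CUDA_VISIBLE_DEVICES pointing at a different MIG instance. The Python sketch below illustrates that idea under stated assumptions: `pinned_envs` and `launch_workers` are hypothetical helpers (not part of any NVIDIA tool), the MIG UUID strings are whatever `nvidia-smi -L` reports on your system, and it only helps if the software lets you control how (or with what environment) its workers are started.

```python
import os
import subprocess


def pinned_envs(mig_uuids, n_workers, base_env=None):
    """Build one environment per worker, assigning MIG instances round-robin.

    Hypothetical helper. Each worker sees exactly one MIG instance through
    CUDA_VISIBLE_DEVICES, so inside the worker it appears as device 0.
    If n_workers > len(mig_uuids), some workers share an instance.
    """
    if not mig_uuids:
        raise ValueError("no MIG instances given")
    base = dict(os.environ if base_env is None else base_env)
    return [dict(base, CUDA_VISIBLE_DEVICES=mig_uuids[i % len(mig_uuids)])
            for i in range(n_workers)]


def launch_workers(cmd, mig_uuids, n_workers):
    """Start n_workers copies of cmd in parallel and return their exit codes."""
    procs = [subprocess.Popen(cmd, env=env)
             for env in pinned_envs(mig_uuids, n_workers)]
    return [p.wait() for p in procs]
```

Round-robin is just the simplest assignment; with one worker per MIG instance, no two workers compete for the same slice.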
Is there any solution that I’m missing here? Right now the only option I can think of is a script that sets CUDA_VISIBLE_DEVICES by inspecting the currently running processes to identify ‘free’ MIG instances. Alternatively, we could use a queuing system like Slurm, but ideally we want to allow interactive use.
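A minimal sketch of the script idea in Python, under several assumptions. Instead of inspecting running processes, it swaps in advisory file locks (`fcntl.flock`) to emulate exclusive-process mode: a wrapper lists MIG UUIDs with `nvidia-smi -L`, locks the first instance no other wrapper holds, and runs the user's command with CUDA_VISIBLE_DEVICES set to that UUID. The lock directory `/var/lock/mig` is hypothetical (assumed to be created by an admin with mode 1777), the exact `nvidia-smi -L` line format and MIG-UUID support in CUDA_VISIBLE_DEVICES depend on the driver and CUDA versions, and only instances claimed through the wrapper are protected: a process started directly can still land on a "claimed" instance.

```python
import fcntl
import os
import re
import subprocess
import sys

# Hypothetical lock directory; assumed created by an admin with mode 1777
# so that every user can create lock files in it.
LOCK_DIR = "/var/lock/mig"

# Matches MIG UUIDs in `nvidia-smi -L` output, e.g.
#   MIG 1g.5gb      Device  0: (UUID: MIG-c6d4f1ef-...)
_MIG_UUID = re.compile(r"UUID:\s*(MIG-[^)\s]+)")


def parse_mig_uuids(listing):
    """Return the MIG device UUIDs found in `nvidia-smi -L` output, in order."""
    return _MIG_UUID.findall(listing)


def claim_instance(uuids, lock_dir=LOCK_DIR):
    """Take an exclusive, non-blocking lock on one lock file per MIG UUID.

    Returns (uuid, fd) for the first unclaimed instance, or None if all are
    taken. The claim lasts while fd stays open (it is released on exit).
    """
    os.makedirs(lock_dir, exist_ok=True)
    for uuid in uuids:
        # Older-style MIG identifiers contain '/', which is not valid in a filename.
        path = os.path.join(lock_dir, uuid.replace("/", "_") + ".lock")
        try:
            fd = os.open(path, os.O_RDONLY | os.O_CREAT, 0o644)
        except OSError:
            continue  # e.g. lock file not readable by this user; skip it
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)  # held by another wrapper; try the next instance
            continue
        return uuid, fd
    return None


def run_on_free_instance(cmd, lock_dir=LOCK_DIR):
    """Run cmd with CUDA_VISIBLE_DEVICES pinned to one unclaimed MIG instance."""
    listing = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                             text=True, check=True).stdout
    claim = claim_instance(parse_mig_uuids(listing), lock_dir)
    if claim is None:
        print("no free MIG instance", file=sys.stderr)
        return 1
    uuid, fd = claim
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=uuid)
    try:
        # The wrapper holds the lock for the whole lifetime of the child.
        return subprocess.run(cmd, env=env).returncode
    finally:
        os.close(fd)


# Only launch something when a command is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_on_free_instance(sys.argv[1:]))
```

Saved under a hypothetical name such as `mig_run.py`, a user would start `python mig_run.py python train.py`, and a second user running the same wrapper would get a different instance. One caveat of the lock approach: if the wrapper is killed while its child keeps running, the lock is released and the instance looks free again.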
Thanks for any help anyone can offer - as I said, I’m no CUDA expert, so it’s possible I’ve just misunderstood things or missed something obvious.