What is a good way to use MIG on a Slurm cluster?

I apologize if this is the wrong subforum, it seemed to be one of the most likely at least…

Our HPC cluster (running Slurm) was recently upgraded with a number of A100 cards, which we are now trying to get the most out of. That includes figuring out how to activate the ‘Multi-Instance GPU’ (MIG) functionality. But, reading through NVIDIA Multi-Instance GPU User Guide :: NVIDIA Tesla Documentation, it seems to assume that users have sudo rights?
If the admin has enabled MIG on each GPU, is it then possible for users to ‘activate’ 7 MIG 1g.5gb profiles in their jobscripts, and then assign CUDA jobs to each profile?
Right now, the closest we can get is to first run a job with ‘nvidia-smi -L’ on the node to get the device IDs (they look like ‘MIG-GPU-09156ffa-eece-6481-ce94-42ac07f27aa4/7/0’), and then run the ‘real’ jobscript with lines like

CUDA_VISIBLE_DEVICES=MIG-GPU-09156ffa-eece-6481-ce94-42ac07f27aa4/7/0 "CUDA job" &

But this seems like a very cumbersome workflow?
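To make the two-step workflow a single jobscript, the device IDs could be collected programmatically instead of copied by hand. The sketch below is an assumption about how that might look, not a tested production script: it parses the ‘MIG-…’ identifiers out of ‘nvidia-smi -L’ output (the grep pattern is based on the ID format shown above) and launches one task per instance. ‘my_cuda_job’ is a placeholder for the real CUDA binary.

```shell
#!/bin/bash
# Hypothetical jobscript sketch: discover the MIG instances on the node at
# job start instead of hard-coding device identifiers.

# extract_mig_ids: pull the "MIG-..." identifiers out of `nvidia-smi -L`
# output on stdin (the pattern is an assumption based on the -L format,
# where each ID appears as "(UUID: MIG-GPU-.../<gi>/<ci>)").
extract_mig_ids() {
    grep -o 'MIG-[^)]*'
}

# Only touch the GPU when nvidia-smi is actually present on the node.
if command -v nvidia-smi >/dev/null 2>&1; then
    for id in $(nvidia-smi -L | extract_mig_ids); do
        # Pin each task to one MIG instance; "my_cuda_job" is a placeholder.
        CUDA_VISIBLE_DEVICES="$id" ./my_cuda_job &
    done
    wait
fi
```

This removes the manual copy step, but each job still has to do its own discovery, so it does not solve the scheduling problem.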

I would like to know the same. We are about to purchase a few servers with A100 cards and enable MIG licensing, as most of the expected workloads would not be able to utilize the full potential of the A100. We use Slurm and would like to identify each MIG instance as a GRES. @mikkelsen.kaare, have you been able to find a solution to your problem, and are you willing to share your findings?

Hello @teejcee

The best we have come up with so far is a setup where the user switches between ‘mig’ and ‘non-mig’ use. The sysadmin has defined ‘mig’ to be the maximally parallel setup, with 7 devices. If a job is started with gres=gpu:mig, then at the start of the jobscript a call is made to nvidia-smi to get the device IDs, and the contents of a job array (defined in the jobscript) are spread across the 7 devices. This still means that the GPU only serves a single user’s job, but it does make it possible to switch dynamically between the maximum and minimum number of devices.
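The setup described above could be sketched roughly as follows. This is a guess at one possible implementation, not our exact script: all task names are placeholders, and the round-robin mapping (task index modulo instance count) is one simple way to spread a task list over the instances that nvidia-smi reports.

```shell
#!/bin/bash
#SBATCH --gres=gpu:mig
# Sketch (assumptions throughout): one Slurm job owns the whole GPU; the
# tasks listed here in the jobscript are spread round-robin across the
# MIG instances found at job start.

# map_task_to_slot TASK_INDEX NUM_INSTANCES -> which MIG instance gets it
map_task_to_slot() {
    echo $(( $1 % $2 ))
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Collect the MIG device IDs reported on this node (7 with the
    # max-parallel 'mig' profile described above).
    mapfile -t mig_ids < <(nvidia-smi -L | grep -o 'MIG-[^)]*')

    tasks=(job_a job_b job_c)   # hypothetical CUDA workloads
    for i in "${!tasks[@]}"; do
        slot=$(map_task_to_slot "$i" "${#mig_ids[@]}")
        CUDA_VISIBLE_DEVICES="${mig_ids[$slot]}" ./"${tasks[$i]}" &
    done
    wait
fi
```

With 7 instances, tasks 0–6 each get their own device and task 7 wraps around to the first one; nothing here checks whether the tasks sharing a device fit in its memory.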

If your users are highly disciplined, Slurm can be set to allow multiple jobs to run on the same node. If you use the ‘mig’ setup from above and somehow coordinate which of the MIG instances each user assigns tasks to, it is possible to have multiple users on different MIG devices simultaneously. However, this does not check whether the combined tasks exceed the memory of any given device, and it really just seems to be a worse version of what Slurm is supposed to do for us.

But OpenPBS already seems to be cooking some MIG compatibility into their system, so we’re hoping that Slurm will be inspired to do the same :)