Getting the Most Out of the NVIDIA A100 GPU with Multi-Instance GPU

Originally published at: https://developer.nvidia.com/blog/getting-the-most-out-of-the-a100-gpu-with-multi-instance-gpu/

With third-generation Tensor Core technology, NVIDIA recently unveiled the A100 Tensor Core GPU, which delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing. Along with the large performance increase over prior-generation GPUs comes another groundbreaking innovation, Multi-Instance GPU (MIG). With MIG, each A100 GPU can be partitioned up to seven…

I have followed all the instructions in the MIG manual. However, when I run “sudo nvidia-smi mig -cgi 9,3g.20gb -C”, I get:
Option “-C” is not recognized.
How can I solve this problem?
Without the “-C” option, I can see the MIG instances with “nvidia-smi mig -lgi”, but they show up neither in “nvidia-smi” nor in “ls -l /proc/driver/nvidia/capabilities/gpu1/mig/gi*”.
What should I do about this?

Hi ryy19

Option “-C” is not recognized.

As mentioned in the software pre-requisites, are you running at least R450.80.02 as the driver version for A100? The “-C” option is only available starting with this driver version.
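To check this requirement programmatically, a minimal sketch along these lines may help. It compares a driver version string against the 450.80.02 minimum quoted above. The helper names (`parse_version`, `driver_at_least`, `installed_driver_version`) are made up for illustration, and the query assumes `nvidia-smi` is on the PATH and supports `--query-gpu=driver_version`.

```python
import subprocess

MIN_DRIVER = "450.80.02"  # minimum driver version quoted in the reply above


def parse_version(text):
    """Turn '450.80.02' into (450, 80, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed, minimum=MIN_DRIVER):
    """Return True if the installed driver version is >= the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU it reports.

    Assumes nvidia-smi is installed; raises CalledProcessError otherwise.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a machine with the driver installed, `driver_at_least(installed_driver_version())` should indicate whether the “-C” option can be expected to work.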

but neither can I get it through “nvidia-smi” nor “ls -l /proc/driver/nvidia/capabilities/gpu1/mig/gi*”

Can you please provide more information on what you’re not able to see? Once created, MIG devices can be listed with either “nvidia-smi -L” or “nvidia-smi mig -lgi”.
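One possible explanation for the symptom, offered as a guess: without “-C”, “-cgi” creates only the GPU instances, and compute instances still have to be created separately (for example with “nvidia-smi mig -cci”) before MIG devices appear in the “nvidia-smi -L” listing. The sketch below filters that listing for MIG entries. It assumes MIG device lines start with “MIG” after their indentation, and the function names are hypothetical.

```python
import subprocess


def mig_lines(listing):
    """Return the stripped lines of an `nvidia-smi -L` listing that
    describe MIG devices (assumed to start with 'MIG ' once indentation
    is removed)."""
    return [
        line.strip()
        for line in listing.splitlines()
        if line.strip().startswith("MIG ")
    ]


def list_mig_devices():
    """Run `nvidia-smi -L` and return only the MIG device lines.

    Assumes nvidia-smi is on the PATH.
    """
    out = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    ).stdout
    return mig_lines(out)
```

An empty list from `list_mig_devices()` while “nvidia-smi mig -lgi” still shows GPU instances would be consistent with compute instances not having been created yet.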

Hi, is there a good way for users without sudo rights to use the MIG functionality? Running multiple scripts in parallel on the same A100 sounds very interesting, but it needs to work without admin rights (at least after the admin has enabled MIG on the GPU).

Is there a way to do that?

Hi @mikkelsen.kaare - not today. We expect clusters with A100 GPUs to be configured in the desired MIG geometries. The configurations can be static (set up a priori by the infra team) or dynamic (using a systemd service, for example, as nodes are brought online in an autoscaler environment). We have created tooling that can be used for these purposes.

Please check this project for a declarative way to create the desired MIG geometries: https://github.com/nvidia/mig-parted and the associated systemd service that can be used in conjunction with provisioning nodes: https://github.com/NVIDIA/mig-parted/tree/master/deployments/systemd. We expect these tools to be used instead of nvidia-smi commands, which can be error-prone in a production environment. Hope these are useful.
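For the parallel-scripts scenario in the question, once an administrator has already created the MIG devices, a rough sketch could look like the following. It starts one copy of a script per MIG device and points each copy at its device through the `CUDA_VISIBLE_DEVICES` environment variable. This assumes that the variable accepts the MIG device identifiers reported by “nvidia-smi -L” and that the user has been given access to the devices. The helper names and any identifier passed in are hypothetical.

```python
import os
import subprocess
import sys


def env_for_mig(mig_uuid, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set to
    a single MIG device identifier. The input mapping is not modified."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = mig_uuid
    return env


def launch_on_mig(script, mig_uuids):
    """Start one copy of `script` per MIG device and return the processes.

    Each child sees only the MIG device named in its environment
    (assumption: the identifiers come from `nvidia-smi -L`).
    """
    return [
        subprocess.Popen([sys.executable, script], env=env_for_mig(uuid))
        for uuid in mig_uuids
    ]
```

A caller would then wait on the returned processes, for example with `for p in launch_on_mig("train.py", uuids): p.wait()`, where `train.py` and `uuids` are placeholders.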

Hi,
Does the A100 GPU with Multi-Instance GPU (MIG) allow users to set the application clocks (graphics or memory) for a specific GPU instance? Or, when we set the application clocks via nvidia-smi, do they apply to all instances within the GPU?

Hi @kz181, it should apply to all MIG instances, as all MIG instances share a single clock and power limit.