Getting the Most Out of the NVIDIA A100 GPU with Multi-Instance GPU

Originally published at: https://developer.nvidia.com/blog/getting-the-most-out-of-the-a100-gpu-with-multi-instance-gpu/

With third-generation Tensor Core technology, NVIDIA recently unveiled the A100 Tensor Core GPU, which delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing. Along with the large performance increase over prior-generation GPUs comes another groundbreaking innovation, Multi-Instance GPU (MIG). With MIG, each A100 GPU can be partitioned into up to seven…

I have followed all the instructions referred to in the MIG Manual; however, when I run “sudo nvidia-smi mig -cgi 9,3g.20gb -C”, it returns:
Option “-C” is not recognized.
How should I solve this problem?
And without the “-C” option, although I can list the GPU instances with “nvidia-smi mig -lgi”, I cannot see them through either “nvidia-smi” or “ls -l /proc/driver/nvidia/capabilities/gpu1/mig/gi*”.
What should I do about this?

Hi ryy19

Option “-C” is not recognized.

As mentioned in the software prerequisites, are you running at least R450.80.02 as the driver version for the A100? The “-C” option is only available starting with that driver version.
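If the driver is new enough, the command from your question should work as-is. A minimal sketch for checking the driver and creating the instances:

# Check the installed driver version (should be at least R450.80.02 for the -C option)
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Create two 3g.20gb GPU instances (one specified by profile ID 9, one by name)
# and their default compute instances in one step via -C
sudo nvidia-smi mig -cgi 9,3g.20gb -C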

I cannot see them through either “nvidia-smi” or “ls -l /proc/driver/nvidia/capabilities/gpu1/mig/gi*”

Can you please provide more information on what you’re not able to see? MIG devices, once created, can be accessed either through “nvidia-smi -L” or “nvidia-smi mig -lgi”.
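For example, a quick sketch of the listing commands mentioned above:

# List GPUs and any MIG devices beneath them
nvidia-smi -L
# List the GPU instances and the compute instances created on them
nvidia-smi mig -lgi
nvidia-smi mig -lci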

Hi, is there a good way for users without sudo rights to use the MIG functionality? I think running multiple scripts in parallel on the same A100 sounds very interesting, but it needs to work without admin rights (at least after the admin has enabled MIG on the GPU).

Is there a way to do that?

Hi @mikkelsen.kaare - not today. We expect that clusters with A100 GPUs are configured in desired MIG geometries - the configurations can be static (a priori by the infra team) or dynamic (using a systemd service for example as nodes are brought online when used in an autoscaler environment). We have created tooling that can be used for these purposes.

Please check this project for a declarative way to create the desired MIG geometries: https://github.com/nvidia/mig-parted and the associated systemd service that can be used in conjunction with provisioning nodes: https://github.com/NVIDIA/mig-parted/tree/master/deployments/systemd. We expect these tools to be used instead of raw nvidia-smi commands, which can be error-prone in a production environment. Hope these are useful.
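As a rough sketch of the declarative approach (the config name and profile below are illustrative; check the mig-parted repository’s examples for the exact schema and CLI flags):

# Hypothetical config: split every GPU into two 3g.20gb instances
cat > mig-config.yaml <<'EOF'
version: v1
mig-configs:
  all-3g.20gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "3g.20gb": 2
EOF
# Apply the named configuration
sudo nvidia-mig-parted apply -f mig-config.yaml -c all-3g.20gb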


Hi,
Does the A100 GPU with Multi-Instance GPU (MIG) allow users to set the application clocks (graphics or memory) for a specific GPU instance? Or, when we set the application clocks via nvidia-smi, do they apply to all instances within the GPU?

Hi @kz181, it should apply to all MIG instances, as all MIG instances share a single clock and power limit.
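For reference, a minimal sketch of setting application clocks GPU-wide with nvidia-smi (the clock values below are illustrative; query the supported values on your system first):

# Query the supported memory/graphics clock pairs (values differ per GPU and SKU)
nvidia-smi -i 0 -q -d SUPPORTED_CLOCKS
# Set application clocks as "<memory clock>,<graphics clock>" in MHz.
# This applies to GPU 0 as a whole, i.e. to every MIG instance on it.
sudo nvidia-smi -i 0 -ac 1215,1410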

Is it possible to enumerate multiple MIG compute instances?

For example, can I pass the UUIDs of multiple MIG compute instances via CUDA_VISIBLE_DEVICES or to --gpus for Docker, so that my program or Docker container can find those MIG devices and use cudaSetDevice to index them by number, such as 0, 1, 2 for three different compute instances?

Thanks!
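For readers wondering about the syntax: a rough sketch of what such an invocation might look like. The identifiers, script, and image names are placeholders (substitute the values reported by “nvidia-smi -L”; the exact MIG device-name format depends on the driver version), and, as noted in the follow-up below, how many MIG devices a single process actually ends up using is a separate matter.

# Placeholder MIG identifiers taken from "nvidia-smi -L"
export CUDA_VISIBLE_DEVICES=MIG-GPU-<gpu-uuid>/1/0,MIG-GPU-<gpu-uuid>/2/0
python my_app.py
# With Docker and the NVIDIA container toolkit, MIG devices can also be requested
# as <GPU index>:<MIG index> (exact syntax depends on the runtime version)
docker run --gpus '"device=0:0,0:1"' my_image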

Under “single” strategy, “num_gpus” doesn’t work. It always uses one MIG device.
python tf_cnn_benchmarks.py --num_gpus=2 --batch_size=64 --model=resnet50 --use_fp16

Hi,
When multiple users log into the same A100 machine (over SSH on Linux), how do we allocate the MIG devices to each user so that one user does not step on another user’s GPU slices? Let’s say we use 3g.20gb, i.e., each A100 GPU is split into 2 slices, so 16 slices are available in total. There are device IDs and UUIDs now. Is there a way to allocate devices to individual users? @maggiez @chetantekur

Is MIG meant only for Docker containers? Can multiple users SSH directly to the VM and use it?

Take a look at CUDA_VISIBLE_DEVICES; I am not sure whether this could help you.
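As an illustrative sketch (the identifier is a placeholder, and CUDA_VISIBLE_DEVICES is an environment-variable convention honored by CUDA applications rather than enforced isolation), each user could pin their shell to one MIG slice, for example in their ~/.bashrc:

# List MIG devices and their enumeration names / UUIDs
nvidia-smi -L
# Pin this user's CUDA work to a single MIG slice (placeholder identifier)
export CUDA_VISIBLE_DEVICES=MIG-GPU-<gpu-uuid>/1/0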
