Hello NVIDIA Community,
I am using a MIG-enabled A100 GPU on a Kubernetes cluster with the NVIDIA device plugin for Kubernetes installed. However, I am encountering an issue where the Kubernetes scheduler doesn't honour the specific MIG instance I intend to run my application on. My challenge is ensuring that a specific MIG instance is selected for a pod. Outside of Kubernetes, I can use CUDA_VISIBLE_DEVICES to specify the MIG instance like this:
CUDA_VISIBLE_DEVICES=MIG-GPU-e88cb44c-6756-fd30-cd4a-1e6da3ca88b0 ./application
However, when I request a MIG instance using resource limits in the pod YAML file, such as:
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1
Even though I set the CUDA_VISIBLE_DEVICES environment variable:
env:
  - name: CUDA_VISIBLE_DEVICES
    value: "MIG-GPU-e88cb44c-6756-fd30-cd4a-1e6da3ca88b0"
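For reference, the full pod spec I am testing with looks roughly like the sketch below. The pod name, container name, and image are just placeholders from my setup (any CUDA-enabled image should reproduce it); the MIG UUID is the one from my node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                  # placeholder pod name
spec:
  restartPolicy: Never
  containers:
    - name: cuda-app             # placeholder container name
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04  # any CUDA base image
      command: ["nvidia-smi", "-L"]   # just lists the devices the pod sees
      env:
        - name: CUDA_VISIBLE_DEVICES  # the MIG UUID I want the app pinned to
          value: "MIG-GPU-e88cb44c-6756-fd30-cd4a-1e6da3ca88b0"
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1    # MIG resource advertised by the device plugin
```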
The output of 'kubectl exec -it gpu-pod -- nvidia-smi -L' shows that the pod is still assigned the first available MIG instance, without respecting the specific MIG UUID I provided. I'm not sure whether this is an issue with the NVIDIA device plugin or with how the Kubernetes scheduler handles MIG instances.
Has anyone encountered a similar issue or found a solution to ensure that the correct MIG instance is assigned to a pod in Kubernetes when using CUDA_VISIBLE_DEVICES? Any suggestions or insights would be greatly appreciated!
Thanks for your help!