Getting Kubernetes ready for the NVIDIA A100 GPU with Multi-Instance GPU

Originally published at: https://developer.nvidia.com/blog/getting-kubernetes-ready-for-the-a100-gpu-with-multi-instance-gpu/

Multi-Instance GPU (MIG) is a new feature of the latest generation of NVIDIA GPUs, such as A100. It enables users to maximize the utilization of a single GPU by running multiple GPU workloads concurrently as if there were multiple smaller GPUs. MIG supports running multiple workloads in parallel on a single A100 GPU or allowing…

Hello,
I tried the instructions in the blog, but they failed at this command:
'sudo helm install --version=0.13.0-rc.2 --generate-name --set migStrategy=none nvdp/k8s-device-plugin'
I have the 0.13.0-rc.2 device-plugin, but Helm reported the following error:
'INSTALLATION FAILED: chart "k8s-device-plugin" matching 0.13.0-rc.2 not found in nvdp index'

How do I continue from here?

Thanks,

Hello,
NVIDIA recently changed the version names. Please follow this guide to install the current latest version (v0.12.3).
Thanks!

The 0.13.0-rc.2 version is a release candidate. The full version of 0.13.0 will be released in the next couple of weeks.

If you really want to run with 0.13.0-rc.2, then you can just add --devel to your helm command line so it can find release candidate versions like this.
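
As a rough sketch of that suggestion, the commands below refresh the chart index, list pre-release chart versions, and then repeat the install from the question with --devel added. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences introduced here so the commands are only printed by default; the repo alias `nvdp` and the chart name are taken from the question above and assume the repo was already added as described in the blog.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to run them against a real cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Release-candidate chart version named in this thread.
CHART_VERSION="0.13.0-rc.2"

# Refresh the local copy of the chart index.
run helm repo update

# List chart versions in the nvdp repo, including pre-releases.
run helm search repo nvdp --devel

# Install as in the original command, with --devel added.
run helm install --devel --version="$CHART_VERSION" --generate-name \
    --set migStrategy=none nvdp/k8s-device-plugin
```

If the search step does not list 0.13.0-rc.2, the install will most likely fail the same way. Helm's own help text describes --devel as equivalent to a version constraint of '>0.0.0-0' and says it is ignored when --version is set, so the search output is probably the more reliable check of what the index actually contains.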

I got it running with the GPU Operator. Configuring MIG remotely works. I have only applied the 'single' policy so far.
Thanks,