Minimizing Deep Learning Inference Latency with NVIDIA Multi-Instance GPU

Originally published at: https://developer.nvidia.com/blog/minimizing-dl-inference-latency-with-mig/

Recently, NVIDIA unveiled the A100 GPU, based on the NVIDIA Ampere architecture. Ampere introduced many new features, including Multi-Instance GPU (MIG), that play a special role in deep learning (DL) applications. MIG makes it possible to use a single A100 GPU as if it were multiple smaller GPUs, maximizing utilization for DL workloads and providing…
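
Each MIG instance is exposed to CUDA applications as its own device, so an inference process can be pinned to a single instance while the remaining instances stay available for other workloads. Below is a minimal sketch of that idea, assuming PyTorch is installed, MIG mode is already enabled on the A100, and using a placeholder MIG device UUID (obtain the real one from `nvidia-smi -L`):

```python
import os

# Pin this process to one MIG instance before CUDA is initialized.
# The UUID below is a placeholder; list real MIG device UUIDs with `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch

# To the application, the MIG instance looks like an ordinary CUDA device.
device = torch.device("cuda:0")
model = torch.nn.Linear(1024, 1024).eval().to(device)

with torch.no_grad():
    out = model(torch.randn(1, 1024, device=device))
print(out.shape)
```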