Deploying NVIDIA Triton at Scale with MIG and Kubernetes

Originally published at: Deploying NVIDIA Triton at Scale with MIG and Kubernetes | NVIDIA Developer Blog

NVIDIA Triton can serve any number and mix of models, limited only by the system's available disk and memory. It also supports multiple deep-learning frameworks, including TensorFlow, PyTorch, and NVIDIA TensorRT, so developers and data scientists are no longer tied to a single framework. NVIDIA Triton is designed to integrate easily with Kubernetes for large-scale deployment in the data center.
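To illustrate how Triton manages a mix of models, each model in its repository is described by a small `config.pbtxt` file that names the backend framework and the input/output tensors. The sketch below is a minimal, hypothetical example (the model name `resnet50_trt`, tensor names, and shapes are illustrative and not from the original article):

```
# config.pbtxt for a hypothetical TensorRT model named "resnet50_trt"
name: "resnet50_trt"
platform: "tensorrt_plan"     # a PyTorch model would use "pytorch_libtorch" instead
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

This file would sit at `model_repository/resnet50_trt/config.pbtxt`, with the serialized model in a numbered version subdirectory (for example, `model_repository/resnet50_trt/1/model.plan`). A single repository can hold models from different frameworks side by side, which is what lets one Triton instance serve a heterogeneous mix.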