Run Multiple AI Models on the Same GPU with Amazon SageMaker Multi-Model Endpoints Powered by NVIDIA Triton Inference Server

jwitsoe · October 25, 2022, 6:00pm

Originally published at: https://developer.nvidia.com/blog/run-multiple-ai-models-on-same-gpu-with-sagemaker-mme-powered-by-triton/

Last November, AWS integrated NVIDIA Triton Inference Server, an open-source inference serving software, into Amazon SageMaker. Machine learning (ML) teams can use Amazon SageMaker as a fully managed service to build and deploy ML models at scale. With this integration, data scientists and ML engineers can easily use the NVIDIA Triton multi-framework, high-performance inference serving with…

Related topics

Topic	Category	Replies	Views	Activity
Fast and Scalable AI Model Deployment with NVIDIA Triton Inference Server	Technical Blog	0	467	November 9, 2021
Accelerating AI and ML Workflows with Amazon SageMaker and NVIDIA NGC	Technical Blog	0	408	August 25, 2020
Simplifying and Scaling Inference Serving with NVIDIA Triton 2.3	Technical Blog	0	454	October 5, 2020
Deploying AI Deep Learning Models with NVIDIA Triton Inference Server	Technical Blog	0	443	December 18, 2020
Amazon Elastic Kubernetes Services Now Offers Native Support for NVIDIA A100 Multi-Instance GPUs	Technical Blog	0	371	October 22, 2021
Serving ML Model Pipelines on NVIDIA Triton Inference Server with Ensemble Models	Technical Blog	1	605	July 13, 2023
Simplifying AI Inference in Production with NVIDIA Triton	Technical Blog	3	786	November 19, 2021
Identifying the Best AI Model Serving Configurations at Scale with NVIDIA Triton Model Analyzer	Technical Blog	0	442	May 23, 2022
Solving AI Inference Challenges with NVIDIA Triton	Technical Blog	0	437	September 21, 2022
How to Deploy an AI Model in Python with PyTriton	Technical Blog	1	641	January 4, 2024