Serving ML Model Pipelines on NVIDIA Triton Inference Server with Ensemble Models

Originally published at: https://developer.nvidia.com/blog/serving-ml-model-pipelines-on-nvidia-triton-inference-server-with-ensemble-models/

Learn the steps to create an end-to-end inference pipeline with multiple models using NVIDIA Triton Inference Server and different framework backends.
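As a quick reference, the approach in the post centers on a Triton ensemble: a top-level "model" whose config.pbtxt wires the output tensors of one model into the inputs of the next, so preprocessing, inference, and postprocessing all execute server-side within a single request. Below is a minimal, hypothetical sketch of such a config; the model names (preprocess, classifier) and tensor names are illustrative placeholders, not the ones used in the article.

```
# config.pbtxt for a hypothetical two-step ensemble (all names are illustrative)
name: "ensemble_pipeline"
platform: "ensemble"
max_batch_size: 8
input [
  {
    name: "RAW_INPUT"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "SCORES"
    data_type: TYPE_FP32
    dims: [ 10 ]
  }
]
ensemble_scheduling {
  step [
    {
      # e.g. a Python-backend preprocessing model
      model_name: "preprocess"
      model_version: -1
      input_map { key: "raw_input" value: "RAW_INPUT" }
      output_map { key: "features" value: "features" }
    },
    {
      # e.g. an ONNX/TensorRT/FIL classifier consuming those features
      model_name: "classifier"
      model_version: -1
      input_map { key: "input" value: "features" }
      output_map { key: "output" value: "SCORES" }
    }
  ]
}
```

The intermediate tensor (features here) exists only inside the server, so the client sends the raw input once and receives the final outputs, with no extra network hops between pipeline stages.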

Thanks for the detailed tutorial, very useful!

However, this doesn’t seem like an apples-to-apples comparison. What if we do the pre- and post-processing locally on the GPU? Shouldn’t the latency then be about the same?