Serving ML Model Pipelines on NVIDIA Triton Inference Server with Ensemble Models

Originally published at: https://developer.nvidia.com/blog/serving-ml-model-pipelines-on-nvidia-triton-inference-server-with-ensemble-models/

Learn the steps to create an end-to-end inference pipeline with multiple models using NVIDIA Triton Inference Server and different framework backends.
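As a quick reference, the approach in the post centers on a Triton ensemble: a top-level "model" whose config.pbtxt wires the output tensors of one model into the inputs of the next, so preprocessing, inference, and postprocessing all execute server-side within a single request. Below is a minimal, hypothetical sketch of such a config; the model names (preprocess, classifier) and tensor names are illustrative placeholders, not the ones used in the article.

```
# config.pbtxt for a hypothetical two-step ensemble (all names are illustrative)
name: "ensemble_pipeline"
platform: "ensemble"
max_batch_size: 8
input [
  {
    name: "RAW_INPUT"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "SCORES"
    data_type: TYPE_FP32
    dims: [ 10 ]
  }
]
ensemble_scheduling {
  step [
    {
      # e.g. a Python-backend preprocessing model
      model_name: "preprocess"
      model_version: -1
      input_map { key: "raw_input" value: "RAW_INPUT" }
      output_map { key: "features" value: "features" }
    },
    {
      # e.g. an ONNX/TensorRT/FIL classifier consuming those features
      model_name: "classifier"
      model_version: -1
      input_map { key: "input" value: "features" }
      output_map { key: "output" value: "SCORES" }
    }
  ]
}
```

The intermediate tensor (features here) exists only inside the server, so the client sends the raw input once and receives the final outputs, with no extra network hops between pipeline stages.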

Thanks for the detailed tutorial, very useful!

However, this doesn’t seem like an apples-to-apples comparison. What if we do the pre- and post-processing locally on the GPU? Shouldn’t the latency then be about the same?