Serving ML Model Pipelines on NVIDIA Triton Inference Server with Ensemble Models

Originally published at:

Learn the steps to create an end-to-end inference pipeline with multiple models using NVIDIA Triton Inference Server and different framework backends.
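For readers skimming the thread: the ensemble approach described in the post is driven by a `config.pbtxt` for a model whose `platform` is `ensemble`, which chains sub-models (e.g. preprocessing and inference on different backends) inside the server. A minimal sketch, assuming a hypothetical two-step pipeline with models named `preprocess` and `classifier` and illustrative tensor names and shapes:

```
name: "ensemble_pipeline"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_INPUT", data_type: TYPE_UINT8, dims: [ -1 ] }
]
output [
  { name: "FINAL_OUTPUT", data_type: TYPE_FP32, dims: [ 1000 ] }
]
ensemble_scheduling {
  step [
    {
      # First stage: decode/normalize the raw bytes
      model_name: "preprocess"
      model_version: -1
      input_map { key: "raw_image", value: "RAW_INPUT" }
      output_map { key: "preprocessed_image", value: "PREPROCESSED" }
    },
    {
      # Second stage: run the actual model on the preprocessed tensor
      model_name: "classifier"
      model_version: -1
      input_map { key: "input", value: "PREPROCESSED" }
      output_map { key: "output", value: "FINAL_OUTPUT" }
    }
  ]
}
```

The `input_map`/`output_map` entries wire each step's tensors to the ensemble's inputs, outputs, or intermediate tensors, so intermediate data stays on the server (and on the GPU where backends allow) instead of making a round trip to the client.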

Thanks for the detailed tutorial, very useful!

However, this doesn’t seem like an apples-to-apples comparison. What if we do the pre- and post-processing locally on the GPU as well? Wouldn’t the latency then be about the same?