Simplifying AI Inference with NVIDIA Triton Inference Server from NVIDIA NGC

Originally published at: https://developer.nvidia.com/blog/simplifying-ai-inference-with-nvidia-triton-inference-server-from-nvidia-ngc/

Seamlessly deploying AI services at scale in production is as critical as creating the most accurate AI model. Conversational AI services, for example, need multiple models that handle automatic speech recognition (ASR), natural language understanding (NLU), and text-to-speech (TTS) to complete the application pipeline. To provide real-time conversation to users, such applications should be…

Try building your own AI application leveraging Triton Inference Server today, and let us know if you have any questions or concerns!
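For reference, a minimal sketch of sending an inference request to a running Triton server with the Python HTTP client. The server URL, model name, and tensor names/shapes here are placeholder assumptions, not values from the article; adjust them to your own model configuration.

```python
# Hypothetical example: query a locally running Triton server over HTTP.
# Assumes `pip install tritonclient[http]` and a model named "my_model" that
# takes one FP32 input "INPUT0" of shape [1, 16] and returns "OUTPUT0".
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request input from a NumPy array.
data = np.random.rand(1, 16).astype(np.float32)
infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

# Request the model's output tensor by name.
requested_output = httpclient.InferRequestedOutput("OUTPUT0")

response = client.infer(
    model_name="my_model",
    inputs=[infer_input],
    outputs=[requested_output],
)
print(response.as_numpy("OUTPUT0"))
```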

Hi, do you have any materials comparing it with TensorFlow TFX model serving?

Hi, I’m looking for an inference server to provide access to experimental models in our R&D department. Therefore, speed is not the top priority.
Do all models have to fit in the GPU memory at the same time? Or are models unloaded and reloaded if necessary?
In our scenario, many AI models (and older versions) are provided, which in total would require more GPU memory than is available, but a reload delay for a model that isn't active would be tolerable at inference time.
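One capability worth noting here (a hedged pointer, not guidance from the article excerpt above): Triton can be started with `--model-control-mode=explicit`, in which case models are loaded and unloaded on request rather than all kept resident in GPU memory. A rough sketch with the Python HTTP client follows; the model name is a placeholder.

```python
# Hypothetical sketch: on-demand load/unload using Triton's explicit
# model-control mode. Assumes the server was started with:
#   tritonserver --model-repository=/models --model-control-mode=explicit
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Load an older experimental model only when it is actually needed...
client.load_model("legacy_model_v2")
assert client.is_model_ready("legacy_model_v2")

# ...run inference against it as usual, then free its GPU memory again.
client.unload_model("legacy_model_v2")
```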