Triton Inference Server, Model Analyzer

uss0403 · March 4, 2024, 9:34am

Hello, NVIDIA

I am using NVIDIA Triton Inference Server and its excellent features. In particular, I use Triton's Model Analyzer to estimate the approximate throughput of models on GPUs.

I find it very convenient because it produces results and plots specific to the GPU it runs on. However, when I sweep many configuration options for a single model, a run can take anywhere from several hours to 1-2 days.
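To illustrate why such a sweep can stretch to hours or days: an exhaustive search multiplies the number of candidate values for each knob, and every combination needs its own measurement. The sketch below is a rough estimate under stated assumptions only. It assumes a purely brute-force sweep, and the knob names, value counts, and per-measurement time are all hypothetical illustrations, not Model Analyzer defaults or its actual search algorithm.

```python
from math import prod

# Hypothetical search space: number of candidate values per knob.
# These counts are illustrative assumptions, not Model Analyzer defaults.
search_space = {
    "max_batch_size": 8,   # e.g. 1, 2, 4, ..., 128
    "instance_count": 5,   # e.g. 1..5 model instances per GPU
    "concurrency": 10,     # e.g. client concurrency 1, 2, 4, ..., 512
}


def sweep_hours(space: dict[str, int], seconds_per_measurement: float) -> float:
    """Estimate hours for a brute-force sweep: one measurement per combination."""
    n_runs = prod(space.values())
    return n_runs * seconds_per_measurement / 3600


runs = prod(search_space.values())
print(runs)  # 400
print(f"{sweep_hours(search_space, 60.0):.1f} hours")  # 6.7 hours
```

Adding one more knob with, say, 4 values would multiply the estimate by 4, which is how a sweep can grow from hours into days.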

Therefore, I was wondering: is there a way to obtain these numerical results without running on a GPU, or is a GPU strictly required for this kind of profiling?