Originally published at: https://developer.nvidia.com/blog/practical-strategies-for-optimizing-llm-inference-sizing-and-performance/
As the use of large language models (LLMs) grows across many applications, such as chatbots and content creation, it’s important to understand the process of scaling and optimizing inference systems to make informed decisions about hardware and resources for LLM inference. In the following talk, Dmitry Mironov and Sergio Perez, senior deep learning solutions architects…
Hi,
Where can I get the sizing tool mentioned during the training/presentation?
https://nemo-inference-sizing.nvidia.com/
The link above doesn't seem to work. Was this project cancelled? Is there a replacement tool?
Thanks!