Originally published at: https://developer.nvidia.com/blog/practical-strategies-for-optimizing-llm-inference-sizing-and-performance/
As the use of large language models (LLMs) grows across many applications, such as chatbots and content creation, it’s important to understand the process of scaling and optimizing inference systems to make informed decisions about hardware and resources for LLM inference. In the following talk, Dmitry Mironov and Sergio Perez, senior deep learning solutions architects…
Hi,
Where can I get the sizing tool mentioned during the training/presentation?
https://nemo-inference-sizing.nvidia.com/
The link above doesn't seem to work. Was this project cancelled? Is there a replacement tool?
Thanks!