NVIDIA Dynamo Adds GPU Autoscaling, Kubernetes Automation, and Networking Optimizations

jwitsoe · May 20, 2025, 6:30pm

Originally published at: NVIDIA Dynamo Adds GPU Autoscaling, Kubernetes Automation, and Networking Optimizations | NVIDIA Technical Blog

At NVIDIA GTC 2025, we announced NVIDIA Dynamo, a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The latest v0.2 release of Dynamo includes a planner for prefill and decode GPU autoscaling; Kubernetes automation for large-scale Dynamo deployments; and support for AWS Elastic Fabric Adapter (EFA) for…

Topic	Replies	Views	Activity
Introducing NVIDIA Dynamo, A Low-Latency Distributed Inference Framework for Scaling Reasoning AI Models Technical Blog	2	378	May 20, 2025
NVIDIA Dynamo Accelerates llm-d Community Initiatives for Advancing Large-Scale Distributed Inference Technical Blog	0	147	May 21, 2025
How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale Technical Blog agentic-ai	1	97	March 17, 2026
NVIDIA DYNAMO FAQ Announcements nim , llama , agentic-ai	0	274	March 18, 2025
NVIDIA DYNAMO FAQ Announcements nim , llama , agentic-ai	0	1925	March 18, 2025
Scaling LLMs with NVIDIA Triton and NVIDIA TensorRT-LLM Using Kubernetes Technical Blog llama	0	123	October 22, 2024
Disaggregated Prefill/Decode using NVIDIA Dynamo (Dual NVIDIA RTX PRO 6000 Blackwell) TensorRT	1	200	February 16, 2026
NVIDIA Dynamo Released: A Low-Latency Distributed Inference Framework for Reasoning AI Models Technical Blog - South Korea	0	82	May 16, 2025
Horizontal Autoscaling of NVIDIA NIM Microservices on Kubernetes Technical Blog nim	2	133	January 24, 2025
NVIDIA Dynamo Accelerates llm-d Community Initiatives for Advancing Large-Scale Distributed Inference Technical Blog - South Korea	0	134	May 26, 2025
