Originally published at: NVIDIA Dynamo Adds GPU Autoscaling, Kubernetes Automation, and Networking Optimizations | NVIDIA Technical Blog
At NVIDIA GTC 2025, we announced NVIDIA Dynamo, a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The latest v0.2 release of Dynamo includes: a planner for prefill and decode GPU autoscaling; Kubernetes automation for large-scale Dynamo deployments; support for AWS Elastic Fabric Adapter (EFA) for…