AI Reasoning with Llama Nemotron at GTC25

We hope you got a chance to watch NVIDIA CEO Jensen Huang’s keynote at GTC.

Today, NVIDIA announced NVIDIA Llama Nemotron, an open family of leading AI models that combine exceptional reasoning capabilities and compute efficiency with an open license for enterprise use.

The family comes in three sizes, so developers can choose the model that fits their use case, compute availability, and accuracy requirements.

  • Nano: 8B parameters, distilled from Llama 3.1 8B, for the highest accuracy on PCs and edge devices.
  • Super: 49B parameters, distilled from Llama 3.3 70B, for the best accuracy with the highest throughput on a single data center GPU. This model is the focus of this post.
  • Ultra: 253B parameters, distilled from Llama 3.1 405B, for maximum agentic accuracy on multi-GPU data center servers (coming soon).

The Llama Nemotron reasoning models deliver best-in-class accuracy across industry-standard reasoning and agentic benchmarks: GPQA Diamond, AIME 2025, MATH 500, and BFCL, as well as Arena Hard.

Read more in our developer blog.

Stay up to date with the latest NVIDIA AI announcements by following us on Discord, Instagram, YouTube, and X.