NVIDIA ACE General Availability | Announcements

NVIDIA ACE — a suite of technologies for digital humans — is now generally available for developers building digital human services. ACE NIMs, or inference microservices, enable developers to deliver high-quality natural language understanding, speech synthesis, and facial animation for digital humans in gaming, customer service, healthcare and more.

To make it easier for developers to build digital humans, NVIDIA ACE 24.06 brings many components of our digital human technology suite to general availability, including Riva, Audio2Face and Omniverse RTX. They are available through NVIDIA AI Enterprise, and workflow examples can be found on the NVIDIA NGC Catalog and in the NVIDIA ACE GitHub repository. Generally available microservices include:

  • Riva ASR 2.15.1 - with a new English model with higher quality and accuracy.
  • Riva TTS 2.15.1 - with improved representation of German, European and Latin American Spanish, Mandarin Chinese, and Italian. Also included is the beta release of P-Flow, a fast and efficient flow-based model that can adapt to a new voice with very little data.
  • Riva NMT 2.15.1 - with a new 1.5B any-to-any translation model.
  • Audio2Face 1.0.11 - adds more blendshape customization options at runtime, supports more audio sampling rates, and improves lip sync and facial performance quality with Unreal Engine MetaHuman characters.
  • Omniverse Renderer Microservice 1.0.0 - adds a new animation data protocol and gRPC and HTTP endpoints.
  • Animation Graph Microservice 1.0.0 - adds support for character position and facial expression animations.
  • ACE Agent 4.0.0 - adds speech support for custom RAGs, Colang 2.0 support, and prebuilt support for example RAG workflows.

Microservices available in early access include:

  • Nemotron-3 4.5B SLM 0.1.0 - a small language model purpose-built for RTX AI PC inference, with INT4 quantization for minimal VRAM usage.
  • Speech Live Portrait 0.1.0 - animates a person’s portrait photo using audio and supports lip sync, blinking and head pose animation.
  • VoiceFont 1.1.1 - reduces latency for real-time applications and supports four concurrent batches across all GPUs.
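
To give a rough sense of why INT4 quantization matters for VRAM on an RTX AI PC, the sketch below estimates the memory needed for model weights alone at different precisions. It is back-of-the-envelope arithmetic, not a measurement: the parameter count is read from the "4.5B" in the model name, the function name is hypothetical, and the estimate ignores activations, the KV cache, quantization scale factors and runtime overhead.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Assumption: 4.5 billion parameters, taken from the "4.5B" in the model name.
PARAMS = 4.5e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_memory_gib(PARAMS, bits):.2f} GiB for weights")
```

Under these assumptions, INT4 weights take a quarter of the space of FP16 weights, which is the main reason a model of this size can fit comfortably alongside a game or other GPU workload.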

For additional information, please refer to our technical blog.

You can also view our new ACE documentation.