Originally published at: https://developer.nvidia.com/blog/getting-real-time-factor-over-60-for-text-to-speech-using-jarvis/
Figure 1. The Jarvis Server and the TTS pipeline.

NVIDIA Jarvis is an application framework that provides several pipelines for accomplishing conversational AI tasks. Text-to-speech (TTS) means generating high-quality, natural-sounding speech from text with low latency. It can be one of the most computationally challenging of those tasks. In this post, we focus on…