Generate Natural Sounding Speech from Text in Real-Time

jwitsoe · September 10, 2019, 10:56pm

Originally published at: Generate Natural Sounding Speech from Text in Real-Time | NVIDIA Technical Blog

This post, intended for developers with a professional-level understanding of deep learning, will help you produce a production-ready AI text-to-speech model. Converting text into high-quality, natural-sounding speech in real time has been a challenging conversational AI task for decades. State-of-the-art speech synthesis models are based on parametric neural networks [1]. Text-to-speech (TTS) synthesis is typically…

anon26787984 · September 25, 2019, 11:02am

You state: "Our current model synthesizes samples at 125 * 22,050 = 2,756,250 [samples per second], which is 125 times faster than 'real-time' at 22,050 samples [per second]." Why, then, is the RTF 1-4 rather than 125?
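
Writing out the arithmetic in the quoted sentence, and reading it under the convention in which RTF is synthesis time divided by the duration of the audio produced (an assumption here; the quote itself does not define RTF), "125 times faster than real time" is an inverse factor:

```latex
\[
\frac{125 \times 22{,}050\ \text{samples/s}}{22{,}050\ \text{samples/s}}
= \frac{2{,}756{,}250\ \text{samples/s}}{22{,}050\ \text{samples/s}}
= 125 = \text{xRTF},
\qquad
\text{RTF} = \frac{1}{\text{xRTF}} = \frac{1}{125} = 0.008
\]
```

On that reading, the quoted speed-up and an RTF of 0.008 describe the same measurement.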

anon25443301 · November 15, 2019, 4:44pm

Guys, could you please correct the use of the term RTF? Please do not confuse it with xRTF, which is 1/RTF. An RTF > 1 means the system cannot run in real time, which is exactly what we want to avoid: http://dictionary.sensagent...
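
A minimal Python sketch of the two conventions, assuming RTF is wall-clock synthesis time divided by the duration of the audio produced and xRTF is its inverse. The function names are illustrative and not part of any NVIDIA API; the 22,050 Hz default is taken from the quoted sentence.

```python
def audio_seconds(num_samples: int, sample_rate: int = 22_050) -> float:
    """Duration in seconds of num_samples samples at sample_rate Hz."""
    return num_samples / sample_rate


def rtf(processing_seconds: float, num_samples: int, sample_rate: int = 22_050) -> float:
    """Real-time factor: synthesis time divided by audio duration.

    RTF < 1 is faster than real time; RTF > 1 cannot keep up with playback.
    """
    duration = audio_seconds(num_samples, sample_rate)
    if duration <= 0:
        raise ValueError("num_samples must be positive")
    return processing_seconds / duration


def xrtf(processing_seconds: float, num_samples: int, sample_rate: int = 22_050) -> float:
    """Inverse real-time factor (xRTF = 1 / RTF): 'N times faster than real time'."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds(num_samples, sample_rate) / processing_seconds


def is_real_time(processing_seconds: float, num_samples: int, sample_rate: int = 22_050) -> bool:
    """True when synthesis keeps pace with playback, i.e. RTF <= 1."""
    return rtf(processing_seconds, num_samples, sample_rate) <= 1.0


if __name__ == "__main__":
    # The quoted figure: 2,756,250 samples produced in one second at 22,050 Hz.
    print(rtf(1.0, 2_756_250))   # 0.008
    print(xrtf(1.0, 2_756_250))  # 125.0
```

Reporting "RTF = 0.008" and "xRTF = 125" is the same result; the objection above is to labeling the second number as RTF.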

anon47728893 · January 28, 2020, 3:45pm

There are two factors that influence the latency results reported here: 1) we are measuring end-to-end text-to-speech inference, i.e., the reported latency is the total of Tacotron2 and WaveGlow latency, whereas in the quoted sentence the 125 refers to WaveGlow latency only; 2) in this article we used the slower version of WaveGlow with 512 residual channels, while the quoted figure comes from the version with 256 channels.
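
To make the first point concrete: when the two models run one after the other, their latencies add, so the end-to-end RTF is always higher than the vocoder-only RTF. This sketch assumes a strictly sequential pipeline (Tacotron2 finishes its mel-spectrogram before WaveGlow starts). `StageTiming`, `pipeline_rtf`, and the timing values are hypothetical, chosen only to show the arithmetic; they are not measurements from the article.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StageTiming:
    """Wall-clock time spent in one pipeline stage (hypothetical helper)."""
    name: str
    seconds: float


def pipeline_rtf(stages: Sequence[StageTiming], num_samples: int, sample_rate: int = 22_050) -> float:
    """RTF of a sequential pipeline: stage latencies add, then divide by audio duration."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("num_samples must be positive")
    total_seconds = sum(stage.seconds for stage in stages)
    return total_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical timings for 10 s of audio; NOT figures from the article.
    tacotron2 = StageTiming("Tacotron2", 0.5)
    waveglow = StageTiming("WaveGlow", 0.08)
    num_samples = 10 * 22_050

    vocoder_only = pipeline_rtf([waveglow], num_samples)
    end_to_end = pipeline_rtf([tacotron2, waveglow], num_samples)
    print(f"WaveGlow only: RTF={vocoder_only:.3f}, xRTF={1 / vocoder_only:.0f}")
    print(f"End-to-end:    RTF={end_to_end:.3f}, xRTF={1 / end_to_end:.1f}")
```

With these made-up numbers the vocoder alone runs about 125 times faster than real time, while the full pipeline manages only about 17 times, which is why a WaveGlow-only speed-up cannot be compared directly with an end-to-end figure.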

anon39690650 · April 27, 2020, 7:24pm

This is a really good article, thank you.
