GPU-Accelerated Speech to Text with Kaldi: A Tutorial on Getting Started

jwitsoe · October 17, 2019, 8:59pm

Originally published at: GPU-Accelerated Speech to Text with Kaldi: A Tutorial on Getting Started | NVIDIA Technical Blog

Recently, NVIDIA achieved GPU-accelerated speech-to-text inference with exciting performance results. That blog post described the general process of the Kaldi ASR pipeline and indicated which of its elements the team accelerated, i.e. implementing the decoder on the GPU and taking advantage of Tensor Cores in the acoustic model. Now with the latest Kaldi container on…

anon99649112 · October 31, 2019, 6:26pm

I would like to see a link to an article which describes what is needed to use the model in real time.

anon99255387 · November 1, 2019, 3:12pm

Do you mean as in streaming audio in real time? How many streams of audio would you have? This is something we are currently working on.

anon737887 · December 13, 2019, 1:03pm

I'm also interested in the above, especially for voice-related home automation.

anon39377548 · February 28, 2020, 6:26pm

In the WAV format, shouldn't it be 16-bit instead of 32-bit float?

upisipati · November 20, 2020, 9:01pm

Hi, I would like to know whether the real-time streaming option is out yet. If not, when will it be supported?

hbraun · November 21, 2020, 12:33am

Hi,

Yes, streaming is now fully supported. You can find more details here: https://developer.nvidia.com/gtc/2020/video/s21832-vid

Thanks,
Hugo

joepareti54 · March 6, 2021, 1:18pm

I am assuming this forum is appropriate for discussing Kaldi implementation issues. If not, I apologize.

I hit a roadblock when trying to use Kaldi for a corpus of English-Spanish language data using this code, which seems to be tailored to Chinese.
More details are in this status report. There is a paragraph, ‘discontinuing the project’, explaining the data preparation issue. I would appreciate any help on this.