NVIDIA Accelerates Real Time Speech to Text Transcription 3500x with Kaldi

Originally published at: https://developer.nvidia.com/blog/nvidia-accelerates-speech-text-transcription-3500x-kaldi/

Think of a sentence and repeat it aloud three times. If someone recorded this speech and performed a point-by-point comparison, they would find that no single utterance exactly matched the others. Similar to different resolutions, angles, and lighting conditions in imagery, human speech varies with respect to timing, pitch, amplitude, and even how base units…


Will you stream the speech? It is at 1pm local time, right?


I had a relatively easy time running the docker setup on a Tesla P4 as is. Can post my benchmark results and hardware configuration if it's interesting.


That would be great. Please do!

How many concurrent inference processes can the Tesla V100 handle? And to my understanding, Kaldi only performed inference on the CPU...

Hi @jwitsoe, thanks for the informative post, the performance benefits here are great :)

I was particularly interested in this part: "Future container releases will focus on developer productivity, including scripts to help users quickly run their own ASR models and native support for additional pre-trained ones."

Any updates on that? It would be great to read some docs about how to run an arbitrary pretrained Kaldi model here. Sorry if this exists already and I missed it.