Develop Smaller Speech Recognition Models with NVIDIA’s NeMo Framework

Originally published at: https://developer.nvidia.com/blog/develop-smaller-speech-recognition-models-with-nvidias-nemo-framework/

As computers and other personal devices have become increasingly prevalent, interest in conversational AI has grown due to its multitude of potential applications in a variety of situations. Each conversational AI framework is comprised of several more basic modules such as automatic speech recognition (ASR), and the models for these need to be lightweight in…

For all the talk about the edge, the article fails to describe inference times or edge hardware requirements. Does it run on a Jetson Nano? A Raspberry Pi? How much RAM does it need? Etc.

Yes, QuartzNet inference in NeMo does run on Jetson Nano. We have never tried it on a Raspberry Pi, though. Note that QuartzNet is an architecture - e.g. QuartzNet15x5 has B=15 blocks with R=5 sub-blocks within each block. See https://nvidia.github.io/Ne... . To reduce the memory footprint, you can choose to use fewer blocks and/or sub-blocks, but then you will have to retrain the model yourself. Another (very effective) way to reduce the memory footprint is to give it audio in shorter segments.
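
To illustrate the "shorter segments" suggestion, here is a minimal sketch in plain Python. It is not part of the NeMo API: the function name `split_into_segments`, the 16 kHz sample rate, and the 10-second segment length are all assumptions chosen for illustration, and the audio is assumed to be already decoded into a flat sequence of samples.

```python
from typing import List, Sequence


def split_into_segments(
    samples: Sequence[float], sample_rate: int, segment_seconds: float
) -> List[Sequence[float]]:
    """Split decoded audio samples into consecutive fixed-length segments.

    Hypothetical helper, not a NeMo function. The last segment may be
    shorter than the others.
    """
    if sample_rate <= 0 or segment_seconds <= 0:
        raise ValueError("sample_rate and segment_seconds must be positive")
    segment_len = int(sample_rate * segment_seconds)
    if segment_len == 0:
        raise ValueError("segment_seconds is too short for this sample_rate")
    return [
        samples[start:start + segment_len]
        for start in range(0, len(samples), segment_len)
    ]


# Assumed example input: 25 seconds of silence at 16 kHz.
audio = [0.0] * (16000 * 25)
segments = split_into_segments(audio, sample_rate=16000, segment_seconds=10.0)
segment_lengths = [len(s) for s in segments]
```

Each segment can then be passed to the model separately, so peak memory depends on the segment length rather than on the full recording. Cutting at fixed boundaries can split a word in two, so in practice you may prefer to cut at pauses or to overlap neighbouring segments slightly.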

Can I run inference on wav2letter using NeMo? Does the library have methods to do it?

We don't have a wav2letter model in NeMo, but the Jasper model is similar to it.

Sorry, I did not explain myself correctly. I was asking whether the NeMo library has methods to run inference with a model.

Yes. We also provide high-quality pre-trained checkpoints for QuartzNet.

Take a look at https://github.com/NVIDIA/N...