Extracting Features from Multiple Audio Channels with Kaldi

Originally published at: https://developer.nvidia.com/blog/extracting-features-from-multiple-audio-channels-with-kaldi/

In automatic speech recognition (ASR), one widely used method combines traditional machine learning with deep learning. In ASR flows of this type, audio features are first extracted from the raw audio. These features are then passed into an acoustic model. The acoustic model is a neural network trained on transcribed data to extract phoneme probabilities from…