→ This notebook shows how a Language Model (LM) can be added on top of the ASR model for offline inference.
→ An external LM cannot be used when training a Conformer-CTC model. However, we can train an LM separately and use it during decoding to increase accuracy. Please refer to “ASR Language Modeling” in the NeMo user guide. NeMo provides scripts for training the LM and for evaluating the pipeline with beam search decoding and an n-gram LM.
→ Is the neural rescorer based on the same tokenizer as the Conformer-CTC model? No, there is no need to use the same tokenizer. We usually use the yttm tokenizer with the Transformer rescorer, while the ASR models use sentencepiece. As the vocab size of the CTC models is usually low, in the range of 128 to 256, it is better to have a larger vocab size of around 4k for the rescorer.
→ Conformer-RNNT has an integrated LM inside: link
→ If you do not want to use an LM, we have released Conformer-Transducer checkpoints, which may give better results: link
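
To illustrate why the rescorer does not need to share a tokenizer with the ASR model, here is a minimal sketch of N-best rescoring in plain Python. It is not NeMo's API: `rescore_nbest`, `toy_lm_log_prob`, the toy unigram table, and the `alpha`/`beta` weights are all hypothetical names and values chosen for illustration. The assumption is that the ASR model hands over candidate transcripts as plain text with a log-probability each, so the LM can tokenize that text however it likes.

```python
import math
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_log_prob: Callable[[str], float],
    alpha: float = 0.5,
    beta: float = 0.0,
) -> List[Tuple[str, float]]:
    """Re-rank (text, asr_log_prob) pairs using an external LM.

    total = asr_log_prob + alpha * lm_log_prob(text) + beta * word_count
    alpha weights the LM; beta is a per-word bonus that offsets the LM's
    tendency to prefer shorter hypotheses. Both are assumed tunables.
    """
    rescored = []
    for text, asr_score in nbest:
        n_words = len(text.split())
        total = asr_score + alpha * lm_log_prob(text) + beta * n_words
        rescored.append((text, total))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for a real LM: a unigram table with made-up probabilities.
# A real rescorer would apply its own tokenizer to the text at this point.
TOY_UNIGRAM = {
    "recognize": 0.02,
    "speech": 0.03,
    "wreck": 0.001,
    "a": 0.05,
    "nice": 0.01,
    "beach": 0.005,
}


def toy_lm_log_prob(text: str) -> float:
    """Sum of unigram log-probabilities; unseen words get a small floor."""
    return sum(math.log(TOY_UNIGRAM.get(word, 1e-6)) for word in text.split())


# Hypothetical N-best list: the ASR model slightly prefers the wrong text.
nbest = [("wreck a nice beach", -4.0), ("recognize speech", -4.5)]

best_text, best_score = rescore_nbest(nbest, toy_lm_log_prob, alpha=0.5)[0]
print(best_text)
```

Because only text and scores cross the boundary between the two models, the LM side is free to use a larger vocabulary than the ASR side. In practice `alpha` and `beta` would be tuned on a held-out set.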
Thank you for sharing this link. I think it will be very useful for Riva dev community.