NeMo Text Processing: Addition of English Math ITN Support

Hardware (GPU) - A100
Hardware (CPU) - AMD EPYC 7742 64-Core Processor
Operating System - Linux
Riva Version - 2.12.1

I hope this message finds you well. I am reaching out to discuss a potential enhancement to the NeMo text processing repository.

Recently, while working with English ITN (Inverse Text Normalization) in NeMo, we found that math operations are not covered by the default English ITN grammars. To address this, we developed a tagger and verbalizer specifically for math operations. We tested the newly created FST graph with a local ITN normalizer, and it worked correctly.
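To make the tagger/verbalizer split concrete, here is a minimal plain-Python sketch of the two stages. It is only an illustration and is not our actual grammar: it uses regular expressions instead of pynini FSTs, and the `math` class name, the `left`/`operator`/`right` field names, and the tiny vocabulary are all assumptions made up for this example.

```python
import re

# Hypothetical spoken-form vocabulary; a real grammar would cover far more.
NUMBERS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
OPERATORS = {"plus": "+", "minus": "-", "times": "*", "equals": "="}

_NUM = "|".join(NUMBERS)
_OP = "|".join(OPERATORS)

# Matches "<number> <operator> <number>" in spoken form.
_SPOKEN = re.compile(rf"\b({_NUM}) ({_OP}) ({_NUM})\b")

# Matches the intermediate tagged form produced by tag().
_TAGGED = re.compile(
    r'math \{ left: "([^"]*)" operator: "([^"]*)" right: "([^"]*)" \}'
)


def tag(text: str) -> str:
    """Tagger stage: wrap each spoken math expression in a 'math' token."""
    def repl(m: re.Match) -> str:
        left, op, right = m.groups()
        return (
            f'math {{ left: "{NUMBERS[left]}" '
            f'operator: "{OPERATORS[op]}" right: "{NUMBERS[right]}" }}'
        )
    return _SPOKEN.sub(repl, text)


def verbalize(tagged: str) -> str:
    """Verbalizer stage: turn each 'math' token into its written form."""
    return _TAGGED.sub(
        lambda m: f"{m.group(1)} {m.group(2)} {m.group(3)}", tagged
    )


def inverse_normalize(text: str) -> str:
    """Run both stages, as a local ITN normalizer would."""
    return verbalize(tag(text))
```

For example, `tag("two plus three")` produces the intermediate string `math { left: "2" operator: "+" right: "3" }`, and `inverse_normalize("two plus three")` returns `2 + 3`. The intermediate class name is the part that matters for deployment, since the runtime has to recognize it.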

However, when we deployed the same grammars to the Riva build via FAR files, the math ITN did not work. Upon further investigation, we found that the Sparrowhawk library supports only a fixed set of semiotic classes and does not allow new classes to be added (https://github.com/google/sparrowhawk/blob/master/src/proto/semiotic_classes.proto).
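As a rough illustration of the mismatch, the sketch below scans a tagger output string and reports any class names that are not in a known set. This is a hypothetical pre-deployment check, not how Sparrowhawk itself validates input. The `KNOWN_CLASSES` set is only a small illustrative subset (the authoritative list is the proto file linked above), and the `math` class and its fields are assumed names from our custom tagger.

```python
import re

# Illustrative subset only; see semiotic_classes.proto for the real list.
KNOWN_CLASSES = {"cardinal", "decimal", "measure", "money"}

# A class name is the identifier immediately before an opening brace.
_CLASS = re.compile(r"(\w+) \{")


def unsupported_classes(tagged: str, known=KNOWN_CLASSES) -> list:
    """Return the sorted class names in `tagged` that are not in `known`."""
    return sorted({name for name in _CLASS.findall(tagged) if name not in known})


# Hypothetical tagger output mixing a standard class with a custom one.
sample = 'cardinal { integer: "2" } math { left: "2" operator: "+" right: "3" }'
print(unsupported_classes(sample))
```

A check along these lines would flag the custom `math` class before the FAR files are built, rather than after the deployment fails.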

Given this situation, we would like to discuss the possibility of adding English Math ITN support to the NeMo text processing repository. This would involve expanding the existing semiotic classes in the Sparrowhawk Library to accommodate Math operations.

We would greatly appreciate your thoughts and insights on this matter. If there are any existing plans or ongoing discussions regarding the addition of English Math ITN, we would be eager to learn more and contribute to the development process.

Thank you for your attention to this request. We look forward to hearing from you soon.

The screenshot below shows that the newly created math FST works correctly on the local system, whereas it does not work when the ITN FAR file is used with the Riva build.

@akashdeshwani, a very interesting project indeed. Have you considered opening a PR, an issue, or a discussion on the NVIDIA/NeMo-text-processing GitHub repository (NeMo text processing for ASR and TTS)? The development of NeMo TN/ITN moved out of the main NeMo repository some time ago.

@ilb, thank you for the suggestion. I have opened a discussion; here is the link: NeMo Text Processing: Addition of English Math ITN Support · NVIDIA/NeMo-text-processing · Discussion #114 · GitHub