I have run the NeMo ASR tutorial on a local Ubuntu PC with a GPU: the model trains correctly on the provided data. What I need to understand now is how to adapt that notebook to the VCTK dataset I have. It consists of wav files and text files with matching base names; each text file contains the sentence spoken in the corresponding wav file.
joepareti54@MSI /cygdrive/f/x/finance-2020/AI/Listen_attend_spell/VCTK-Corpus
$ ls -l txt/p225 | head -5
total 231
-rw-r--r--+ 1 joepareti54 None 20 Aug 22 2012 p225_001.txt
-rw-r--r--+ 1 joepareti54 None 55 Aug 22 2012 p225_002.txt
-rw-r--r--+ 1 joepareti54 None 103 Aug 22 2012 p225_003.txt
-rw-r--r--+ 1 joepareti54 None 68 Aug 22 2012 p225_004.txt
joepareti54@MSI /cygdrive/f/x/finance-2020/AI/Listen_attend_spell/VCTK-Corpus
$ ls -l wav48/p225 | head -5
total 93464
-rw-r--r--+ 1 joepareti54 None 196990 Aug 23 2012 p225_001.wav
-rw-r--r--+ 1 joepareti54 None 389676 Aug 23 2012 p225_002.wav
-rw-r--r--+ 1 joepareti54 None 749754 Aug 23 2012 p225_003.wav
-rw-r--r--+ 1 joepareti54 None 423528 Aug 23 2012 p225_004.wav
Whether NeMo is feasible for my project seems to come down mainly to preparing the manifest file and the YAML config, and perhaps other things I am not aware of. Any guidance is appreciated.
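For what it's worth, here is a minimal sketch of how such a manifest could be generated from the directory layout shown above. It assumes the standard NeMo manifest format (one JSON object per line with `audio_filepath`, `duration`, and `text` fields) and pairs each transcript with its wav file by base name; the function and output names are my own, and the duration is read from the wav header with the stdlib `wave` module.

```python
import json
import wave
from pathlib import Path

def build_manifest(txt_dir, wav_dir, manifest_path):
    """Pair VCTK .txt transcripts with their .wav files and write a
    NeMo-style JSON-lines manifest (one entry per utterance)."""
    txt_dir, wav_dir = Path(txt_dir), Path(wav_dir)
    with open(manifest_path, "w", encoding="utf-8") as out:
        for txt_file in sorted(txt_dir.glob("*.txt")):
            wav_file = wav_dir / (txt_file.stem + ".wav")
            if not wav_file.exists():
                continue  # skip utterances with a missing recording
            # duration in seconds, from the wav header
            with wave.open(str(wav_file), "rb") as w:
                duration = w.getnframes() / w.getframerate()
            text = txt_file.read_text(encoding="utf-8").strip().lower()
            entry = {
                "audio_filepath": str(wav_file),
                "duration": round(duration, 3),
                "text": text,
            }
            out.write(json.dumps(entry) + "\n")

if __name__ == "__main__":
    # hypothetical paths, following the speaker layout in the listing above
    build_manifest("VCTK-Corpus/txt/p225",
                   "VCTK-Corpus/wav48/p225",
                   "p225_manifest.json")
```

One caveat I'd expect: VCTK audio is 48 kHz, while the tutorial models are typically configured for 16 kHz, so the wavs may need resampling (or the sample rate in the YAML config adjusted) before training.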