NVIDIA NeMo: training an ASR model on VCTK data

I have run the NeMo ASR tutorial on a local Ubuntu PC with a GPU, and the model trains correctly on the provided data. What I need to understand is how to adapt this notebook to the VCTK data set I have. It consists of wav files and text files with matching base names, where each text file contains the sentence spoken in the corresponding wav file.

joepareti54@MSI /cygdrive/f/x/finance-2020/AI/Listen_attend_spell/VCTK-Corpus
$  ls -l txt/p225 | head -5
total 231
-rw-r--r--+ 1 joepareti54 None  20 Aug 22  2012 p225_001.txt
-rw-r--r--+ 1 joepareti54 None  55 Aug 22  2012 p225_002.txt
-rw-r--r--+ 1 joepareti54 None 103 Aug 22  2012 p225_003.txt
-rw-r--r--+ 1 joepareti54 None  68 Aug 22  2012 p225_004.txt

joepareti54@MSI /cygdrive/f/x/finance-2020/AI/Listen_attend_spell/VCTK-Corpus
$ ls -l wav48/p225 | head -5
total 93464
-rw-r--r--+ 1 joepareti54 None  196990 Aug 23  2012 p225_001.wav
-rw-r--r--+ 1 joepareti54 None  389676 Aug 23  2012 p225_002.wav
-rw-r--r--+ 1 joepareti54 None  749754 Aug 23  2012 p225_003.wav
-rw-r--r--+ 1 joepareti54 None  423528 Aug 23  2012 p225_004.wav
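
To connect this layout to the notebook, my plan is to generate the training manifest myself. Below is a minimal sketch, assuming the same manifest format the tutorial uses (one JSON object per line with audio_filepath, duration, and text keys); the VCTK root path, the speaker list, and the use of the soundfile package to read durations are my own assumptions:

import json
import os

import soundfile as sf  # assumption: pip install soundfile

VCTK_ROOT = "/path/to/VCTK-Corpus"  # adjust to your setup

def build_manifest(speakers, out_path):
    """Write one JSON line per wav/txt pair, in the tutorial's manifest format."""
    with open(out_path, "w") as out:
        for spk in speakers:
            txt_dir = os.path.join(VCTK_ROOT, "txt", spk)
            wav_dir = os.path.join(VCTK_ROOT, "wav48", spk)
            for fname in sorted(os.listdir(txt_dir)):
                base, _ = os.path.splitext(fname)
                wav_path = os.path.join(wav_dir, base + ".wav")
                if not os.path.isfile(wav_path):
                    continue  # skip transcripts with no matching audio
                with open(os.path.join(txt_dir, fname)) as f:
                    # lower-case to match the character set the tutorial trains on
                    text = f.read().strip().lower()
                out.write(json.dumps({
                    "audio_filepath": wav_path,
                    "duration": sf.info(wav_path).duration,
                    "text": text,
                }) + "\n")

build_manifest(["p225"], "vctk_p225_manifest.json")

One open point is the sample rate: the wav48 files are 48 kHz, while the tutorial config assumes 16 kHz audio, so I expect the files need to be resampled (e.g., with sox) before training.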

As far as I can tell, adapting NeMo to this data comes down to generating the manifest files and adjusting the YAML config, and perhaps other things I am overlooking.
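
For the YAML side, I assume I can reuse the tutorial's config unchanged and just point it at the new manifests, the way the notebook overrides its parameters with OmegaConf; the config path and manifest file names below are placeholders, and the Trainer arguments follow the PyTorch Lightning version the tutorial uses:

from omegaconf import OmegaConf
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Load the tutorial's YAML config and point it at the VCTK manifests.
params = OmegaConf.load("configs/config.yaml")
params.model.train_ds.manifest_filepath = "vctk_train_manifest.json"
params.model.validation_ds.manifest_filepath = "vctk_val_manifest.json"

# Build and train the model the same way the notebook does.
trainer = pl.Trainer(gpus=1, max_epochs=50)
model = nemo_asr.models.EncDecCTCModel(cfg=params.model, trainer=trainer)
trainer.fit(model)

Does this look like the right approach? Any guidance is appreciated.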