Riva ASR acoustic model finetune

Hi,
I am building an ASR application with Riva ASR in French.
Because of the noisy acoustic environment, the stock French Conformer model (RIVA Conformer ASR French | NVIDIA NGC) is not sufficient, so I would like to finetune it with augmented data from the same acoustic environment.
While following the finetuning Jupyter notebook (tutorials/asr-python-advanced-finetune-am-citrinet-tao-finetuning.ipynb at stable · nvidia-riva/tutorials · GitHub), I found that a tokenizer must be created for training.
What are the recommended tokenizer configurations (bpe/spe/wpe), and what is the correct vocab size?
When I talked to the technical team at NVIDIA, they suggested finetuning with the vocab file that is available on NGC (Riva ASR French LM | NVIDIA NGC, dict_vocab_2.1.txt). How can that be done? Which tokenizer should be used in that case?

Thanks,
Yoav

Hardware - GPU A6000 x2
Hardware - CPU Intel Xeon Silver 4216 2.1GHz x2
Operating System Ubuntu 20.04
Riva Version 2.40

Hi @yoav.ellinson

Thanks for your interest in Riva.

I will check with the team regarding your finetuning questions and get back to you soon.

Thanks
