Finetune AM from pretrained Riva/TLT model

Please provide the following information when requesting support.

Hardware - GPU: T4
Hardware - CPU: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Operating System: Ubuntu 20.04.3 LTS
Riva Version: v2.2.1
TLT Version (if relevant):
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here.)

command:

tao speech_to_text_conformer train -e /specs -g 1 \
    training_ds.manifest_filepath=/data/train_clean_5.json \
    validation_ds.manifest_filepath=/data/dev_clean_2.json \
    trainer.max_epochs=2 \
    -r /results \
    --config-path /specs/ --config-name train_conformer_bpe_small.yaml \
    -m data/speechtotext_es_us_conformer_vtrainable_v2.1/speechtotext_es_us_conformer.tlt
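For context, the manifests passed above are NeMo-style JSON-lines files, one JSON object per utterance. A minimal sketch of the format (the file name, paths, durations, and transcripts below are illustrative placeholders, not my actual data):

# Each manifest line is a standalone JSON object (NeMo manifest format).
# The file name and contents here are placeholders for illustration only.
cat > /data/example_manifest.json <<'EOF'
{"audio_filepath": "/data/audio/utt_0001.wav", "duration": 4.2, "text": "hello world"}
{"audio_filepath": "/data/audio/utt_0002.wav", "duration": 3.1, "text": "good morning"}
EOF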

I am attaching the config file.

train_conformer_bpe_small.yaml (5.1 KB)

The model is being trained from scratch (or from the last saved checkpoint); it does not pick up the .tlt model.

Thanks,
Supreet

Hi @supreet.preet

Thanks for your interest in Riva

Thanks for sharing the command. The command you posted is used for training (the initial phase).

To fine-tune from a previous checkpoint, please run the command below:

tao speech_to_text_conformer finetune -e <experiment_spec> \
                             -m <model_checkpoint> \
                             -g <num_gpus>
  • -m: The path to the model checkpoint from which to fine-tune. The model checkpoint should be a .tlt file.
  • -e: The experiment specification file to set up fine-tuning. This requires the trainer, save_to, and optim configurations described in the “Training Process Configs” section, as well as finetuning_ds and validation_ds configs, as described in the “Dataset Configs” section. Additionally, if your fine-tuning dataset has a different vocabulary (i.e., set of labels) than the trained model checkpoint, you must also set change_vocabulary: true at the top level of your specification file. A minimal sketch of such a spec follows this list.
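For illustration, a minimal sketch of such a fine-tuning spec, assuming the field layout described in the TAO documentation linked below (the file name and all values are placeholders):

# Minimal sketch of a fine-tuning spec; field names follow the TAO docs
# linked below, while the file name and values are placeholders.
cat > /specs/finetune.yaml <<'EOF'
change_vocabulary: false        # set to true if your labels differ from the checkpoint's

trainer:
  max_epochs: 2

save_to: finetuned-model.tlt

optim:
  name: adamw
  lr: 0.001

finetuning_ds:
  manifest_filepath: /data/train_clean_5.json

validation_ds:
  manifest_filepath: /data/dev_clean_2.json
EOF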

Please find the link below for the complete reference:
https://docs.nvidia.com/tao/tao-toolkit/text/asr/speech_recognition_with_conformer.html?highlight=speech_to_text_conformer#fine-tuning-the-model

Thanks

Hi @rvinobha

I am getting a permission error. I am attaching the logs for reference.

tao_fine_tuning_permisson_error.txt (12.2 KB)

Hi @supreet.preet

Thanks for sharing the logs

We suspect that an encryption key was used during the initial training (before the fine-tune). Once a key has been used, you need to pass the same key again when fine-tuning or exporting.

So please provide the encryption key again when running the fine-tune:

tao speech_to_text_conformer finetune -e <experiment_spec> \
                            -m <model_checkpoint> \
                            -k <encryption_key> \
                            -g <num_gpus>

Thanks

I was using the AM model provided by Riva. Will the key be the same as my “RIVA_API_KEY”?

Hi @supreet.preet

Could you please share the link to the model you are referring to?

Thanks

curl -LO 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/speechtotext_en_us_conformer/versions/trainable_v3.0/files/speechtotext_en_us_conformer.tlt'

It is from the NGC catalog (Conformer).

Hi @supreet.preet

Please find the encryption key below:

export KEY='tlt_encode'
tao speech_to_text_conformer finetune -e <experiment_spec> \
                            -m <model_checkpoint> \
                            -k $KEY \
                            -g <num_gpus>

Let me know if it works

Thanks

Hi @rvinobha,
Thanks, this worked, but I am getting another error.

tao_finetuning_error_17-04-22.txt (12.3 KB)

Apart from this, can you let me know a few details:

  1. Which default config does it use?
  2. How do I add custom manifests for training and validation? (See the sketch after this list for the kind of override I mean.)
  3. Augmentation pipeline: can I add my own noise dataset?
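For (2), a sketch of the override I have in mind, assuming the same Hydra-style key=value overrides work for finetune as for train (the manifest paths are placeholders):

# Sketch: Hydra-style overrides pointing finetune at custom manifests.
# Uses the finetuning_ds/validation_ds keys named in the spec description
# above; the manifest paths are placeholders.
tao speech_to_text_conformer finetune -e /specs -g 1 \
    -m <model_checkpoint> -k $KEY -r /results \
    finetuning_ds.manifest_filepath=/data/my_train.json \
    validation_ds.manifest_filepath=/data/my_dev.json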

Hi @rvinobha

Please find the command used for it.

tao speech_to_text_conformer finetune -e /specs -g 1 \
    -m /data/speechtotext_en_us_conformer_vtrainable_v4.0/speechtotext_en_us_conformer.tlt \
    -r /results -k $KEY \
    +training_ds.manifest_filepath=/data/train_clean_5.json \
    validation_ds.manifest_filepath=/data/dev_clean_2.json