Fine-tuning a NeMo Model

Hello community,

I am working on a project for Gujarati (Indic) language ASR. I am using the pre-trained English QuartzNet 15x5 model. Because the dataset is small (about 7 hours, with an 80-20 train-validation split), I froze the encoder and left the decoder trainable.
During training, the validation loss gets stuck around 370-380 and the training loss hovers around 250-300, even after running for 450+ epochs.
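For reference, this is roughly how that setup looks in NeMo (the Gujarati character list below is a truncated placeholder, not the full vocabulary):

    import nemo.collections.asr as nemo_asr

    # Load the pre-trained English QuartzNet 15x5 checkpoint
    model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")

    # Swap the decoder vocabulary to Gujarati characters
    # (truncated placeholder list; the real vocabulary covers the full script)
    model.change_vocabulary(new_vocabulary=[" ", "અ", "આ", "ઇ", "ક", "ખ", "ગ"])

    # Freeze the encoder so only the decoder weights are updated
    model.encoder.freeze()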

(As a sanity check, I used the same 2 minutes of audio as both train and validation data and overfitted the model to verify that it outputs Gujarati; it gave very good results.)

Train Data contains 1732 files totalling 4.96 hours
Validation Data contains 433 files totalling 1.26 hours
The audio files are each under 25 seconds long, with a sample rate of 22050 Hz.
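Side note on the sample rate: QuartzNet15x5Base-En was trained on 16 kHz audio, so the 22050 Hz files either get resampled or the preprocessor's sample_rate has to match. A minimal resampling sketch with torchaudio (file paths are placeholders):

    import torchaudio

    # Resample a 22050 Hz clip to the 16 kHz the pre-trained model expects
    wav, sr = torchaudio.load("clip_22050.wav")  # placeholder path
    wav_16k = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)(wav)
    torchaudio.save("clip_16k.wav", wav_16k, 16000)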

I am attaching the hyperparameters I have tried so far, the loss graphs, and the config file.
Can someone suggest good hyperparameters to try, or better augmentation techniques?

Augmentation (spec_augment section of the config):

    _target_: nemo.collections.asr.modules.SpectrogramAugmentation
    rect_freq: 50
    rect_masks: 5
    rect_time: 120
    freq_masks: 2
    freq_width: 25
    time_masks: 10
    time_width: 0.05
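In case it helps reproduce the setup, the augmentation module can be rebuilt from the attached config roughly like this (a sketch assuming the standard NeMo layout, where this block sits under model.spec_augment):

    import nemo.collections.asr as nemo_asr
    from omegaconf import OmegaConf

    # Load the checkpoint and rebuild SpectrogramAugmentation from the attached config
    model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")
    cfg = OmegaConf.load("config_final11.yaml")
    model.spec_augmentation = model.from_config_dict(cfg.model.spec_augment)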

Both runs used the same data: 1732 train files (4.96 hours) and 433 validation files (1.26 hours).

| S.No | W&B Run Name | Betas | Learning Rate | Weight Decay | Scheduler Warm-up Ratio | Train Batch Size | Epochs Run | Best Train Loss | Final Train Loss | Best Val Loss | Final Val Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Run-ASR 2 | [0.5, 0.6] | 0.000323 | 0.002 | 0 | 8 | 481 | 223.579 (step 1429) | 350.574 | 380.665 | 381.889 |
| 2 | Run-ASR 1 | [0.5, 0.6] | 0.0012 | 0.001 | 0.1 | 16 | 304 | 248.617 | 298.279 | 380.375 | 397.784 |
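For reference, these knobs live in the optim section of a NeMo config; Run 2's values would look roughly like this (the optimizer and scheduler names are assumptions, since the table only reports betas, learning rate, weight decay, and warm-up ratio):

    optim:
      name: novograd            # assumed optimizer (the QuartzNet default); not reported above
      lr: 0.0012
      betas: [0.5, 0.6]
      weight_decay: 0.001
      sched:
        name: CosineAnnealing   # assumed scheduler; only the warm-up ratio was reported
        warmup_ratio: 0.1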

config_final11.yaml (8.7 KB)

Hello Community,
Any updates or suggestions on this?

Hi @Devansh_Shah

I suggest posting this in the NeMo GitHub discussions area: NVIDIA/NeMo · Discussions · GitHub

Cheers,
Tom