Hello community,
I am working on a Gujarati (Indic) language ASR project, starting from the pre-trained English QuartzNet 15x5 model. Because the dataset is small (about 7 hours, with an 80-20 train-validation split), I froze the encoder and left only the decoder unfrozen.
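For context, the setup is roughly the following (a minimal sketch assuming NeMo's `EncDecCTCModel` API; the Gujarati character set is abbreviated here):

```python
import nemo.collections.asr as nemo_asr

# Load the pre-trained English QuartzNet 15x5 checkpoint
model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")

# Replace the decoder vocabulary with Gujarati characters
# (abbreviated; the real list covers the full character set)
gujarati_vocab = [" ", "અ", "આ", "ઇ", "ઈ", "ઉ", "ક", "ખ", "ગ", "ઘ"]
model.change_vocabulary(new_vocabulary=gujarati_vocab)

# Freeze the encoder so only the re-initialised decoder trains
model.encoder.freeze()
```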
During training, the validation loss gets stuck around 370-380 while the training loss hovers around 250-300, even after running for 450+ epochs.
(As a sanity check, I took the same 2 minutes of data for train and validation and overfitted on it to verify that the model outputs Gujarati; it gave very good results.)
Train data: 1732 files totalling 4.96 hours.
Validation data: 433 files totalling 1.26 hours.
All audio files are under 25 seconds, with a sample rate of 22050 Hz.
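(On the sample rate: the English QuartzNet checkpoints were trained on 16 kHz audio, so the 22050 Hz files have to be resampled, either offline or by the data layer. A minimal offline sketch with torchaudio, with placeholder file names:)

```python
import torchaudio

# Resample a 22050 Hz clip to the 16 kHz rate the pre-trained
# QuartzNet encoder expects (file names are placeholders)
waveform, sr = torchaudio.load("clip_22050.wav")
resampled = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)(waveform)
torchaudio.save("clip_16000.wav", resampled, sample_rate=16000)
```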
I am attaching the hyperparameters I have tried till now and the loss graphs, along with the config file.
Can someone suggest some good hyperparameters to try out, or any better augmentation techniques?
Augmentation:

```yaml
_target_: nemo.collections.asr.modules.SpectrogramAugmentation
rect_freq: 50
rect_masks: 5
rect_time: 120
freq_masks: 2
freq_width: 25
time_masks: 10
time_width: 0.05
```
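(This is wired into the model following the pattern from the NeMo fine-tuning tutorials; a sketch, with `model` being the loaded model from the earlier snippet:)

```python
from omegaconf import OmegaConf

# Rebuild the SpectrogramAugmentation module from the config above
# and attach it to the model (pattern from NeMo's fine-tuning tutorials)
spec_cfg = OmegaConf.create({
    "_target_": "nemo.collections.asr.modules.SpectrogramAugmentation",
    "rect_freq": 50, "rect_masks": 5, "rect_time": 120,
    "freq_masks": 2, "freq_width": 25,
    "time_masks": 10, "time_width": 0.05,
})
model.spec_augmentation = model.from_config_dict(spec_cfg)
```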
S. No. | Wandb Run Name | Betas | Learning Rate | Weight Decay | Scheduler Warm-up Ratio | Train Batch Size | Epochs Run | Train Data Size | Validation Data Size | Best Training Loss | Final Training Loss | Best Validation Loss | Final Validation Loss
---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | Run-ASR 2 | [0.5, 0.6] | 0.000323 | 0.002 | 0 | 8 | 481 | 1732 files / 4.96 h | 433 files / 1.26 h | 223.579 (step 1429) | 350.574 | 380.665 | 381.889
2 | Run-ASR 1 | [0.5, 0.6] | 0.0012 | 0.001 | 0.1 | 16 | 304 | 1732 files / 4.96 h | 433 files / 1.26 h | 248.617 | 298.279 | 380.375 | 397.784
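(For reference, the optimizer settings above go into the optim section of the config; for run 2 that corresponds roughly to the following sketch, assuming the Novograd + CosineAnnealing recipe from the stock QuartzNet config:)

```python
from omegaconf import OmegaConf

# Optimizer/scheduler settings for run 2, assuming the Novograd +
# CosineAnnealing recipe from the stock QuartzNet config
optim_cfg = OmegaConf.create({
    "name": "novograd",
    "lr": 0.0012,
    "betas": [0.5, 0.6],
    "weight_decay": 0.001,
    "sched": {"name": "CosineAnnealing", "warmup_ratio": 0.1, "min_lr": 0.0},
})
model.setup_optimization(optim_config=optim_cfg)
```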
config_final11.yaml (8.7 KB)