Fine-tuning the Hindi NVIDIA NeMo model

I am trying to fine-tune the pretrained NeMo model with the command below:
python speech_to_text_ctc_bpe.py --config-path="/path/to/config/file" --config-name="conformer_ctc_bpe_v13" trainer.max_epochs=50 +init_from_nemo_model="path/to/stt_hi_conformer_ctc_medium.nemo"

I am getting the error shown below (truncated):
from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.norm_self_att.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.norm_self_att.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.self_attn.pos_bias_u: copying a param with shape torch.Size([4, 64]) from checkpoint, the shape in current model is torch.Size([8, 64]).
size mismatch for encoder.layers.17.self_attn.pos_bias_v: copying a param with shape torch.Size([4, 64]) from checkpoint, the shape in current model is torch.Size([8, 64]).
size mismatch for encoder.layers.17.self_attn.linear_q.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for encoder.layers.17.self_attn.linear_q.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.self_attn.linear_k.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for encoder.layers.17.self_attn.linear_k.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.self_attn.linear_v.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for encoder.layers.17.self_attn.linear_v.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.self_attn.linear_out.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for encoder.layers.17.self_attn.linear_out.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.self_attn.linear_pos.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for encoder.layers.17.norm_feed_forward2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.norm_feed_forward2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.feed_forward2.linear1.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([2048, 512]).
size mismatch for encoder.layers.17.feed_forward2.linear1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for encoder.layers.17.feed_forward2.linear2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([512, 2048]).
size mismatch for encoder.layers.17.feed_forward2.linear2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.norm_out.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.17.norm_out.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for decoder.decoder_layers.0.weight: copying a param with shape torch.Size([129, 256, 1]) from checkpoint, the shape in current model is torch.Size([129, 512, 1]).

Can you please help out with the error?
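For reference, this error comes from PyTorch's `load_state_dict`: weights are copied tensor-by-tensor by name, so every tensor in the checkpoint must have exactly the shape the current config builds. A minimal sketch that reproduces the same message, with illustrative sizes only (an `nn.Linear` stands in for one attention projection):

```python
import torch.nn as nn

# The pretrained "medium" checkpoint was trained with one width, while the
# config here builds a wider model, so the tensors disagree in shape.
checkpoint_layer = nn.Linear(256, 256)   # shapes as saved in the checkpoint
current_model = nn.Linear(512, 512)      # shapes built from the config

try:
    current_model.load_state_dict(checkpoint_layer.state_dict())
except RuntimeError as e:
    msg = str(e)
    print(msg)  # "size mismatch for weight: copying a param with shape ..."
```

This is why the fix has to happen in the config (so the built model matches the checkpoint) rather than in the checkpoint itself.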

Hi @iamgarimanarang

Thanks for your interest in Riva.

Since this is related to NeMo, please file the issue at the GitHub link below.

Thanks

Hi @rvinobha,

Thanks for the response. I've created a new GitHub issue for this.

Hi @iamgarimanarang

I have some inputs regarding this issue.

The issue seems to happen because you are using a tokenizer vocabulary size of 512, whereas our pretrained medium model uses 128.

We suggest using a vocab size of 128.

Please try it and let us know.

Thanks

Okay @rvinobha,
I will try it.

Hi @rvinobha
I've used the tokenizer_spe_unigram_v128.
This is how I created the tokenizer; I did not use 512 as the vocab size:
tao speech_to_text_conformer create_tokenizer -e /specs/speech_to_text_conformer/create_tokenizer.yaml -r /results/conformer/create_tokenizer manifests=/data/train_manifest.json output_root=/data/con_asr vocab_size=128
Can you please confirm whether this is correct, or let me know if there is another way to create the vocab?

Hi @iamgarimanarang

Can you please share with us the create_tokenizer.yaml used at the path /specs/speech_to_text_conformer/create_tokenizer.yaml?

Also, we request the following changes in NeMo (i.e. the project under which the speech_to_text_ctc_bpe.py script exists).

In the conformer_ctc_bpe.yaml file, change line 106 from
d_model: 512
to
d_model: 256

In the same conformer_ctc_bpe.yaml file, change line 119 from
n_heads: 8
to
n_heads: 4

Then try again and let us know if it helps.
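The two edits above land in the encoder section of conformer_ctc_bpe.yaml. A sketch of the relevant fragment after the change (field names follow NeMo's Conformer config; exact line numbers may differ across NeMo versions):

```yaml
# model.encoder section of conformer_ctc_bpe.yaml after the change
encoder:
  _target_: nemo.collections.asr.modules.ConformerEncoder
  d_model: 256   # was 512; must match the pretrained medium checkpoint
  n_heads: 4     # was 8; heads scale down with d_model (head dim stays 64)
```

The head dimension (d_model / n_heads = 64) is unchanged, which is why both values move together.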

Thanks

Please find the YAML file attached. I'm also making the changes in the conformer YAML file.

Training is now running after updating the above parameters.

Thanks, @rvinobha

Hi @rvinobha
Can you help me with the parameter to enable in order to save the best model?

Thanks,
Garima Narang

Hi @rvinobha
I'm trying to convert the model using nemo2riva but am getting a segmentation fault.
Can you please check this?
nemo2riva --out Conformer-CTC-BPE.riva Conformer-CTC-BPE/2023-01-16_07-30-05/checkpoints/Conformer-CTC-BPE.nemo

Thanks
Garima Narang

Hi @iamgarimanarang

Can you help me with the parameter to enable in order to save the best model?

I will check with the developers on this request and provide inputs.

I’m trying to convert the model using nemo2riva but getting segmentation fault error.
Can you please check this?

Apologies that you are facing this issue.
Would it be possible to share the model with us via Google Drive/OneDrive? That would be the best way to test the scenario and reproduce it on our end. (If yes, please let me know and I will send you an email in private.)

We also request that you share the NeMo version used;
please run pip list and share the complete output with us.

Thanks

Hi @rvinobha,

Thanks, I am now able to do the conversion. It was a version problem.

I've done riva-build and riva-deploy, and now I want to run the Riva client. Can you send me a sample config.sh for riva_quickstart_v2.8.1? I'm getting errors while running the client:
E0117 07:52:06.480595 207 grpc_riva_asr.cc:885] Error: Unavailable model requested given these parameters: language_code=hi-IN; type=offline;

Can you please help me out with the issue?

Thanks
Garima Narang
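An aside on the error above: "Unavailable model requested given these parameters" typically means that no deployed ASR model matched the requested language_code=hi-IN and type=offline, so it is worth checking what was baked in at build time. A sketch of a riva-build invocation that sets both explicitly (paths and model name are hypothetical; the flags follow riva-build's speech_recognition pipeline):

```shell
# Hypothetical paths/names. The language code and decoding mode baked in
# here must match what the client later requests (hi-IN, offline).
riva-build speech_recognition \
    /servicemaker-dev/conformer-hi-IN.rmir:tlt_encode \
    /servicemaker-dev/Conformer-CTC-BPE.riva:tlt_encode \
    --name=conformer-hi-IN-asr-offline \
    --language_code=hi-IN \
    --offline
```

After rebuilding, riva-deploy must be rerun so the model repository picks up the new .rmir.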

Hi @rvinobha,

I used the parameter +save_best_model=True while fine-tuning the model. Please confirm whether this is how we get the best model, and also verify whether the best model gets saved in checkpoints/Conformer-CTC-BPE.nemo.

Thanks
Garima Narang

Hi @rvinobha
Yes, I can share the model in private. Let me know how to share.
Thanks,
Garima

Hi @iamgarimanarang

I got some inputs from the developers on choosing the best model.

By default, the YAML should save the top 3 or top 5 checkpoints based on lowest WER; this is the default behaviour.

However, you can leverage checkpoint averaging across them, and about 95% of the time you will get a model that is more accurate than the top-1 checkpoint.
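The behaviour described above maps onto the exp_manager section of the training config. A sketch of the relevant knobs (field names follow NeMo's exp_manager; defaults may vary by NeMo version, so treat the values as illustrative):

```yaml
# exp_manager section of the training config
exp_manager:
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: val_wer      # rank checkpoints by validation WER
    mode: min             # lower WER is better
    save_top_k: 5         # keep the 5 best checkpoints
    save_best_model: true # also write the best checkpoint out as a .nemo file
```

For the averaging step, NeMo ships a checkpoint-averaging script (under scripts/checkpoint_averaging/ in the NeMo repository) that can be pointed at the saved checkpoints.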

For the model sharing, I have sent the drive link to your email (the Gmail you created your forum account with). Please upload the model and let me know once done.

Hi @rvinobha
I’ve uploaded the model on the link. Kindly check.

Thanks,
Garima Narang

Hi @iamgarimanarang

Thanks for sharing the NeMo model.
Would it be possible to also share the nemo2riva-converted .riva model in the same drive?

Thanks

Hi @rvinobha

Uploaded the converted .riva model in the drive.

Thanks

Hi @iamgarimanarang

I was able to get riva-build and riva-deploy done,
but I am facing an error with riva_start.sh.

I am checking with the developers; once I fix it and get it running, I will share the notebook with you.

Thanks