Failed to convert NeMo model to Riva using nemo2riva for ASR

Hardware - H100 or A100
Operating System - Ubuntu 22.04.4 LTS
nemo2riva-2.18.0
nemo_toolkit-2.0.0

After fine-tuning a Conformer model (stt_en_conformer_ctc_large), I obtained a .nemo file that I cannot convert to .riva with nemo2riva.

I ran the following command:

nemo2riva --out {riva_file_path} {nemo_file_path}

I followed this tutorial: How to Deploy a Custom Acoustic Model (Conformer-CTC) Trained with NeMo on Riva (NVIDIA Riva documentation)

Here is the full output:

INFO: PyTorch version 2.5.1+cu121 available.
[NeMo I 2024-12-30 10:25:03 nemo2riva:38] Logging level set to 20
[NeMo I 2024-12-30 10:25:03 convert:36] Restoring NeMo model from 'Conformer-CTC-BPE.nemo'
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used…
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used…
Trainer(limit_test_batches=1.0) was configured so 100% of the batches will be used…
Trainer(limit_predict_batches=1.0) was configured so 100% of the batches will be used…
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch…
[NeMo I 2024-12-30 10:25:03 mixins:173] Tokenizer SentencePieceTokenizer initialized with 128 tokens
[NeMo W 2024-12-30 10:25:04 modelPT:176] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: /home/jovyan/haubt/ASR-Nemo-Riva/data_training/train.json
sample_rate: 16000
batch_size: 64
shuffle: true
num_workers: 8
pin_memory: true
max_duration: 16.7
min_duration: 0.1
is_tarred: false
tarred_audio_filepaths: null
shuffle_n: 2048
bucketing_strategy: synced_randomized
bucketing_batch_size: null

[NeMo W 2024-12-30 10:25:04 modelPT:183] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: /home/jovyan/haubt/ASR-Nemo-Riva/data_training/validation.json
sample_rate: 16000
batch_size: 32
shuffle: false
use_start_end_token: false
num_workers: 8
pin_memory: true

[NeMo W 2024-12-30 10:25:04 modelPT:189] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
sample_rate: 16000
batch_size: 16
shuffle: false
use_start_end_token: false
num_workers: 8
pin_memory: true

[NeMo I 2024-12-30 10:25:04 features:305] PADDING: 0
[NeMo W 2024-12-30 10:25:05 nemo_logging:349] /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo/core/connectors/save_restore_connector.py:682: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See pytorch/SECURITY.md at main · pytorch/pytorch · GitHub for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don’t have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
return torch.load(model_weights, map_location='cpu')

[NeMo I 2024-12-30 10:25:05 save_restore_connector:275] Model EncDecCTCModelBPE was successfully restored from /home/jovyan/haubt/ASR-Nemo-Riva/Conformer-CTC-BPE.nemo.
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/asr-scr-exported-encdecclsmodel.yaml for nemo.collections.asr.models.classification_models.EncDecClassificationModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/asr-stt-exported-encdecctcmodel.yaml for nemo.collections.asr.models.EncDecCTCModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/asr-stt-exported-encdectcmodelbpe.yaml for nemo.collections.asr.models.EncDecCTCModelBPE
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/nlp-isc-exported-bert.yaml for nemo.collections.nlp.models.IntentSlotClassificationModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/nlp-mt-exported-encdecmtmodel.yaml for nemo.collections.nlp.models.MTEncDecModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/nlp-mt-exported-megatronnmtmodel.yaml for nemo.collections.nlp.models.MegatronNMTModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/nlp-pc-exported-bert.yaml for nemo.collections.nlp.models.PunctuationCapitalizationModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/nlp-qa-exported-bert.yaml for nemo.collections.nlp.models.QAModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/nlp-tc-exported-bert.yaml for nemo.collections.nlp.models.TextClassificationModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/nlp-tkc-exported-bert.yaml for nemo.collections.nlp.models.TokenClassificationModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/tts-exported-fastpitchmodel.yaml for nemo.collections.tts.models.FastPitchModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/tts-exported-hifiganmodel.yaml for nemo.collections.tts.models.HifiGanModel
[NeMo I 2024-12-30 10:25:05 schema:163] Loaded schema file /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/tts-exported-radttsmodel.yaml for nemo.collections.tts.models.RadTTSModel
[NeMo I 2024-12-30 10:25:05 schema:202] Found validation schema for nemo.collections.asr.models.EncDecCTCModelBPE at /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/asr-stt-exported-encdectcmodelbpe.yaml
[NeMo I 2024-12-30 10:25:05 schema:231] Checking installed NeMo version … 2.0.0 OK (>=1.1)
[NeMo I 2024-12-30 10:25:05 artifacts:59] Found model at ./model_weights.ckpt
INFO: Checking Nemo version for ConformerEncoder …
[NeMo I 2024-12-30 10:25:05 schema:231] Checking installed NeMo version … 2.0.0 OK (>=1.7.0rc0)
[NeMo I 2024-12-30 10:25:05 artifacts:136] Retrieved artifacts: dict_keys(['model_config.yaml'])
[NeMo W 2024-12-30 10:25:05 nemo_logging:349] /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/cookbook.py:78: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
autocast = torch.cuda.amp.autocast(enabled=True, cache_enabled=False, dtype=torch.float16) if cfg.autocast else nullcontext()

[NeMo I 2024-12-30 10:25:05 cookbook:80] Exporting model EncDecCTCModelBPE with config=ExportConfig(export_subnet=None, export_format='ONNX', export_file='model_graph.onnx', encryption=None, autocast=True, max_dim=100000, export_args={})
[NeMo W 2024-12-30 10:25:05 nemo2riva:62] It looks like you’re trying to export a ASR model with max_dim=100000. Export is failing due to CUDA OOM. Reducing max_dim to 50000 and trying again…
[NeMo I 2024-12-30 10:25:05 convert:36] Restoring NeMo model from 'Conformer-CTC-BPE.nemo'
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used…
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used…
Trainer(limit_test_batches=1.0) was configured so 100% of the batches will be used…
Trainer(limit_predict_batches=1.0) was configured so 100% of the batches will be used…
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch…
[NeMo I 2024-12-30 10:25:06 mixins:173] Tokenizer SentencePieceTokenizer initialized with 128 tokens
[NeMo W 2024-12-30 10:25:06 modelPT:176] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: /home/jovyan/haubt/ASR-Nemo-Riva/data_training/train.json
sample_rate: 16000
batch_size: 64
shuffle: true
num_workers: 8
pin_memory: true
max_duration: 16.7
min_duration: 0.1
is_tarred: false
tarred_audio_filepaths: null
shuffle_n: 2048
bucketing_strategy: synced_randomized
bucketing_batch_size: null

[NeMo W 2024-12-30 10:25:06 modelPT:183] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: /home/jovyan/haubt/ASR-Nemo-Riva/data_training/validation.json
sample_rate: 16000
batch_size: 32
shuffle: false
use_start_end_token: false
num_workers: 8
pin_memory: true

[NeMo W 2024-12-30 10:25:06 modelPT:189] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
sample_rate: 16000
batch_size: 16
shuffle: false
use_start_end_token: false
num_workers: 8
pin_memory: true

[NeMo I 2024-12-30 10:25:06 features:305] PADDING: 0
[NeMo W 2024-12-30 10:25:06 nemo_logging:349] /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo/core/connectors/save_restore_connector.py:682: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See pytorch/SECURITY.md at main · pytorch/pytorch · GitHub for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don’t have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
return torch.load(model_weights, map_location='cpu')

[NeMo I 2024-12-30 10:25:07 save_restore_connector:275] Model EncDecCTCModelBPE was successfully restored from /home/jovyan/haubt/ASR-Nemo-Riva/Conformer-CTC-BPE.nemo.
[NeMo I 2024-12-30 10:25:07 schema:202] Found validation schema for nemo.collections.asr.models.EncDecCTCModelBPE at /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/validation_schemas/asr-stt-exported-encdectcmodelbpe.yaml
[NeMo I 2024-12-30 10:25:07 schema:231] Checking installed NeMo version … 2.0.0 OK (>=1.1)
[NeMo I 2024-12-30 10:25:07 artifacts:59] Found model at ./model_weights.ckpt
INFO: Checking Nemo version for ConformerEncoder …
[NeMo I 2024-12-30 10:25:07 schema:231] Checking installed NeMo version … 2.0.0 OK (>=1.7.0rc0)
[NeMo I 2024-12-30 10:25:07 artifacts:136] Retrieved artifacts: dict_keys(['model_config.yaml'])
[NeMo W 2024-12-30 10:25:07 nemo_logging:349] /home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/cookbook.py:78: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
autocast = torch.cuda.amp.autocast(enabled=True, cache_enabled=False, dtype=torch.float16) if cfg.autocast else nullcontext()

[NeMo I 2024-12-30 10:25:07 cookbook:80] Exporting model EncDecCTCModelBPE with config=ExportConfig(export_subnet=None, export_format='ONNX', export_file='model_graph.onnx', encryption=None, autocast=True, max_dim=50000, export_args={})
[NeMo E 2024-12-30 10:25:07 cookbook:131] ERROR: Export failed. Please make sure your NeMo model class (<class 'nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE'>) has working export() and that you have the latest NeMo package installed with [all] dependencies.
Traceback (most recent call last):
  File "/home/jovyan/anaconda3/envs/tts_nemo/bin/nemo2riva", line 8, in <module>
    sys.exit(nemo2riva())
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/cli/nemo2riva.py", line 49, in nemo2riva
    Nemo2Riva(args)
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/convert.py", line 87, in Nemo2Riva
    export_model(
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/cookbook.py", line 132, in export_model
    raise e
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo2riva/cookbook.py", line 90, in export_model
    _, descriptions = model.export(
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo/core/classes/exportable.py", line 117, in export
    out, descr, out_example = model._export(
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo/core/classes/exportable.py", line 197, in _export
    output_example = self.forward(*input_list, **input_dict)
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo/collections/asr/models/asr_model.py", line 288, in forward_for_export
    encoder_output = enc_fun(audio_signal=audio_signal, length=length)
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo/collections/asr/modules/conformer_encoder.py", line 461, in forward_for_export
    rets = self.forward_internal(
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo/collections/asr/modules/conformer_encoder.py", line 583, in forward_internal
    audio_signal = layer(
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/conformer_modules.py", line 171, in forward
    x = self.self_attn(query=x, key=x, value=x, mask=att_mask, pos_emb=pos_emb, cache=cache_last_channel)
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jovyan/anaconda3/envs/tts_nemo/lib/python3.10/site-packages/nemo/collections/asr/parts/submodules/multi_head_attention.py", line 252, in forward
    scores = (matrix_ac + matrix_bd) / self.s_d_k  # (batch, head, time1, time2)
RuntimeError: The size of tensor a (12500) must match the size of tensor b (7500) at non-singleton dimension 3
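
For readers unfamiliar with this kind of error: the final RuntimeError is a broadcasting failure in the element-wise addition `matrix_ac + matrix_bd`, where the last dimension (time2) is 12500 in one tensor and 7500 in the other. The sketch below is a pure-Python illustration of the broadcasting rule only; it does not use PyTorch or NeMo, the helper `broadcast_shape` is hypothetical, and the batch, head, and time1 sizes are made-up placeholders (only 12500 and 7500 come from the log). It does not explain why the two tensors end up with different lengths.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError if they are incompatible."""
    result = []
    # Walk the dimensions from the trailing end; a missing dimension counts as size 1.
    pairs = zip_longest(reversed(a), reversed(b), fillvalue=1)
    for i, (x, y) in enumerate(pairs):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            dim = max(len(a), len(b)) - 1 - i
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}"
            )
    return tuple(reversed(result))


# Placeholder (batch, head, time1) sizes; only the last dimension is taken from the log.
matrix_ac_shape = (1, 8, 12500, 12500)
matrix_bd_shape = (1, 8, 12500, 7500)

try:
    broadcast_shape(matrix_ac_shape, matrix_bd_shape)
except ValueError as err:
    print(err)
```

Two sizes are compatible only when they are equal or one of them is 1, so 12500 and 7500 on dimension 3 cannot be combined.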

Please use nemo_toolkit==1.23.0 instead of 2.0.0.
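
Before re-running nemo2riva, it may help to confirm which nemo_toolkit version the active environment actually provides. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `check_pin` are hypothetical, and looking the package up under the distribution name `nemo_toolkit` is an assumption based on the name used in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

RECOMMENDED = "1.23.0"  # version suggested in this thread


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package, expected):
    """Describe whether the installed version of a package equals the expected one."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed"
    if found == expected:
        return f"{package}=={found} matches the recommended version"
    return f"{package}=={found} found, expected {expected}"


# Assumed distribution name; adjust if the package is installed under another name.
print(check_pin("nemo_toolkit", RECOMMENDED))
```

Pinning would typically be done with pip, for example `pip install "nemo_toolkit[all]==1.23.0"`; whether the `[all]` extra is required here is an assumption based on the wording of the export error message.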