Please provide the following information when requesting support.
Hardware - GPU (A100/A30/T4/V100) : A10G
Hardware - CPU : AMD EPYC 7R32
Operating System : Ubuntu 22.04
Riva Version : 2.18.0
I'm using a g5.4xlarge EC2 instance. I am fine-tuning the NeMo Megatron NMT models (any-to-en and en-to-any). After fine-tuning, I want to load the model in Riva Quickstart 2.18.0, but I am having problems with the nemo2riva model conversion.
I am using the NeMo 24.01.framework container for fine-tuning. I installed nemo2riva (both 2.18 and 2.19) and tried converting the NeMo model to Riva. Version 2.18 worked with a bilingual model, but converting the Megatron model gives the following error:
Traceback (most recent call last):
  File "/usr/local/bin/nemo2riva", line 8, in <module>
    sys.exit(nemo2riva())
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/cli/nemo2riva.py", line 49, in nemo2riva
    Nemo2Riva(args)
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/convert.py", line 87, in Nemo2Riva
    export_model(
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/cookbook.py", line 132, in export_model
    raise e
  File "/usr/local/lib/python3.10/dist-packages/nemo2riva/cookbook.py", line 90, in export_model
    _, descriptions = model.export(
  File "/opt/NeMo/nemo/core/classes/exportable.py", line 114, in export
    out, descr, out_example = model._export(
  File "/opt/NeMo/nemo/core/classes/exportable.py", line 187, in _export
    self._prepare_for_export(output=output, input_example=input_example, **my_args)
  File "/opt/NeMo/nemo/core/classes/exportable.py", line 267, in _prepare_for_export
    replace_for_export(self)
  File "/opt/NeMo/nemo/utils/export_utils.py", line 457, in replace_for_export
    replace_modules(model, default_Apex_replacements)
  File "/opt/NeMo/nemo/utils/export_utils.py", line 426, in replace_modules
    swapped = expansions[m_type](m)
  File "/opt/NeMo/nemo/utils/export_utils.py", line 300, in replace_ParallelLinear
    mod.load_state_dict(n_state)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LinearWithBiasSkip:
	Unexpected key(s) in state_dict: "_extra_state".
root@02f633c1c4a6:/workspace#
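As far as I can tell from the last frames, the state dict passed to load_state_dict in replace_ParallelLinear contains an "_extra_state" entry that the replacement module (LinearWithBiasSkip) does not expect, so the strict load fails. Below is a minimal sketch of filtering such entries, using a plain dict as a stand-in for the real state dict. The helper name strip_extra_state is hypothetical (it is not part of NeMo or nemo2riva), and I have not verified that dropping this entry is safe for the exported model.

```python
def strip_extra_state(state_dict):
    """Return a copy of state_dict without any '_extra_state' entries.

    Hypothetical helper for illustration only; not a NeMo or nemo2riva API.
    """
    return {
        key: value
        for key, value in state_dict.items()
        if key != "_extra_state" and not key.endswith("._extra_state")
    }


# Plain-dict stand-in for the n_state built in replace_ParallelLinear.
n_state = {"weight": [[0.1, 0.2]], "bias": [0.0], "_extra_state": None}
cleaned = strip_extra_state(n_state)
print(sorted(cleaned))
```

In a real module the filtered dict would be what gets passed to load_state_dict, but that would mean patching export_utils.py inside the container, which is an assumption on my side and not something the tutorials describe.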
So I tried every version of nemo2riva in the NeMo 24.01.framework container, but it gives the same error. I then switched to the NeMo 22.11 container, and nemo2riva (2.18, 2.19) still gives the same error. Finally, I installed nemo2riva 2.14.0 and ran the following command, which converted the NeMo model to Riva:
nemo2riva --key tlt_encode --max-dim 1024 --out /workspace/megatron.riva /workspace/megatron/megatronnmt_any_en_500m.nemo
Then I converted it to RMIR using both the riva-speech 2.14 servicemaker container and riva-speech 2.18, with the following command:
riva-build megatron_translation \
--name nmt_multi_model \
megatronnmt_custom_any_en_500m.rmir \
megatronnmt_custom_any_en_500m.riva
I tried both of these models in Riva Quickstart. First I ran riva_init.sh, which converted them into model directories, but riva_start.sh does not load them on the Riva server.
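To narrow down which generated directories look incomplete, here is a minimal sketch that walks a model repository and reports, for each model directory, whether it has a config.pbtxt and which numeric version subdirectories it contains (the usual Triton model repository layout). The function name list_model_dirs is hypothetical, and the repository path to pass in is an assumption: it depends on how riva_model_loc is set in config.sh.

```python
from pathlib import Path


def list_model_dirs(repo_root):
    """Summarize each model directory under repo_root.

    Hypothetical helper: reports whether config.pbtxt exists and which
    numeric version subdirectories are present for every model directory.
    """
    root = Path(repo_root)
    report = {}
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        has_config = (model_dir / "config.pbtxt").is_file()
        versions = sorted(
            p.name for p in model_dir.iterdir() if p.is_dir() and p.name.isdigit()
        )
        report[model_dir.name] = {"config": has_config, "versions": versions}
    return report
```

A directory reported with no config.pbtxt or with an empty version list would be the first place I would look, together with the server log printed by riva_start.sh.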
These are all the steps given in the NVIDIA tutorials, but they don't work. If anyone can help, please let me know.