Riva model deployment issue

We fine-tuned an ASR model with “tao speech_to_text_conformer finetune” and exported it with “tao speech_to_text_conformer export” so that we can deploy it with Riva. We are now trying to run riva-build and riva-deploy on the exported model.

When we run riva-deploy, we get the error below:

  1. when importing initializer: onnx::MatMul_6791
    [01/02/2023-14:32:31] [TRT] [E] parsers/onnx/ModelImporter.cpp:745: ERROR: parsers/onnx/ModelImporter.cpp:106 In function parseGraph:
    [8] Assertion failed: convertOnnxWeights(initializer, &weights, ctx) && "Failed to import initializer."
    [01/02/2023-14:32:31] [TRT] [E] [network.cpp::addInput::1595] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addInput::1595, condition: inName != knownInput->getName()
    )
    [01/02/2023-14:32:31] [TRT] [E] parsers/onnx/ModelImporter.cpp:745: ERROR: parsers/onnx/ModelImporter.cpp:106 In function parseGraph:
    [8] Assertion failed: convertOnnxWeights(initializer, &weights, ctx) && "Failed to import initializer."
    [01/02/2023-14:32:31] [TRT] [E] parsers/onnx/ModelImporter.cpp:745: ERROR: audio_signal:269 In function importInput:
    [8] Assertion failed: *tensor && "Failed to add input to the network."
    2023-01-02 14:32:31,252 [INFO] Mixed-precision net: 0 layers, 0 tensors, 0 outputs…
    2023-01-02 14:32:31,252 [ERROR] Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/servicemaker/cli/deploy.py", line 100, in deploy_from_rmir
        generator.serialize_to_disk(
      File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 445, in serialize_to_disk
        module.serialize_to_disk(repo_dir, rmir, config_only, verbose, overwrite)
      File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 311, in serialize_to_disk
        self.update_binary(version_dir, rmir, verbose)
      File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/asr.py", line 159, in update_binary
        RivaTRTConfigGenerator.update_binary_from_copied(self, version_dir, rmir, copied, verbose)
      File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 738, in update_binary_from_copied
        bindings = self.build_trt_engine_from_onnx(model_weights, engine_path=trt_file, verbose=verbose)
      File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/triton.py", line 646, in build_trt_engine_from_onnx
        network = fix_fp16_network(network)
      File "/usr/local/lib/python3.8/dist-packages/servicemaker/triton/trt_bindings.py", line 249, in fix_fp16_network
        sys.setrecursionlimit(len(network_definition))
    ValueError: recursion limit must be greater or equal than 1
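
Reading the log from the bottom up suggests that the final ValueError is a symptom rather than the cause. The ONNX parser fails to import an initializer, the log then reports a network with 0 layers, and fix_fp16_network calls sys.setrecursionlimit(len(network_definition)), which would be sys.setrecursionlimit(0) for an empty network. The minimal Python sketch below reproduces only that last step. EmptyNetwork and try_set_limit are hypothetical stand-ins for illustration, not Riva or TensorRT code.

```python
import sys


class EmptyNetwork:
    """Hypothetical stand-in for a parsed network that ended up with zero layers."""

    def __len__(self):
        return 0


def try_set_limit(network):
    """Mimic the failing call from the traceback; return the error text or None."""
    old_limit = sys.getrecursionlimit()
    try:
        # Same call shape as in the traceback: the limit is the network length.
        sys.setrecursionlimit(len(network))
    except ValueError as exc:
        return str(exc)
    finally:
        # Always restore the interpreter's previous limit.
        sys.setrecursionlimit(old_limit)
    return None


message = try_set_limit(EmptyNetwork())
print(message)
```

If this reading is right, the place to investigate is the earlier "Failed to import initializer" error on the exported ONNX graph, not the recursion limit itself.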

riva-build and riva-deploy both worked fine with the pre-trained NeMo model (stt_hi_conformer_ctc), but with our fine-tuned model riva-deploy fails with the error above.

Steps that we performed for model fine-tuning and deployment:

  1. tao speech_to_text_conformer create_tokenizer

  2. tao speech_to_text_conformer finetune

  3. tao speech_to_text_conformer export (export_format=RIVA)

  4. Inside the Riva ServiceMaker container, run the following two commands:

  5. riva-build speech_recognition

  6. riva-deploy

Kindly provide some guidance on this issue.

Hi @iamgarimanarang

Thanks for your interest in Riva,

Apologies that you are facing this issue.

If possible, could you share the model with us via Google Drive, OneDrive, or similar?

Also, please share the:

  1. complete riva-build command used
  2. complete riva-deploy command used

Thanks

Hi @rvinobha,

Thanks for the response. I’ve added the commands to a text file. Please find the link below:

https://drive.google.com/drive/folders/1jpjQX6PZM_4ScZPT-P6vrxZBOsJ_GeZ8?usp=sharing

Thanks and Regards,
Garima Narang

Hi,
I have the same issue with a fine-tuned model. Did you manage to find a way to make it work?

Yoav

Hi @yoav.ellinson

We haven’t received any response to the above query, but we were able to build the model with NeMo.
For reference, this is how we ran the NeMo fine-tuning:
docker run --gpus=all -it --rm -v /project/path:/rift --shm-size=32g --net=host --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:22.09 bash
python examples/asr//script_to_<script_name>.py \
    --config-path= \
    --config-name=<name of config without .yaml> \
    model.train_ds.manifest_filepath="" \
    model.validation_ds.manifest_filepath="" \
    trainer.devices=-1 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50 \
    +init_from_nemo_model="<path to .nemo model file>"
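
As a sketch of how such a command line is put together, the Python snippet below assembles the Hydra-style key=value overrides from a dictionary and prints the resulting command without running it. The script path, config directory, config name, and file paths are hypothetical placeholders, not values from this thread, and build_finetune_command is an illustrative helper, not part of NeMo.

```python
import shlex


def build_finetune_command(script, config_path, config_name, overrides):
    """Return the argv list for a Hydra-configured training script."""
    argv = [
        "python",
        script,
        f"--config-path={config_path}",
        f"--config-name={config_name}",
    ]
    # Overrides are plain key=value tokens; a leading "+" adds a key
    # that is not already present in the config.
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


# All paths and names below are hypothetical placeholders.
overrides = {
    "model.train_ds.manifest_filepath": "/data/train_manifest.json",
    "model.validation_ds.manifest_filepath": "/data/val_manifest.json",
    "trainer.devices": -1,
    "trainer.accelerator": "gpu",
    "trainer.max_epochs": 50,
    "+init_from_nemo_model": "/models/pretrained.nemo",
}
command = build_finetune_command(
    "path/to/training_script.py", "/configs", "my_config", overrides
)
print(shlex.join(command))
```

Keeping the overrides in one dictionary makes it easier to see which values differ between a run that deploys cleanly and one that does not.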

Reference links:

https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/configs.html#fine-tuning-configurations

Thanks
Garima

Thanks @iamgarimanarang, I’ll try it.

Hello @rvinobha ,
I realize it has been over a month since your last response, but is there any update on this issue? We have the exact same problem on our end. Any update or suggestion is much appreciated.
Thank you,
A. Johnson

Hello @rvinobha,
I encountered the same issue. I have searched more recent posts for suggestions, without success.
Any updates?

Thank you

Hi @rvinobha,
Exact same problem here. It seems quite common for people to hit this issue during the deployment phase.

My build and deployment commands are shown below:

riva-build speech_recognition \
    /servicemaker-dev/conformer_ctc_de_finetuned.rmir:tlt_encode \
    /servicemaker-dev/conformer_ctc_de_finetuned.riva:tlt_encode \
    --name=conformer-de-DE-asr-streaming \
    --return_separate_utterances=False \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=40 \
    --endpointing.start_history=200 \
    --nn.fp16_needs_obey_precision_pass \
    --endpointing.residue_blanks_at_start=-2 \
    --chunk_size=0.16 \
    --left_padding_size=1.92 \
    --right_padding_size=1.92 \
    --decoder_type=flashlight \
    --decoding_language_model_binary=/data/conformer-de-DE-asr-streaming-ctc-decoder-cpu-streaming/1/de-DE_default_2.0.bin \
    --decoding_vocab=/data/conformer-de-DE-asr-streaming-ctc-decoder-cpu-streaming/1/de-DE_default_2.0_dict_vocab.txt \
    --flashlight_decoder.lm_weight=0.7 \
    --flashlight_decoder.word_insertion_score=0.75 \
    --flashlight_decoder.beam_threshold=20. \
    --language_code=de-DE

riva-deploy /servicemaker-dev/conformer_ctc_de_finetuned.rmir:tlt_encode /data/
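
The streaming values in the riva-build command above can be sanity-checked with simple arithmetic. The sketch below assumes that each inference window consists of the chunk plus its left and right padding, and that ms_per_timestep is the duration of one acoustic-model output step. The function name streaming_window is illustrative only and is not part of Riva.

```python
def streaming_window(chunk_s, left_pad_s, right_pad_s, ms_per_timestep):
    """Convert chunk and padding sizes (seconds) to milliseconds and timesteps."""
    # Work in integer milliseconds to avoid floating-point drift.
    chunk_ms = round(chunk_s * 1000)
    window_ms = chunk_ms + round(left_pad_s * 1000) + round(right_pad_s * 1000)
    return {
        "chunk_ms": chunk_ms,
        "window_ms": window_ms,
        "chunk_timesteps": chunk_ms // ms_per_timestep,
        "window_timesteps": window_ms // ms_per_timestep,
    }


# Values taken from the riva-build command above.
info = streaming_window(0.16, 1.92, 1.92, 40)
print(info)
```

Under these assumptions the window is 4000 ms long (100 timesteps), of which the 160 ms chunk contributes 4 timesteps.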

Could you please take a look? Any help will be appreciated.