Riva v2.15.0 fails to build NeMo models

Please provide the following information when requesting support.

Hardware - GPU (A100/A30/T4/V100): A40 / RTX A6000
Hardware - CPU
Operating System: Docker image
Riva Version: v2.15.0
TLT Version (if relevant)
How to reproduce the issue?: Run bash riva_init.sh from riva_quickstart_v2.15.0 with the default config.sh (full log below)


Since upgrading to v2.15.0, I have not been able to build NeMo models with Riva. Not even the quick start examples build. I've tried with Conformer, Conformer-XL, and Parakeet.
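
For reference, the exact steps are roughly the following (a sketch: the NGC pull of the quick start resource is the assumed way I fetched it, and config.sh is left at its defaults):

    # download and unpack the v2.15.0 quick start from NGC
    ngc registry resource download-version nvidia/riva/riva_quickstart:2.15.0
    cd riva_quickstart_v2.15.0
    # config.sh unmodified (default ASR/NLP/TTS models enabled)
    bash riva_init.sh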

Below are the logs from one of the bash riva_init.sh runs:

msis@new-6180:~/riva_quickstart_v2.15.0$ bash riva_init.sh 
Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Riva Speech Server images.
  > Pulling nvcr.io/nvidia/riva/riva-speech:2.15.0. This may take some time...
  > Pulling nvcr.io/nvidia/riva/riva-speech:2.15.0-servicemaker. This may take some time...

Downloading models (RMIRs) from NGC...
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing RMIRs set the location and corresponding flag in config.sh.

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release  (build 86328935)
Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.3 driver version 545.23.08 with kernel driver version 535.161.08.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

2024-03-29 15:47:41 URL:https://files.ngc.nvidia.com/org/nvidia/team/ngc-apps/recipes/ngc_cli/versions/3.26.0/files/ngccli_linux.zip?Expires=1711730860&Signature=rCpDnOlSWX7EpeZaFG1c2Yz8QOmC99ocQwZ55kCaUBLpkYklgANNkLoQhJwgN0fdYQaEUwbg8yrOFNcJCFtS~q5HFdxphbVLHNK1E0nGdLzhEz5yhW4~Fg0A-8YlTaOAQsfTlEq5-zzUuB8evRQw~FRJzDqGgkcgVboWJvpfsNLai4xErgcexeH8bN84~rvhZdfajHdc8KBgOJ4Km5NkD9UIcAHN-09CMk7fT2JAyMo4wkqP5v7jL26u2hkIjRCNEm8e6z~yPlN5FvgDD62sddX2jCHC6jDL~QYyaEbD1ShCr02-d-MeG4KRHpweed01SHbWtqCHKkLpfaUIJGw3cw__&Key-Pair-Id=KCX06E8E9L60W [45878930/45878930] -> "ngccli_linux.zip" [1]
/opt/riva
You can now run: /tmp/aws/aws --version
/opt/riva
/data/rmir /opt/riva
  > Downloading nvidia/riva/rmir_asr_conformer_en_us_str:2.15.0...
CLI_VERSION: Latest - 3.41.0 available (current: 3.26.0). Please update by using the command 'ngc version upgrade' 

Getting files to download...
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • 635.1/635.1 MiB • Remaining: 0:00:00 • 86.5 MB/s • Elapsed: 0:00:08 • Total: 1 - Completed: 1 - Failed: 0

-------------------------------------------------------------------------------
   Download status: COMPLETED
   Downloaded local path model: /data/rmir/rmir_asr_conformer_en_us_str_v2.15.0
   Total files downloaded: 1
   Total transferred: 635.09 MB
   Started at: 2024-03-29 15:47:49
   Completed at: 2024-03-29 15:47:57
   Duration taken: 8s
-------------------------------------------------------------------------------
  > Downloading nvidia/riva/rmir_asr_conformer_en_us_ofl:2.15.0...
Getting files to download...
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • 635.1/635.1 MiB • Remaining: 0:00:00 • 85.2 MB/s • Elapsed: 0:00:08 • Total: 1 - Completed: 1 - Failed: 0

-------------------------------------------------------------------------------
   Download status: COMPLETED
   Downloaded local path model: /data/rmir/rmir_asr_conformer_en_us_ofl_v2.15.0
   Total files downloaded: 1
   Total transferred: 635.09 MB
   Started at: 2024-03-29 15:48:00
   Completed at: 2024-03-29 15:48:08
   Duration taken: 8s
-------------------------------------------------------------------------------
  > Downloading nvidia/riva/rmir_nlp_punctuation_bert_base_en_us:2.15.0...
Getting files to download...
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • 384.8/384.8 MiB • Remaining: 0:00:00 • 234.2 MB/s • Elapsed: 0:00:02 • Total: 1 - Completed: 1 - Failed: 0

---------------------------------------------------------------------------------------
   Download status: COMPLETED
   Downloaded local path model: /data/rmir/rmir_nlp_punctuation_bert_base_en_us_v2.15.0
   Total files downloaded: 1
   Total transferred: 384.79 MB
   Started at: 2024-03-29 15:48:11
   Completed at: 2024-03-29 15:48:14
   Duration taken: 2s
---------------------------------------------------------------------------------------
Directory rmir_nlp_punctuation_bert_base_en_us_v2.15.0 already exists, skipping. Use '--force' option to override.
  > Downloading nvidia/riva/rmir_tts_fastpitch_hifigan_en_us_ipa:2.15.0...
Getting files to download...
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • 227.5/227.5 MiB • Remaining: 0:00:00 • 95.0 MB/s • Elapsed: 0:00:07 • Total: 1 - Completed: 1 - Failed: 0

---------------------------------------------------------------------------------------
   Download status: COMPLETED
   Downloaded local path model: /data/rmir/rmir_tts_fastpitch_hifigan_en_us_ipa_v2.15.0
   Total files downloaded: 1
   Total transferred: 227.49 MB
   Started at: 2024-03-29 15:48:26
   Completed at: 2024-03-29 15:48:34
   Duration taken: 7s
---------------------------------------------------------------------------------------
/opt/riva

+ [[ non-tegra != \t\e\g\r\a ]]
+ [[ non-tegra == \t\e\g\r\a ]]
+ echo 'Converting RMIRs at riva-model-repo/rmir to Riva Model repository.'
Converting RMIRs at riva-model-repo/rmir to Riva Model repository.
+ docker run --init -it --rm --gpus '"device=0"' -v riva-model-repo:/data -e MODEL_DEPLOY_KEY=tlt_encode --name riva-service-maker nvcr.io/nvidia/riva/riva-speech:2.15.0-servicemaker deploy_all_models /data/rmir /data/models

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release  (build 86328935)
Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.3 driver version 545.23.08 with kernel driver version 535.161.08.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

2024-03-29 15:48:41,641 [INFO] Writing Riva model repository to '/data/models'...
2024-03-29 15:48:41,641 [INFO] The riva model repo target directory is /data/models
2024-03-29 15:48:50,321 [INFO] Using obey-precision pass with fp16 TRT
2024-03-29 15:48:50,321 [INFO] Extract_binaries for language_model -> /data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1
2024-03-29 15:48:50,321 [INFO] extracting {'onnx': ('nemo.collections.nlp.models.token_classification.punctuation_capitalization_model.PunctuationCapitalizationModel', 'model_graph.onnx')} -> /data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1
2024-03-29 15:48:50,808 [INFO] Printing copied artifacts:
2024-03-29 15:48:50,808 [INFO] {'onnx': '/data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1/model_graph.onnx'}
2024-03-29 15:48:50,808 [INFO] Building TRT engine from ONNX file /data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1/model_graph.onnx
[03/29/2024-15:49:01] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/29/2024-15:49:02] [TRT] [E] ModelImporter.cpp:535: Parse was called with a non-empty network definition
2024-03-29 15:49:02,131 [INFO] Mixed-precision net: 1495 layers, 1495 tensors, 2 outputs...
2024-03-29 15:49:02,158 [INFO] Mixed-precision net: 0 layers / 0 outputs fixed
[03/29/2024-15:49:32] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[03/29/2024-15:49:32] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[03/29/2024-15:49:32] [TRT] [W] Check verbose logs for the list of affected weights.
[03/29/2024-15:49:32] [TRT] [W] - 117 weights are affected by this issue: Detected subnormal FP16 values.
[03/29/2024-15:49:32] [TRT] [W] - 45 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[03/29/2024-15:49:32] [TRT] [W] - 1 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
2024-03-29 15:49:32,873 [INFO] Writing engine to model repository: /data/models/riva-trt-riva-punctuation-en-US-nn-bert-base-uncased/1/model.plan
2024-03-29 15:49:33,114 [INFO] Capit dimensions:2
2024-03-29 15:49:33,114 [INFO] Punct dimensions:4
2024-03-29 15:49:33,176 [INFO] Extract_binaries for nlp_pipeline_backend -> /data/models/riva-punctuation-en-US/1
2024-03-29 15:49:33,176 [INFO] extracting {'vocab': ('nemo.collections.nlp.models.token_classification.punctuation_capitalization_model.PunctuationCapitalizationModel', 'f92889b136d2433693cb9127e1aea218_vocab.txt'), 'punctuation_mapping_path': ('nemo.collections.nlp.models.token_classification.punctuation_capitalization_model.PunctuationCapitalizationModel', 'nemo:fe160f3a917d411b99852e509e3279a3_punct_label_ids.csv'), 'capitalization_mapping_path': ('nemo.collections.nlp.models.token_classification.punctuation_capitalization_model.PunctuationCapitalizationModel', 'nemo:a4ed235fb32c44e58eab5854d3cd94f8_capit_label_ids.csv')} -> /data/models/riva-punctuation-en-US/1
2024-03-29 15:49:35,476 [INFO] Using onnx runtime
2024-03-29 15:49:35,476 [INFO] Using tensorrt with fp16
2024-03-29 15:49:35,476 [INFO] Extract_binaries for preprocessor -> /data/models/tts_preprocessor-English-US/1
2024-03-29 15:49:35,501 [INFO] extracting {'phone_dictionary_path': ('nemo.collections.tts.models.fastpitch.FastPitchModel', '/mnt/nvdl/datasets/jarvis_speech_ci/model_files/ipa_cmudict-0.7b_nv22.10.txt'), 'abbreviations': '/mnt/nvdl/datasets/jarvis_speech_ci/model_files/abbr.txt', 'mapping_file': ('nemo.collections.tts.models.fastpitch.FastPitchModel', '/opt/riva/mapping.txt'), 'wfst_tokenizer': '/mnt/nvdl/datasets/jarvis_speech_ci/model_files/tts_tn/currency_update/tokenize_and_classify.far', 'wfst_verbalizer': '/mnt/nvdl/datasets/jarvis_speech_ci/model_files/tts_tn/currency_update/verbalize.far'} -> /data/models/tts_preprocessor-English-US/1
2024-03-29 15:49:35,648 [INFO] Extract_binaries for encoderFastPitch -> /data/models/riva-onnx-fastpitch_encoder-English-US/1
2024-03-29 15:49:35,648 [INFO] extracting {'onnx': ('nemo.collections.tts.models.fastpitch.FastPitchModel', 'model_graph.onnx')} -> /data/models/riva-onnx-fastpitch_encoder-English-US/1
2024-03-29 15:49:37,327 [INFO] Printing copied artifacts:
2024-03-29 15:49:37,327 [INFO] {'onnx': '/data/models/riva-onnx-fastpitch_encoder-English-US/1/model_graph.onnx'}
2024-03-29 15:49:37,414 [INFO] Extract_binaries for chunkerFastPitch -> /data/models/spectrogram_chunker-English-US/1
2024-03-29 15:49:37,414 [INFO] No binaries to extract. Creating empty file at /data/models/spectrogram_chunker-English-US/1
2024-03-29 15:49:37,415 [INFO] Extract_binaries for hifigan -> /data/models/riva-trt-hifigan-English-US/1
2024-03-29 15:49:37,415 [INFO] extracting {'onnx': ('nemo.collections.tts.models.hifigan.HifiGanModel', 'model_graph.onnx')} -> /data/models/riva-trt-hifigan-English-US/1
2024-03-29 15:49:37,892 [INFO] Printing copied artifacts:
2024-03-29 15:49:37,892 [INFO] {'onnx': '/data/models/riva-trt-hifigan-English-US/1/model_graph.onnx'}
2024-03-29 15:49:37,892 [INFO] Building TRT engine from ONNX file /data/models/riva-trt-hifigan-English-US/1/model_graph.onnx
[03/29/2024-15:49:42] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/29/2024-15:49:42] [TRT] [E] ModelImporter.cpp:535: Parse was called with a non-empty network definition
[03/29/2024-15:53:16] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[03/29/2024-15:53:16] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[03/29/2024-15:53:16] [TRT] [W] Check verbose logs for the list of affected weights.
[03/29/2024-15:53:16] [TRT] [W] - 84 weights are affected by this issue: Detected subnormal FP16 values.
[03/29/2024-15:53:16] [TRT] [W] - 25 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
2024-03-29 15:53:16,368 [INFO] Writing engine to model repository: /data/models/riva-trt-hifigan-English-US/1/model.plan
2024-03-29 15:53:16,743 [INFO] Extract_binaries for postprocessor -> /data/models/tts_postprocessor-English-US/1
2024-03-29 15:53:16,743 [INFO] No binaries to extract. Creating empty file at /data/models/tts_postprocessor-English-US/1
2024-03-29 15:53:16,744 [INFO] Extract_binaries for self -> /data/models/fastpitch_hifigan_ensemble-English-US/1
2024-03-29 15:53:16,744 [INFO] No binaries to extract. Creating empty file at /data/models/fastpitch_hifigan_ensemble-English-US/1
2024-03-29 15:53:16,745 [INFO] [{'model_name': 'tts_preprocessor-English-US', 'model_version': 1, 'input_map': {'input_string': 'INPUT', 'speaker': 'SPEAKER'}, 'output_map': {'output': 'input_encoder', 'is_last_sentence': 'chunker_is_last_sentence', 'output_string': 'PROCESSED_TEXT', 'sentence_num': 'SENTENCE_NUM', 'pitch': 'input_encoder_pitch', 'duration': 'input_encoder_dur', 'speaker': 'input_speaker', 'volume': 'input_volume'}}, {'model_name': 'riva-onnx-fastpitch_encoder-English-US', 'model_version': 1, 'input_map': {'text': 'input_encoder', 'pitch': 'input_encoder_pitch', 'pace': 'input_encoder_dur', 'speaker': 'input_speaker', 'volume': 'input_volume'}, 'output_map': {'spect': 'output_encoder', 'num_frames': 'num_valid_frames_encoder', 'durs_predicted': 'durs_predicted', 'volume_aligned': 'volume_out'}}, {'model_name': 'spectrogram_chunker-English-US', 'model_version': 1, 'input_map': {'SPECTROGRAM': 'output_encoder', 'IS_LAST_SENTENCE': 'chunker_is_last_sentence', 'NUM_VALID_FRAMES_IN': 'num_valid_frames_encoder', 'SENTENCE_NUM': 'SENTENCE_NUM', 'DURATIONS': 'durs_predicted', 'PROCESSED_TEXT': 'PROCESSED_TEXT', 'VOLUME': 'volume_out'}, 'output_map': {'SPECTROGRAM_CHUNK': 'spectrogram_chunk', 'END_FLAG': 'END_FLAG', 'NUM_VALID_SAMPLES_OUT': 'num_valid_samples', 'SENTENCE_NUM': 'OUT_SENTENCE_NUM', 'DURATIONS': 'OUT_DURATIONS', 'PROCESSED_TEXT': 'OUT_PROCESSED_TEXT', 'VOLUME': 'OUT_VOLUME'}}, {'model_name': 'riva-trt-hifigan-English-US', 'model_version': 1, 'input_map': {'spec': 'spectrogram_chunk'}, 'output_map': {'audio': 'audio_chunk'}}, {'model_name': 'tts_postprocessor-English-US', 'model_version': 1, 'input_map': {'INPUT': 'audio_chunk', 'NUM_VALID_SAMPLES': 'num_valid_samples', 'Prosody_volume': 'OUT_VOLUME'}, 'output_map': {'OUTPUT': 'OUTPUT'}}]
2024-03-29 15:53:30,688 [INFO] Using obey-precision pass with fp16 TRT
2024-03-29 15:53:30,689 [INFO] Extract_binaries for nn -> /data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1
2024-03-29 15:53:30,689 [INFO] extracting {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')} -> /data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1
2024-03-29 15:53:31,681 [INFO] Printing copied artifacts:
2024-03-29 15:53:31,681 [INFO] {'onnx': '/data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1/model_graph.onnx'}
2024-03-29 15:53:31,681 [INFO] Building TRT engine from ONNX file /data/models/riva-trt-conformer-en-US-asr-streaming-am-streaming/1/model_graph.onnx
[03/29/2024-15:53:39] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/29/2024-15:53:39] [TRT] [W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[03/29/2024-15:53:40] [TRT] [E] ModelImporter.cpp:535: Parse was called with a non-empty network definition
2024-03-29 15:53:40,496 [INFO] Mixed-precision net: 5911 layers, 5911 tensors, 1 outputs...
2024-03-29 15:53:40,602 [INFO] Mixed-precision net: 0 layers / 0 outputs fixed
/usr/local/bin/deploy_all_models: line 21:   103 Killed                  riva-deploy $FORCE `find $rmir_path -name *.rmir -printf "%p:${MODEL_DEPLOY_KEY} "` $output_path
+ '[' 137 -ne 0 ']'
+ echo 'Error in deploying RMIR models.'
Error in deploying RMIR models.
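
The deploy stops while building the TensorRT engine for the streaming Conformer acoustic model: riva-deploy is reported as Killed and deploy_all_models exits with status 137 (128 + SIGKILL), which as far as I know usually points to the kernel OOM killer rather than a TensorRT/Riva error. If it helps, I can verify on the host with something like this (hypothetical check, not yet run):

    # look for OOM-killer activity around the time of the failure
    dmesg -T | grep -i -E 'out of memory|killed process'
    # watch host memory while riva_init.sh is running
    free -h -s 5

Is there a known increase in the memory needed to build the Conformer engines in v2.15.0, or a recommended way to reduce peak memory use during deploy_all_models?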