Riva 1.8: riva_start.sh fails when built with a language model

Hardware: GPU RTX 3060
Operating System: Ubuntu 18.04
Riva Version: 1.8

I have succeeded in running Riva 1.8 with the Conformer model and a greedy decoder. But when I built it with a KenLM language model, it failed. I have tried many times, and it still fails:

riva_init.sh logs:

Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Riva Speech Server images.
  > Image nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-server exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech-client:1.8.0-beta exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-servicemaker exists. Skipping.

Converting RMIRs at /dev/data/Desktop/NeMo_Conformer_20220105/rmir to Riva Model repository.
+ docker run --init -it --rm --gpus '"device=0"' -v /dev/data/Desktop/NeMo_Conformer_20220105:/data -e MODEL_DEPLOY_KEY=tlt_encode --name riva-service-maker nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-servicemaker deploy_all_models /data/rmir /data/models

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.12 (build 30304770)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2022-01-12 03:03:48,818 [INFO] Writing Riva model repository to '/data/models'...
2022-01-12 03:03:48,819 [INFO] The riva model repo target directory is /data/models
2022-01-12 03:04:20,968 [INFO] Using onnx runtime
2022-01-12 03:04:20,969 [INFO] Extract_binaries for nn -> /data/models/riva-onnx-conformer-en-US-asr-offline-am-streaming-offline/1
2022-01-12 03:04:20,969 [INFO] extracting {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')} -> /data/models/riva-onnx-conformer-en-US-asr-offline-am-streaming-offline/1
2022-01-12 03:04:21,038 [INFO] Printing copied artifacts:
2022-01-12 03:04:21,038 [INFO] {'onnx': '/data/models/riva-onnx-conformer-en-US-asr-offline-am-streaming-offline/1/model_graph.onnx'}
2022-01-12 03:04:21,077 [INFO] Extract_binaries for featurizer -> /data/models/conformer-en-US-asr-offline-feature-extractor-streaming-offline/1
2022-01-12 03:04:21,079 [INFO] Extract_binaries for vad -> /data/models/conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline/1
2022-01-12 03:04:21,079 [INFO] extracting {'vocab_file': '/tmp/tmpu010tjan/riva_decoder_vocabulary.txt'} -> /data/models/conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline/1
2022-01-12 03:04:21,080 [INFO] Extract_binaries for lm_decoder -> /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1
2022-01-12 03:04:21,080 [INFO] extracting {'vocab_file': '/tmp/tmpu010tjan/riva_decoder_vocabulary.txt', 'decoding_language_model_binary': '/servicemaker-dev/20211201_quantization.bin', 'decoding_vocab': '/servicemaker-dev/lexicon_vocab_unique.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', '2a178396df1a412b868a2d3d9eafdf8f_tokenizer.model')} -> /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1
2022-01-12 03:04:21,600 [INFO] {'vocab_file': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/riva_decoder_vocabulary.txt', 'decoding_language_model_binary': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/20211201_quantization.bin', 'decoding_vocab': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/lexicon_vocab_unique.txt', 'tokenizer_model': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/2a178396df1a412b868a2d3d9eafdf8f_tokenizer.model'}
2022-01-12 03:04:21,600 [INFO] Model config has vocab file and tokenizer specified. Will create subword lexicon file from  vocab_file /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/lexicon_vocab_unique.txt and tokenizer model /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/2a178396df1a412b868a2d3d9eafdf8f_tokenizer.model
2022-01-12 03:04:21,710 [INFO] processed 10000 lines
2022-01-12 03:04:21,818 [INFO] processed 20000 lines
2022-01-12 03:04:21,926 [INFO] processed 30000 lines
2022-01-12 03:04:22,033 [INFO] processed 40000 lines
2022-01-12 03:04:22,147 [INFO] processed 50000 lines
2022-01-12 03:04:22,263 [INFO] processed 60000 lines
2022-01-12 03:04:22,375 [INFO] processed 70000 lines
2022-01-12 03:04:22,491 [INFO] processed 80000 lines
2022-01-12 03:04:22,606 [INFO] processed 90000 lines
2022-01-12 03:04:22,719 [INFO] processed 100000 lines
2022-01-12 03:04:22,833 [INFO] processed 110000 lines
2022-01-12 03:04:22,877 [INFO] skipped 0 empty lines
2022-01-12 03:04:22,877 [INFO] filtered 0 lines
2022-01-12 03:04:22,879 [INFO] Extract_binaries for self -> /data/models/conformer-en-US-asr-offline/1
+ echo

+ echo 'Riva initialization complete. Run ./riva_start.sh to launch services.'
Riva initialization complete. Run ./riva_start.sh to launch services.

riva-speech container logs after running bash riva_start.sh:

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.12 (build 30304767)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Riva waiting for Triton server to load all models...retrying in 1 second
I0112 03:04:42.257298 70 metrics.cc:290] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3060
I0112 03:04:42.258608 70 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0112 03:04:42.258620 70 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0112 03:04:42.258623 70 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0112 03:04:42.396962 70 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f57c0000000' with size 268435456
I0112 03:04:42.397200 70 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 1000000000
I0112 03:04:42.400142 70 model_repository_manager.cc:1045] loading: conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline:1
I0112 03:04:42.500644 70 model_repository_manager.cc:1045] loading: conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline:1
I0112 03:04:42.511404 70 ctc-decoder-library.cc:20] TRITONBACKEND_ModelInitialize: conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline (version 1)
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter max_num_slots could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
I0112 03:04:42.512676 70 backend_model.cc:255] model configuration:
{
    "name": "conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline",
    "platform": "",
    "backend": "riva_asr_decoder",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 128,
    "input": [
        {
            "name": "CLASS_LOGITS",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                257
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "END_FLAG",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "SEGMENTS_START_END",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "CUSTOM_CONFIGURATION",
            "data_type": "TYPE_STRING",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "FINAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_TRANSCRIPTS_SCORE",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS_STABILITY",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "cuda": {
            "graphs": false,
            "busy_wait_events": false,
            "graph_spec": [],
            "output_copy_stream": true
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "oldest": {
            "max_candidate_sequences": 128,
            "preferred_batch_size": [
                32,
                64
            ],
            "max_queue_delay_microseconds": 1000
        },
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "END",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_END",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "CORRID",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_CORRID",
                        "int32_false_true": [],
                        "fp32_false_true": [],
                        "data_type": "TYPE_UINT64"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline_0",
            "kind": "KIND_CPU",
            "count": 1,
            "gpus": [],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "streaming": {
            "string_value": "True"
        },
        "use_subword": {
            "string_value": "True"
        },
        "beam_size": {
            "string_value": "16"
        },
        "right_padding_size": {
            "string_value": "0.0"
        },
        "beam_size_token": {
            "string_value": "16"
        },
        "sil_token": {
            "string_value": "▁"
        },
        "beam_threshold": {
            "string_value": "20.0"
        },
        "language_model_file": {
            "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/20211201_quantization.bin"
        },
        "max_execution_batch_size": {
            "string_value": "1024"
        },
        "forerunner_use_lm": {
            "string_value": "true"
        },
        "forerunner_beam_size_token": {
            "string_value": "8"
        },
        "forerunner_beam_threshold": {
            "string_value": "10.0"
        },
        "asr_model_delay": {
            "string_value": "-1"
        },
        "decoder_num_worker_threads": {
            "string_value": "-1"
        },
        "word_insertion_score": {
            "string_value": "0.2"
        },
        "left_padding_size": {
            "string_value": "0.0"
        },
        "decoder_type": {
            "string_value": "flashlight"
        },
        "compute_timestamps": {
            "string_value": "True"
        },
        "forerunner_beam_size": {
            "string_value": "8"
        },
        "max_supported_transcripts": {
            "string_value": "1"
        },
        "chunk_size": {
            "string_value": "200.0"
        },
        "lexicon_file": {
            "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/lexicon.txt"
        },
        "smearing_mode": {
            "string_value": "max"
        },
        "use_vad": {
            "string_value": "True"
        },
        "blank_token": {
            "string_value": "#"
        },
        "lm_weight": {
            "string_value": "0.2"
        },
        "vocab_file": {
            "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/riva_decoder_vocabulary.txt"
        },
        "ms_per_timestep": {
            "string_value": "40"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0112 03:04:42.512721 70 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline_0 (device 0)
terminate called after throwing an instance of 'std::runtime_error'
  what():  [LoadWords] Invalid line: ‎	
/opt/riva/bin/start-riva: line 4:    70 Aborted                 (core dumped) tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000
  > Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs 
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]


+1, same error

I recently rebuilt with Riva 2.2.1, and the previous errors have been fixed.

Hi @user128631 and @sumeet.tiwari,

Thanks for your interest in Riva, and thanks for your inputs and suggestions.

We suggest using the latest build → 2.3.0

https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html

Thanks