Riva 1.8: riva_start.sh fails when built with a language model

Hardware: GPU RTX 3060
Operating System: Ubuntu 18.04
Riva Version: 1.8

I have succeeded in running Riva 1.8 with the Conformer model and a greedy decoder. But when I built it with a KenLM language model, it failed. I have tried many times, and it still fails:

riva_init.sh logs:

Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Riva Speech Server images.
  > Image nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-server exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech-client:1.8.0-beta exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-servicemaker exists. Skipping.

Converting RMIRs at /dev/data/Desktop/NeMo_Conformer_20220105/rmir to Riva Model repository.
+ docker run --init -it --rm --gpus '"device=0"' -v /dev/data/Desktop/NeMo_Conformer_20220105:/data -e MODEL_DEPLOY_KEY=tlt_encode --name riva-service-maker nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-servicemaker deploy_all_models /data/rmir /data/models

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.12 (build 30304770)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2022-01-12 03:03:48,818 [INFO] Writing Riva model repository to '/data/models'...
2022-01-12 03:03:48,819 [INFO] The riva model repo target directory is /data/models
2022-01-12 03:04:20,968 [INFO] Using onnx runtime
2022-01-12 03:04:20,969 [INFO] Extract_binaries for nn -> /data/models/riva-onnx-conformer-en-US-asr-offline-am-streaming-offline/1
2022-01-12 03:04:20,969 [INFO] extracting {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')} -> /data/models/riva-onnx-conformer-en-US-asr-offline-am-streaming-offline/1
2022-01-12 03:04:21,038 [INFO] Printing copied artifacts:
2022-01-12 03:04:21,038 [INFO] {'onnx': '/data/models/riva-onnx-conformer-en-US-asr-offline-am-streaming-offline/1/model_graph.onnx'}
2022-01-12 03:04:21,077 [INFO] Extract_binaries for featurizer -> /data/models/conformer-en-US-asr-offline-feature-extractor-streaming-offline/1
2022-01-12 03:04:21,079 [INFO] Extract_binaries for vad -> /data/models/conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline/1
2022-01-12 03:04:21,079 [INFO] extracting {'vocab_file': '/tmp/tmpu010tjan/riva_decoder_vocabulary.txt'} -> /data/models/conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline/1
2022-01-12 03:04:21,080 [INFO] Extract_binaries for lm_decoder -> /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1
2022-01-12 03:04:21,080 [INFO] extracting {'vocab_file': '/tmp/tmpu010tjan/riva_decoder_vocabulary.txt', 'decoding_language_model_binary': '/servicemaker-dev/20211201_quantization.bin', 'decoding_vocab': '/servicemaker-dev/lexicon_vocab_unique.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', '2a178396df1a412b868a2d3d9eafdf8f_tokenizer.model')} -> /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1
2022-01-12 03:04:21,600 [INFO] {'vocab_file': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/riva_decoder_vocabulary.txt', 'decoding_language_model_binary': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/20211201_quantization.bin', 'decoding_vocab': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/lexicon_vocab_unique.txt', 'tokenizer_model': '/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/2a178396df1a412b868a2d3d9eafdf8f_tokenizer.model'}
2022-01-12 03:04:21,600 [INFO] Model config has vocab file and tokenizer specified. Will create subword lexicon file from  vocab_file /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/lexicon_vocab_unique.txt and tokenizer model /data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/2a178396df1a412b868a2d3d9eafdf8f_tokenizer.model
2022-01-12 03:04:21,710 [INFO] processed 10000 lines
2022-01-12 03:04:21,818 [INFO] processed 20000 lines
2022-01-12 03:04:21,926 [INFO] processed 30000 lines
2022-01-12 03:04:22,033 [INFO] processed 40000 lines
2022-01-12 03:04:22,147 [INFO] processed 50000 lines
2022-01-12 03:04:22,263 [INFO] processed 60000 lines
2022-01-12 03:04:22,375 [INFO] processed 70000 lines
2022-01-12 03:04:22,491 [INFO] processed 80000 lines
2022-01-12 03:04:22,606 [INFO] processed 90000 lines
2022-01-12 03:04:22,719 [INFO] processed 100000 lines
2022-01-12 03:04:22,833 [INFO] processed 110000 lines
2022-01-12 03:04:22,877 [INFO] skipped 0 empty lines
2022-01-12 03:04:22,877 [INFO] filtered 0 lines
2022-01-12 03:04:22,879 [INFO] Extract_binaries for self -> /data/models/conformer-en-US-asr-offline/1
+ echo

+ echo 'Riva initialization complete. Run ./riva_start.sh to launch services.'
Riva initialization complete. Run ./riva_start.sh to launch services.

riva-speech container logs after running bash riva_start.sh:

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.12 (build 30304767)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Riva waiting for Triton server to load all models...retrying in 1 second
I0112 03:04:42.257298 70 metrics.cc:290] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3060
I0112 03:04:42.258608 70 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0112 03:04:42.258620 70 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0112 03:04:42.258623 70 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0112 03:04:42.396962 70 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f57c0000000' with size 268435456
I0112 03:04:42.397200 70 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 1000000000
I0112 03:04:42.400142 70 model_repository_manager.cc:1045] loading: conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline:1
I0112 03:04:42.500644 70 model_repository_manager.cc:1045] loading: conformer-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline:1
I0112 03:04:42.511404 70 ctc-decoder-library.cc:20] TRITONBACKEND_ModelInitialize: conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline (version 1)
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter max_num_slots could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
I0112 03:04:42.512676 70 backend_model.cc:255] model configuration:
{
    "name": "conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline",
    "platform": "",
    "backend": "riva_asr_decoder",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 128,
    "input": [
        {
            "name": "CLASS_LOGITS",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                257
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "END_FLAG",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "SEGMENTS_START_END",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "CUSTOM_CONFIGURATION",
            "data_type": "TYPE_STRING",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "FINAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_TRANSCRIPTS_SCORE",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS_STABILITY",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "cuda": {
            "graphs": false,
            "busy_wait_events": false,
            "graph_spec": [],
            "output_copy_stream": true
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "oldest": {
            "max_candidate_sequences": 128,
            "preferred_batch_size": [
                32,
                64
            ],
            "max_queue_delay_microseconds": 1000
        },
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "END",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_END",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "CORRID",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_CORRID",
                        "int32_false_true": [],
                        "fp32_false_true": [],
                        "data_type": "TYPE_UINT64"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline_0",
            "kind": "KIND_CPU",
            "count": 1,
            "gpus": [],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "streaming": {
            "string_value": "True"
        },
        "use_subword": {
            "string_value": "True"
        },
        "beam_size": {
            "string_value": "16"
        },
        "right_padding_size": {
            "string_value": "0.0"
        },
        "beam_size_token": {
            "string_value": "16"
        },
        "sil_token": {
            "string_value": "▁"
        },
        "beam_threshold": {
            "string_value": "20.0"
        },
        "language_model_file": {
            "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/20211201_quantization.bin"
        },
        "max_execution_batch_size": {
            "string_value": "1024"
        },
        "forerunner_use_lm": {
            "string_value": "true"
        },
        "forerunner_beam_size_token": {
            "string_value": "8"
        },
        "forerunner_beam_threshold": {
            "string_value": "10.0"
        },
        "asr_model_delay": {
            "string_value": "-1"
        },
        "decoder_num_worker_threads": {
            "string_value": "-1"
        },
        "word_insertion_score": {
            "string_value": "0.2"
        },
        "left_padding_size": {
            "string_value": "0.0"
        },
        "decoder_type": {
            "string_value": "flashlight"
        },
        "compute_timestamps": {
            "string_value": "True"
        },
        "forerunner_beam_size": {
            "string_value": "8"
        },
        "max_supported_transcripts": {
            "string_value": "1"
        },
        "chunk_size": {
            "string_value": "200.0"
        },
        "lexicon_file": {
            "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/lexicon.txt"
        },
        "smearing_mode": {
            "string_value": "max"
        },
        "use_vad": {
            "string_value": "True"
        },
        "blank_token": {
            "string_value": "#"
        },
        "lm_weight": {
            "string_value": "0.2"
        },
        "vocab_file": {
            "string_value": "/data/models/conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/riva_decoder_vocabulary.txt"
        },
        "ms_per_timestep": {
            "string_value": "40"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0112 03:04:42.512721 70 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: conformer-en-US-asr-offline-ctc-decoder-cpu-streaming-offline_0 (device 0)
terminate called after throwing an instance of 'std::runtime_error'
  what():  [LoadWords] Invalid line: ‎	
/opt/riva/bin/start-riva: line 4:    70 Aborted                 (core dumped) tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000
  > Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs 
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]


+1, same error

I recently rebuilt with Riva 2.2.1, and the previous errors have been fixed.

Hi @user128631 and @sumeet.tiwari,

Thanks for your interest in Riva, and thanks for your inputs and suggestions.

We suggest using the latest build → 2.3.0

https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html

Thanks