Recreate QuickStart Stock Citrinet Model with Modified Parameters

Please provide the following information when requesting support.

Hardware - GPU (A100/A30/T4/V100)
Tesla T4
Hardware - CPU
I’m using an AWS EC2 instance type of g4dn.xlarge
Operating System
Amazon Linux 2
Riva Version
Riva 1.8, using riva_quickstart_v1.8.0-beta
TLT Version (if relevant)

Just for completeness, nvidia-smi gave me this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   38C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

How to reproduce the issue? (This is for errors. Please share the command and the detailed log here)

I’m trying to recreate the QuickStart stock Citrinet model with modified parameters, much like the earlier question “Rebuilding the asrset3 citrinet offline pipeline but with larger chunk size”, which never got a response.

I’d like to know how to rebuild the stock model given in the QuickStart and modify some of its parameters.

I tried using this riva-build command given at https://docs.nvidia.com/deeplearning/riva/user-guide/docs/service-asr.html for the Citrinet-1024 Acoustic Model for Streaming High-Throughput:

riva-build speech_recognition \
   <rmir_filename>:<key> <riva_filename>:<key> \
   --name=citrinet-1024-english-asr-streaming \
   --ms_per_timestep=80 \
   --featurizer.use_utterance_norm_params=False \
   --featurizer.precalc_norm_time_steps=0 \
   --featurizer.precalc_norm_params=False \
   --vad.residue_blanks_at_start=-2 \
   --chunk_size=0.8 \
   --left_padding_size=1.6 \
   --right_padding_size=1.6 \
   --decoder_type=flashlight \
   --flashlight_decoder.asr_model_delay=-1 \
   --decoding_language_model_binary=<lm_binary> \
   --decoding_vocab=<decoder_lexicon> \
   --flashlight_decoder.lm_weight=0.2 \
   --flashlight_decoder.word_insertion_score=0.2 \
   --flashlight_decoder.beam_threshold=20. \
   --language_code=en-US

This command has four unknowns (to me):

  1. <rmir_filename>:<key>
  2. <riva_filename>:<key>
  3. <lm_binary>
  4. <decoder_lexicon>

I populated these 4 parameters with the following:

  1. <rmir_filename>:<key>
    I used my desired output name of citrinet_v3_stock.rmir for the .rmir file, with a key of tlt_encode.
  2. <riva_filename>:<key>
    For this I used the .riva file from the acoustic model (AM) repository “RIVA Citrinet ASR English”, released on Jan 7th, 2022, found at RIVA Citrinet ASR English | NVIDIA NGC.
    I used the file named citrinet-1024-Jarvis-asrset-3_0-encrypted.riva, which seemed to be the most recent deployable AM I could find. The key I used is also tlt_encode, as indicated in the model’s Overview section. (The NGC download commands are sketched after this list.)
  3. <lm_binary>
    I wanted to use flashlight as the decoder type, since that’s what the user guide’s riva-build used, and I found the LM repository “Riva ASR English LM”, last modified Jan 7, 2022, at Riva ASR English (en-US) LM | NVIDIA NGC.
    I used the KenLM-formatted binary file named mixed-lower.binary.
  4. <decoder_lexicon>
    I wasn’t sure what to use for this. Looking around, it looked like I wanted a .txt file, so I used the one included in the language model package from #3.
    The file name is words.mixed_lm.txt.
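
For anyone reproducing this, both NGC packages can be pulled with the NGC CLI roughly as below. The org/team paths and version tags are from memory, so double-check them on the NGC model pages; the directories they unpack into are the ones referenced in the riva-build command that follows.

# org/team paths and version tags are approximate - confirm them on the NGC model pages
ngc registry model download-version "nvidia/tao/speechtotext_en_us_citrinet:deployable_v3.0"
ngc registry model download-version "nvidia/tao/speechtotext_en_us_lm:deployable_v1-1.0"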

So putting it all together, I ended up with this riva-build command:

riva-build speech_recognition \
   /data/rmir/citrinet_v3_stock.rmir:tlt_encode speechtotext_en_us_citrinet_vdeployable_v3.0/citrinet-1024-Jarvis-asrset-3_0-encrypted.riva:tlt_encode \
   --name=citrinet-v3-stock-1024-english-asr-streaming \
   --ms_per_timestep=80 \
   --featurizer.use_utterance_norm_params=False \
   --featurizer.precalc_norm_time_steps=0 \
   --featurizer.precalc_norm_params=False \
   --vad.residue_blanks_at_start=-2 \
   --chunk_size=0.8 \
   --left_padding_size=1.6 \
   --right_padding_size=1.6 \
   --decoder_type=flashlight \
   --flashlight_decoder.asr_model_delay=-1 \
   --decoding_language_model_binary=speechtotext_en_us_lm_vdeployable_v1-1.0/mixed-lower.binary \
   --decoding_vocab=speechtotext_en_us_lm_vdeployable_v1-1.0/words.mixed_lm.txt \
   --flashlight_decoder.lm_weight=0.2 \
   --flashlight_decoder.word_insertion_score=0.2 \
   --flashlight_decoder.beam_threshold=20. \
   --language_code=en-US -f
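
In case it matters, I ran that riva-build inside the 1.8.0-beta servicemaker image, with the quickstart’s riva-model-repo volume mounted at /data (so the resulting .rmir lands where riva_init’s deploy step looks for it) and my NGC downloads under /root/Downloads. The invocation below is a rough sketch from memory; your mounts and working directory may differ.

docker run --rm -it \
   -v riva-model-repo:/data \
   -v $HOME/Downloads:/root/Downloads \
   -w /root/Downloads \
   nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-servicemaker \
   riva-build speech_recognition ...   # same arguments as above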

The build was successful and generated a .rmir file.

2022-01-17 17:20:19,763 [WARNING] Property 'encrypted' is deprecated. Please use 'encryption' instead.
2022-01-17 17:20:19,764 [WARNING] Property 'binary' is deprecated. Please use the callback system instead.
2022-01-17 17:20:20,534 [INFO] Packing binaries for nn/ONNX : {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')}
2022-01-17 17:20:20,534 [INFO] Copying onnx:model_graph.onnx -> nn:nn-model_graph.onnx
2022-01-17 17:20:25,276 [INFO] Packing binaries for lm_decoder/ONNX : {'vocab_file': '/tmp/tmp8hinuqiw/riva_decoder_vocabulary.txt', 'decoding_language_model_binary': '/root/Downloads/speechtotext_en_us_lm_vdeployable_v1-1.0/mixed-lower.binary', 'decoding_vocab': '/root/Downloads/speechtotext_en_us_lm_vdeployable_v1-1.0/words.mixed_lm.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', '498056ba420d4bb3831ad557fba06032_tokenizer.model')}
2022-01-17 17:20:25,277 [INFO] Copying vocab_file:/tmp/tmp8hinuqiw/riva_decoder_vocabulary.txt -> lm_decoder:lm_decoder-riva_decoder_vocabulary.txt
2022-01-17 17:20:25,277 [INFO] Copying decoding_language_model_binary:/root/Downloads/speechtotext_en_us_lm_vdeployable_v1-1.0/mixed-lower.binary -> lm_decoder:lm_decoder-mixed-lower.binary
2022-01-17 17:20:26,177 [INFO] Copying decoding_vocab:/root/Downloads/speechtotext_en_us_lm_vdeployable_v1-1.0/words.mixed_lm.txt -> lm_decoder:lm_decoder-words.mixed_lm.txt
2022-01-17 17:20:26,180 [INFO] Copying tokenizer_model:498056ba420d4bb3831ad557fba06032_tokenizer.model -> lm_decoder:lm_decoder-498056ba420d4bb3831ad557fba06032_tokenizer.model
2022-01-17 17:20:26,180 [INFO] Packing binaries for rescorer/ONNX : {'vocab_file': '/tmp/tmp8hinuqiw/riva_decoder_vocabulary.txt'}
2022-01-17 17:20:26,180 [INFO] Copying vocab_file:/tmp/tmp8hinuqiw/riva_decoder_vocabulary.txt -> rescorer:rescorer-riva_decoder_vocabulary.txt
2022-01-17 17:20:26,181 [INFO] Packing binaries for vad/ONNX : {'vocab_file': '/tmp/tmp8hinuqiw/riva_decoder_vocabulary.txt'}
2022-01-17 17:20:26,181 [INFO] Copying vocab_file:/tmp/tmp8hinuqiw/riva_decoder_vocabulary.txt -> vad:vad-riva_decoder_vocabulary.txt
2022-01-17 17:20:26,181 [INFO] Saving to /data/rmir/citrinet_v3_stock.rmir

I then ran riva_init.sh, and it generated the model successfully:

Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Riva Speech Server images.
  > Image nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-server exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech-client:1.8.0-beta exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-servicemaker exists. Skipping.

Downloading models (RMIRs) from NGC...
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing RMIRs set the location and corresponding flag in config.sh.

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.12 (build 30304770)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

/data/artifacts /opt/riva
Directory rmir_asr_citrinet_1024_en_us_str_v1.8.0-beta already exists, skipping. Use '--force' option to override.
Directory rmir_asr_citrinet_1024_en_us_ofl_v1.8.0-beta already exists, skipping. Use '--force' option to override.
Directory rmir_nlp_punctuation_bert_base_v1.8.0-beta already exists, skipping. Use '--force' option to override.
/opt/riva

Converting RMIRs at riva-model-repo/rmir to Riva Model repository.
+ docker run --init -it --rm --gpus '"device=0"' -v riva-model-repo:/data -e MODEL_DEPLOY_KEY=tlt_encode --name riva-service-maker nvcr.io/nvidia/riva/riva-speech:1.8.0-beta-servicemaker deploy_all_models /data/rmir /data/models

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.12 (build 30304770)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2022-01-17 17:31:16,131 [INFO] Writing Riva model repository to '/data/models'...
2022-01-17 17:31:16,131 [INFO] The riva model repo target directory is /data/models
2022-01-17 17:31:23,471 [INFO] Using tensorrt
2022-01-17 17:31:23,471 [WARNING] /data/models/riva-trt-citrinet-1024-en-US-asr-streaming-am-streaming already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/riva-trt-citrinet-1024-en-US-asr-streaming-am-streaming
2022-01-17 17:31:23,471 [WARNING] /data/models/citrinet-1024-en-US-asr-streaming-feature-extractor-streaming already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/citrinet-1024-en-US-asr-streaming-feature-extractor-streaming
2022-01-17 17:31:23,471 [WARNING] /data/models/citrinet-1024-en-US-asr-streaming-voice-activity-detector-ctc-streaming already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/citrinet-1024-en-US-asr-streaming-voice-activity-detector-ctc-streaming
2022-01-17 17:31:23,471 [WARNING] /data/models/citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming
2022-01-17 17:31:23,471 [WARNING] /data/models/citrinet-1024-en-US-asr-streaming already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/citrinet-1024-en-US-asr-streaming
2022-01-17 17:31:28,617 [INFO] Using tensorrt
2022-01-17 17:31:28,618 [WARNING] /data/models/riva-trt-citrinet-1024-en-US-asr-offline-am-streaming-offline already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/riva-trt-citrinet-1024-en-US-asr-offline-am-streaming-offline
2022-01-17 17:31:28,618 [WARNING] /data/models/citrinet-1024-en-US-asr-offline-feature-extractor-streaming-offline already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/citrinet-1024-en-US-asr-offline-feature-extractor-streaming-offline
2022-01-17 17:31:28,618 [WARNING] /data/models/citrinet-1024-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/citrinet-1024-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline
2022-01-17 17:31:28,618 [WARNING] /data/models/citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline
2022-01-17 17:31:28,618 [WARNING] /data/models/citrinet-1024-en-US-asr-offline already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/citrinet-1024-en-US-asr-offline
2022-01-17 17:31:33,872 [WARNING] /data/models/riva-trt-riva_punctuation-nn-bert-base-uncased already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/riva-trt-riva_punctuation-nn-bert-base-uncased
2022-01-17 17:31:33,872 [WARNING] /data/models/punctuation_tokenizer already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/punctuation_tokenizer
2022-01-17 17:31:33,872 [WARNING] /data/models/punctuation_punctuation_postprocessor already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/punctuation_punctuation_postprocessor
2022-01-17 17:31:33,872 [WARNING] /data/models/riva_punctuation already exists, skipping deployment.  To force deployment rerun with -f or remove the /data/models/riva_punctuation
2022-01-17 17:32:04,025 [INFO] Using tensorrt
2022-01-17 17:32:04,046 [INFO] Extract_binaries for nn -> /data/models/riva-trt-citrinet-v3-stock-1024-english-asr-streaming-am-streaming/1
2022-01-17 17:32:04,046 [INFO] extracting {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')} -> /data/models/riva-trt-citrinet-v3-stock-1024-english-asr-streaming-am-streaming/1
2022-01-17 17:32:11,578 [INFO] Printing copied artifacts:
2022-01-17 17:32:11,578 [INFO] {'onnx': '/data/models/riva-trt-citrinet-v3-stock-1024-english-asr-streaming-am-streaming/1/model_graph.onnx'}
2022-01-17 17:32:11,578 [INFO] Building TRT engine from ONNX file
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'Shape tensor cast elision' routine failed with: None
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 564431790
[TensorRT] WARNING: onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
2022-01-17 17:34:39,407 [INFO] Extract_binaries for featurizer -> /data/models/citrinet-v3-stock-1024-english-asr-streaming-feature-extractor-streaming/1
2022-01-17 17:34:39,412 [INFO] Extract_binaries for vad -> /data/models/citrinet-v3-stock-1024-english-asr-streaming-voice-activity-detector-ctc-streaming/1
2022-01-17 17:34:39,412 [INFO] extracting {'vocab_file': '/tmp/tmp8hinuqiw/riva_decoder_vocabulary.txt'} -> /data/models/citrinet-v3-stock-1024-english-asr-streaming-voice-activity-detector-ctc-streaming/1
2022-01-17 17:34:39,414 [INFO] Extract_binaries for lm_decoder -> /data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1
2022-01-17 17:34:39,414 [INFO] extracting {'vocab_file': '/tmp/tmp8hinuqiw/riva_decoder_vocabulary.txt', 'decoding_language_model_binary': '/root/Downloads/speechtotext_en_us_lm_vdeployable_v1-1.0/mixed-lower.binary', 'decoding_vocab': '/root/Downloads/speechtotext_en_us_lm_vdeployable_v1-1.0/words.mixed_lm.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', '498056ba420d4bb3831ad557fba06032_tokenizer.model')} -> /data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1
2022-01-17 17:34:40,804 [INFO] {'vocab_file': '/data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1/riva_decoder_vocabulary.txt', 'decoding_language_model_binary': '/data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1/mixed-lower.binary', 'decoding_vocab': '/data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1/words.mixed_lm.txt', 'tokenizer_model': '/data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1/498056ba420d4bb3831ad557fba06032_tokenizer.model'}
2022-01-17 17:34:40,804 [INFO] Model config has vocab file and tokenizer specified. Will create subword lexicon file from  vocab_file /data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1/words.mixed_lm.txt and tokenizer model /data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1/498056ba420d4bb3831ad557fba06032_tokenizer.model
2022-01-17 17:34:40,955 [INFO] processed 10000 lines
2022-01-17 17:34:41,108 [INFO] processed 20000 lines
2022-01-17 17:34:41,260 [INFO] processed 30000 lines
2022-01-17 17:34:41,414 [INFO] processed 40000 lines
2022-01-17 17:34:41,568 [INFO] processed 50000 lines
2022-01-17 17:34:41,720 [INFO] processed 60000 lines
2022-01-17 17:34:41,873 [INFO] processed 70000 lines
2022-01-17 17:34:42,024 [INFO] processed 80000 lines
2022-01-17 17:34:42,176 [INFO] processed 90000 lines
2022-01-17 17:34:42,331 [INFO] processed 100000 lines
2022-01-17 17:34:42,487 [INFO] processed 110000 lines
2022-01-17 17:34:42,642 [INFO] processed 120000 lines
2022-01-17 17:34:42,801 [INFO] processed 130000 lines
2022-01-17 17:34:42,957 [INFO] processed 140000 lines
2022-01-17 17:34:43,115 [INFO] processed 150000 lines
2022-01-17 17:34:43,273 [INFO] processed 160000 lines
2022-01-17 17:34:43,431 [INFO] processed 170000 lines
2022-01-17 17:34:43,588 [INFO] processed 180000 lines
2022-01-17 17:34:43,747 [INFO] processed 190000 lines
2022-01-17 17:34:43,904 [INFO] processed 200000 lines
2022-01-17 17:34:44,065 [INFO] processed 210000 lines
2022-01-17 17:34:44,222 [INFO] processed 220000 lines
2022-01-17 17:34:44,245 [INFO] skipped 0 empty lines
2022-01-17 17:34:44,245 [INFO] filtered 0 lines
2022-01-17 17:34:44,248 [INFO] Extract_binaries for self -> /data/models/citrinet-v3-stock-1024-english-asr-streaming/1
+ echo

+ echo 'Riva initialization complete. Run ./riva_start.sh to launch services.'
Riva initialization complete. Run ./riva_start.sh to launch services.

When I run riva_start.sh, it fails with the following logs:

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.12 (build 30304767)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Riva waiting for Triton server to load all models...retrying in 1 second
I0117 17:37:07.136213 70 metrics.cc:290] Collecting metrics for GPU 0: Tesla T4
I0117 17:37:07.177269 70 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0117 17:37:07.177296 70 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0117 17:37:07.177304 70 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0117 17:37:07.370592 70 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f1048000000' with size 268435456
I0117 17:37:07.371749 70 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 1000000000
I0117 17:37:07.393491 70 model_repository_manager.cc:1045] loading: citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline:1
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0117 17:37:07.493756 70 model_repository_manager.cc:1045] loading: citrinet-1024-en-US-asr-offline-feature-extractor-streaming-offline:1
I0117 17:37:07.528150 70 ctc-decoder-library.cc:20] TRITONBACKEND_ModelInitialize: citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline (version 1)
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter max_num_slots could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
I0117 17:37:07.532198 70 backend_model.cc:255] model configuration:
{
    "name": "citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline",
    "platform": "",
    "backend": "riva_asr_decoder",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 128,
    "input": [
        {
            "name": "CLASS_LOGITS",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                1025
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "END_FLAG",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "SEGMENTS_START_END",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "CUSTOM_CONFIGURATION",
            "data_type": "TYPE_STRING",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "FINAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_TRANSCRIPTS_SCORE",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS_STABILITY",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "cuda": {
            "graphs": false,
            "busy_wait_events": false,
            "graph_spec": [],
            "output_copy_stream": true
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "oldest": {
            "max_candidate_sequences": 128,
            "preferred_batch_size": [
                32,
                64
            ],
            "max_queue_delay_microseconds": 1000
        },
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "END",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_END",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "CORRID",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_CORRID",
                        "int32_false_true": [],
                        "fp32_false_true": [],
                        "data_type": "TYPE_UINT64"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline_0",
            "kind": "KIND_CPU",
            "count": 1,
            "gpus": [],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "use_vad": {
            "string_value": "True"
        },
        "lm_weight": {
            "string_value": "0.2"
        },
        "blank_token": {
            "string_value": "#"
        },
        "vocab_file": {
            "string_value": "/data/models/citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/riva_decoder_vocabulary.txt"
        },
        "ms_per_timestep": {
            "string_value": "80"
        },
        "use_subword": {
            "string_value": "True"
        },
        "streaming": {
            "string_value": "True"
        },
        "beam_size": {
            "string_value": "16"
        },
        "right_padding_size": {
            "string_value": "0.0"
        },
        "beam_size_token": {
            "string_value": "16"
        },
        "sil_token": {
            "string_value": "▁"
        },
        "beam_threshold": {
            "string_value": "20.0"
        },
        "language_model_file": {
            "string_value": "/data/models/citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/jarvis_asr_train_datasets_noSpgi_noLS_gt_3gram.binary"
        },
        "max_execution_batch_size": {
            "string_value": "1024"
        },
        "forerunner_use_lm": {
            "string_value": "true"
        },
        "forerunner_beam_size_token": {
            "string_value": "8"
        },
        "forerunner_beam_threshold": {
            "string_value": "10.0"
        },
        "asr_model_delay": {
            "string_value": "-1"
        },
        "decoder_num_worker_threads": {
            "string_value": "-1"
        },
        "word_insertion_score": {
            "string_value": "0.2"
        },
        "left_padding_size": {
            "string_value": "0.0"
        },
        "decoder_type": {
            "string_value": "flashlight"
        },
        "compute_timestamps": {
            "string_value": "True"
        },
        "forerunner_beam_size": {
            "string_value": "8"
        },
        "chunk_size": {
            "string_value": "900.0"
        },
        "max_supported_transcripts": {
            "string_value": "1"
        },
        "lexicon_file": {
            "string_value": "/data/models/citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline/1/lexicon.txt"
        },
        "smearing_mode": {
            "string_value": "max"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0117 17:37:07.532280 70 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline_0 (device 0)
I0117 17:37:07.594105 70 model_repository_manager.cc:1045] loading: citrinet-1024-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline:1
I0117 17:37:07.694390 70 model_repository_manager.cc:1045] loading: citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming:1
I0117 17:37:07.794736 70 model_repository_manager.cc:1045] loading: citrinet-1024-en-US-asr-streaming-feature-extractor-streaming:1
I0117 17:37:07.894997 70 model_repository_manager.cc:1045] loading: citrinet-1024-en-US-asr-streaming-voice-activity-detector-ctc-streaming:1
I0117 17:37:07.995284 70 model_repository_manager.cc:1045] loading: citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming:1
I0117 17:37:08.095555 70 model_repository_manager.cc:1045] loading: citrinet-v3-stock-1024-english-asr-streaming-feature-extractor-streaming:1
I0117 17:37:08.195899 70 model_repository_manager.cc:1045] loading: citrinet-v3-stock-1024-english-asr-streaming-voice-activity-detector-ctc-streaming:1
I0117 17:37:08.296171 70 model_repository_manager.cc:1045] loading: punctuation_punctuation_postprocessor:1
I0117 17:37:08.396452 70 model_repository_manager.cc:1045] loading: punctuation_tokenizer:1
  > Riva waiting for Triton server to load all models...retrying in 1 second
I0117 17:37:08.496721 70 model_repository_manager.cc:1045] loading: riva-trt-citrinet-1024-en-US-asr-offline-am-streaming-offline:1
I0117 17:37:08.597012 70 model_repository_manager.cc:1045] loading: riva-trt-citrinet-1024-en-US-asr-streaming-am-streaming:1
I0117 17:37:08.697360 70 model_repository_manager.cc:1045] loading: riva-trt-citrinet-v3-stock-1024-english-asr-streaming-am-streaming:1
I0117 17:37:08.797717 70 model_repository_manager.cc:1045] loading: riva-trt-riva_punctuation-nn-bert-base-uncased:1
I0117 17:37:08.889152 70 model_repository_manager.cc:1212] successfully loaded 'citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-streaming-offline' version 1
I0117 17:37:08.895136 70 feature-extractor.cc:402] TRITONBACKEND_ModelInitialize: citrinet-1024-en-US-asr-offline-feature-extractor-streaming-offline (version 1)
I0117 17:37:08.930179 70 backend_model.cc:255] model configuration:
{
    "name": "citrinet-1024-en-US-asr-offline-feature-extractor-streaming-offline",
    "platform": "",
    "backend": "riva_asr_features",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 1,
    "input": [
        {
            "name": "AUDIO_SIGNAL",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "SAMPLE_RATE",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "AUDIO_FEATURES",
            "data_type": "TYPE_FP32",
            "dims": [
                80,
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "AUDIO_PROCESSED",
            "data_type": "TYPE_FP32",
            "dims": [
                1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "cuda": {
            "graphs": false,
            "busy_wait_events": false,
            "graph_spec": [],
            "output_copy_stream": true
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "oldest": {
            "max_candidate_sequences": 1,
            "preferred_batch_size": [
                1
            ],
            "max_queue_delay_microseconds": 1000
        },
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "END",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_END",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "CORRID",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_CORRID",
                        "int32_false_true": [],
                        "fp32_false_true": [],
                        "data_type": "TYPE_UINT64"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "citrinet-1024-en-US-asr-offline-feature-extractor-streaming-offline_0",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "precalc_norm_params": {
            "string_value": "False"
        },
        "dither": {
            "string_value": "0.0"
        },
        "norm_per_feature": {
            "string_value": "True"
        },
        "mean": {
            "string_value": "-11.4412,  -9.9334,  -9.1292,  -9.0365,  -9.2804,  -9.5643,  -9.7342, -9.6925,  -9.6333,  -9.2808,  -9.1887,  -9.1422,  -9.1397,  -9.2028, -9.2749,  -9.4776,  -9.9185, -10.1557, -10.3800, -10.5067, -10.3190, -10.4728, -10.5529, -10.6402, -10.6440, -10.5113, -10.7395, -10.7870, -10.6074, -10.5033, -10.8278, -10.6384, -10.8481, -10.6875, -10.5454, -10.4747, -10.5165, -10.4930, -10.3413, -10.3472, -10.3735, -10.6830, -10.8813, -10.6338, -10.3856, -10.7727, -10.8957, -10.8068, -10.7373, -10.6108, -10.3405, -10.2889, -10.3922, -10.4946, -10.3367, -10.4164, -10.9949, -10.7196, -10.3971, -10.1734,  -9.9257,  -9.6557,  -9.1761, -9.6653,  -9.7876,  -9.7230,  -9.7792,  -9.7056,  -9.2702,  -9.4650, -9.2755,  -9.1369,  -9.1174,  -8.9197,  -8.5394,  -8.2614,  -8.1353, -8.1422,  -8.3430,  -8.6655"
        },
        "stddev": {
            "string_value": "2.2668, 3.1642, 3.7079, 3.7642, 3.5349, 3.5901, 3.7640, 3.8424, 4.0145, 4.1475, 4.0457, 3.9048, 3.7709, 3.6117, 3.3188, 3.1489, 3.0615, 3.0362, 2.9929, 3.0500, 3.0341, 3.0484, 3.0103, 2.9474, 2.9128, 2.8669, 2.8332, 2.9411, 3.0378, 3.0712, 3.0190, 2.9992, 3.0124, 3.0024, 3.0275, 3.0870, 3.0656, 3.0142, 3.0493, 3.1373, 3.1135, 3.0675, 2.8828, 2.7018, 2.6296, 2.8826, 2.9325, 2.9288, 2.9271, 2.9890, 3.0137, 2.9855, 3.0839, 2.9319, 2.3512, 2.3795, 2.6191, 2.7555, 2.9326, 2.9931, 3.1543, 3.0855, 2.6820, 3.0566, 3.1272, 3.1663, 3.1836, 3.0018, 2.9089, 3.1727, 3.1626, 3.1086, 2.9804, 3.1107, 3.2998, 3.3697, 3.3716, 3.2487, 3.1597, 3.1181"
        },
        "chunk_size": {
            "string_value": "900.0"
        },
        "max_execution_batch_size": {
            "string_value": "1"
        },
        "sample_rate": {
            "string_value": "16000"
        },
        "num_features": {
            "string_value": "80"
        },
        "window_size": {
            "string_value": "0.025"
        },
        "window_stride": {
            "string_value": "0.01"
        },
        "streaming": {
            "string_value": "False"
        },
        "transpose": {
            "string_value": "False"
        },
        "stddev_floor": {
            "string_value": "1e-05"
        },
        "left_padding_size": {
            "string_value": "0.0"
        },
        "right_padding_size": {
            "string_value": "0.0"
        },
        "gain": {
            "string_value": "1.0"
        },
        "use_utterance_norm_params": {
            "string_value": "False"
        },
        "precalc_norm_time_steps": {
            "string_value": "0"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0117 17:37:08.935731 70 vad_library.cc:18] TRITONBACKEND_ModelInitialize: citrinet-1024-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline (version 1)
W:parameter_parser.cc:118: Parameter max_execution_batch_size could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter max_execution_batch_size could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
I0117 17:37:08.938619 70 backend_model.cc:255] model configuration:
{
    "name": "citrinet-1024-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline",
    "platform": "",
    "backend": "riva_asr_vad",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 2048,
    "input": [
        {
            "name": "CLASS_LOGITS",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                1025
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "SEGMENTS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "cuda": {
            "graphs": false,
            "busy_wait_events": false,
            "graph_spec": [],
            "output_copy_stream": true
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "citrinet-1024-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline_0",
            "kind": "KIND_CPU",
            "count": 1,
            "gpus": [],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "residue_blanks_at_start": {
            "string_value": "0"
        },
        "ms_per_timestep": {
            "string_value": "80"
        },
        "streaming": {
            "string_value": "True"
        },
        "use_subword": {
            "string_value": "True"
        },
        "residue_blanks_at_end": {
            "string_value": "0"
        },
        "vad_stop_history": {
            "string_value": "800"
        },
        "vad_start_history": {
            "string_value": "300"
        },
        "chunk_size": {
            "string_value": "900.0"
        },
        "vad_start_th": {
            "string_value": "0.2"
        },
        "vad_stop_th": {
            "string_value": "0.98"
        },
        "vad_type": {
            "string_value": "ctc-vad"
        },
        "vocab_file": {
            "string_value": "/data/models/citrinet-1024-en-US-asr-offline-voice-activity-detector-ctc-streaming-offline/1/riva_decoder_vocabulary.txt"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0117 17:37:08.938700 70 feature-extractor.cc:402] TRITONBACKEND_ModelInitialize: citrinet-1024-en-US-asr-streaming-feature-extractor-streaming (version 1)
I0117 17:37:08.940068 70 backend_model.cc:255] model configuration:
{
    "name": "citrinet-1024-en-US-asr-streaming-feature-extractor-streaming",
    "platform": "",
    "backend": "riva_asr_features",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 2048,
    "input": [
        {
            "name": "AUDIO_SIGNAL",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "SAMPLE_RATE",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "AUDIO_FEATURES",
            "data_type": "TYPE_FP32",
            "dims": [
                80,
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "AUDIO_PROCESSED",
            "data_type": "TYPE_FP32",
            "dims": [
                1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "cuda": {
            "graphs": false,
            "busy_wait_events": false,
            "graph_spec": [],
            "output_copy_stream": true
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "oldest": {
            "max_candidate_sequences": 2048,
            "preferred_batch_size": [
                256,
                512
            ],
            "max_queue_delay_microseconds": 1000
        },
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "END",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_END",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "CORRID",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_CORRID",
                        "int32_false_true": [],
                        "fp32_false_true": [],
                        "data_type": "TYPE_UINT64"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "citrinet-1024-en-US-asr-streaming-feature-extractor-streaming_0",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "max_execution_batch_size": {
            "string_value": "1024"
        },
        "sample_rate": {
            "string_value": "16000"
        },
        "num_features": {
            "string_value": "80"
        },
        "window_size": {
            "string_value": "0.025"
        },
        "window_stride": {
            "string_value": "0.01"
        },
        "streaming": {
            "string_value": "True"
        },
        "transpose": {
            "string_value": "False"
        },
        "stddev_floor": {
            "string_value": "1e-05"
        },
        "left_padding_size": {
            "string_value": "1.92"
        },
        "right_padding_size": {
            "string_value": "1.92"
        },
        "gain": {
            "string_value": "1.0"
        },
        "precalc_norm_time_steps": {
            "string_value": "0"
        },
        "use_utterance_norm_params": {
            "string_value": "False"
        },
        "precalc_norm_params": {
            "string_value": "False"
        },
        "dither": {
            "string_value": "1e-05"
        },
        "norm_per_feature": {
            "string_value": "True"
        },
        "mean": {
            "string_value": "-11.4412,  -9.9334,  -9.1292,  -9.0365,  -9.2804,  -9.5643,  -9.7342, -9.6925,  -9.6333,  -9.2808,  -9.1887,  -9.1422,  -9.1397,  -9.2028, -9.2749,  -9.4776,  -9.9185, -10.1557, -10.3800, -10.5067, -10.3190, -10.4728, -10.5529, -10.6402, -10.6440, -10.5113, -10.7395, -10.7870, -10.6074, -10.5033, -10.8278, -10.6384, -10.8481, -10.6875, -10.5454, -10.4747, -10.5165, -10.4930, -10.3413, -10.3472, -10.3735, -10.6830, -10.8813, -10.6338, -10.3856, -10.7727, -10.8957, -10.8068, -10.7373, -10.6108, -10.3405, -10.2889, -10.3922, -10.4946, -10.3367, -10.4164, -10.9949, -10.7196, -10.3971, -10.1734,  -9.9257,  -9.6557,  -9.1761, -9.6653,  -9.7876,  -9.7230,  -9.7792,  -9.7056,  -9.2702,  -9.4650, -9.2755,  -9.1369,  -9.1174,  -8.9197,  -8.5394,  -8.2614,  -8.1353, -8.1422,  -8.3430,  -8.6655"
        },
        "stddev": {
            "string_value": "2.2668, 3.1642, 3.7079, 3.7642, 3.5349, 3.5901, 3.7640, 3.8424, 4.0145, 4.1475, 4.0457, 3.9048, 3.7709, 3.6117, 3.3188, 3.1489, 3.0615, 3.0362, 2.9929, 3.0500, 3.0341, 3.0484, 3.0103, 2.9474, 2.9128, 2.8669, 2.8332, 2.9411, 3.0378, 3.0712, 3.0190, 2.9992, 3.0124, 3.0024, 3.0275, 3.0870, 3.0656, 3.0142, 3.0493, 3.1373, 3.1135, 3.0675, 2.8828, 2.7018, 2.6296, 2.8826, 2.9325, 2.9288, 2.9271, 2.9890, 3.0137, 2.9855, 3.0839, 2.9319, 2.3512, 2.3795, 2.6191, 2.7555, 2.9326, 2.9931, 3.1543, 3.0855, 2.6820, 3.0566, 3.1272, 3.1663, 3.1836, 3.0018, 2.9089, 3.1727, 3.1626, 3.1086, 2.9804, 3.1107, 3.2998, 3.3697, 3.3716, 3.2487, 3.1597, 3.1181"
        },
        "chunk_size": {
            "string_value": "0.16"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0117 17:37:08.940129 70 ctc-decoder-library.cc:20] TRITONBACKEND_ModelInitialize: citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming (version 1)
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter max_num_slots could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
I0117 17:37:08.943110 70 backend_model.cc:255] model configuration:
{
    "name": "citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming",
    "platform": "",
    "backend": "riva_asr_decoder",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 2048,
    "input": [
        {
            "name": "CLASS_LOGITS",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                1025
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "END_FLAG",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "SEGMENTS_START_END",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "CUSTOM_CONFIGURATION",
            "data_type": "TYPE_STRING",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "FINAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_TRANSCRIPTS_SCORE",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS_STABILITY",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "cuda": {
            "graphs": false,
            "busy_wait_events": false,
            "graph_spec": [],
            "output_copy_stream": true
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "oldest": {
            "max_candidate_sequences": 2048,
            "preferred_batch_size": [
                32,
                64
            ],
            "max_queue_delay_microseconds": 1000
        },
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "END",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_END",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "CORRID",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_CORRID",
                        "int32_false_true": [],
                        "fp32_false_true": [],
                        "data_type": "TYPE_UINT64"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming_0",
            "kind": "KIND_CPU",
            "count": 1,
            "gpus": [],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "use_subword": {
            "string_value": "True"
        },
        "streaming": {
            "string_value": "True"
        },
        "beam_size": {
            "string_value": "16"
        },
        "right_padding_size": {
            "string_value": "1.92"
        },
        "beam_size_token": {
            "string_value": "16"
        },
        "sil_token": {
            "string_value": "▁"
        },
        "beam_threshold": {
            "string_value": "20.0"
        },
        "language_model_file": {
            "string_value": "/data/models/citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/jarvis_asr_train_datasets_noSpgi_noLS_gt_3gram.binary"
        },
        "max_execution_batch_size": {
            "string_value": "1024"
        },
        "forerunner_use_lm": {
            "string_value": "true"
        },
        "forerunner_beam_size_token": {
            "string_value": "8"
        },
        "forerunner_beam_threshold": {
            "string_value": "10.0"
        },
        "asr_model_delay": {
            "string_value": "-1"
        },
        "decoder_num_worker_threads": {
            "string_value": "-1"
        },
        "word_insertion_score": {
            "string_value": "0.2"
        },
        "left_padding_size": {
            "string_value": "1.92"
        },
        "decoder_type": {
            "string_value": "flashlight"
        },
        "compute_timestamps": {
            "string_value": "True"
        },
        "forerunner_beam_size": {
            "string_value": "8"
        },
        "max_supported_transcripts": {
            "string_value": "1"
        },
        "chunk_size": {
            "string_value": "0.16"
        },
        "lexicon_file": {
            "string_value": "/data/models/citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/lexicon.txt"
        },
        "smearing_mode": {
            "string_value": "max"
        },
        "use_vad": {
            "string_value": "True"
        },
        "blank_token": {
            "string_value": "#"
        },
        "lm_weight": {
            "string_value": "0.2"
        },
        "vocab_file": {
            "string_value": "/data/models/citrinet-1024-en-US-asr-streaming-ctc-decoder-cpu-streaming/1/riva_decoder_vocabulary.txt"
        },
        "ms_per_timestep": {
            "string_value": "80"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0117 17:37:08.943189 70 vad_library.cc:18] TRITONBACKEND_ModelInitialize: citrinet-1024-en-US-asr-streaming-voice-activity-detector-ctc-streaming (version 1)
W:parameter_parser.cc:118: Parameter max_execution_batch_size could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter max_execution_batch_size could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
I0117 17:37:08.945221 70 backend_model.cc:255] model configuration:
{
    "name": "citrinet-1024-en-US-asr-streaming-voice-activity-detector-ctc-streaming",
    "platform": "",
    "backend": "riva_asr_vad",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 2048,
    "input": [
        {
            "name": "CLASS_LOGITS",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                1025
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "SEGMENTS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "cuda": {
            "graphs": false,
            "busy_wait_events": false,
            "graph_spec": [],
            "output_copy_stream": true
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "citrinet-1024-en-US-asr-streaming-voice-activity-detector-ctc-streaming_0",
            "kind": "KIND_CPU",
            "count": 1,
            "gpus": [],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "vad_start_history": {
            "string_value": "300"
        },
        "vad_stop_history": {
            "string_value": "800"
        },
        "chunk_size": {
            "string_value": "0.16"
        },
        "vad_start_th": {
            "string_value": "0.2"
        },
        "vad_stop_th": {
            "string_value": "0.98"
        },
        "vad_type": {
            "string_value": "ctc-vad"
        },
        "vocab_file": {
            "string_value": "/data/models/citrinet-1024-en-US-asr-streaming-voice-activity-detector-ctc-streaming/1/riva_decoder_vocabulary.txt"
        },
        "residue_blanks_at_start": {
            "string_value": "-2"
        },
        "ms_per_timestep": {
            "string_value": "80"
        },
        "streaming": {
            "string_value": "True"
        },
        "use_subword": {
            "string_value": "True"
        },
        "residue_blanks_at_end": {
            "string_value": "0"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0117 17:37:08.945664 70 ctc-decoder-library.cc:20] TRITONBACKEND_ModelInitialize: citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming (version 1)
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
W:parameter_parser.cc:118: Parameter max_num_slots could not be set from parameters
W:parameter_parser.cc:119: Default value will be used
I0117 17:37:08.947745 70 backend_model.cc:255] model configuration:
{
    "name": "citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming",
    "platform": "",
    "backend": "riva_asr_decoder",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 2048,
    "input": [
        {
            "name": "CLASS_LOGITS",
            "data_type": "TYPE_FP32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                1025
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "END_FLAG",
            "data_type": "TYPE_UINT32",
            "format": "FORMAT_NONE",
            "dims": [
                1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "SEGMENTS_START_END",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        },
        {
            "name": "CUSTOM_CONFIGURATION",
            "data_type": "TYPE_STRING",
            "format": "FORMAT_NONE",
            "dims": [
                -1,
                2
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "FINAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_TRANSCRIPTS_SCORE",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "FINAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS",
            "data_type": "TYPE_STRING",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_TRANSCRIPTS_STABILITY",
            "data_type": "TYPE_FP32",
            "dims": [
                -1
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "PARTIAL_WORDS_START_END",
            "data_type": "TYPE_INT32",
            "dims": [
                -1,
                2
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "cuda": {
            "graphs": false,
            "busy_wait_events": false,
            "graph_spec": [],
            "output_copy_stream": true
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "sequence_batching": {
        "oldest": {
            "max_candidate_sequences": 2048,
            "preferred_batch_size": [
                32,
                64
            ],
            "max_queue_delay_microseconds": 1000
        },
        "max_sequence_idle_microseconds": 60000000,
        "control_input": [
            {
                "name": "START",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_START",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "READY",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_READY",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "END",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_END",
                        "int32_false_true": [
                            0,
                            1
                        ],
                        "fp32_false_true": [],
                        "data_type": "TYPE_INVALID"
                    }
                ]
            },
            {
                "name": "CORRID",
                "control": [
                    {
                        "kind": "CONTROL_SEQUENCE_CORRID",
                        "int32_false_true": [],
                        "fp32_false_true": [],
                        "data_type": "TYPE_UINT64"
                    }
                ]
            }
        ]
    },
    "instance_group": [
        {
            "name": "citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming_0",
            "kind": "KIND_CPU",
            "count": 1,
            "gpus": [],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
        "decoder_type": {
            "string_value": "flashlight"
        },
        "compute_timestamps": {
            "string_value": "True"
        },
        "forerunner_beam_size": {
            "string_value": "8"
        },
        "chunk_size": {
            "string_value": "0.8"
        },
        "max_supported_transcripts": {
            "string_value": "1"
        },
        "lexicon_file": {
            "string_value": "/data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1/lexicon.txt"
        },
        "smearing_mode": {
            "string_value": "max"
        },
        "use_vad": {
            "string_value": "True"
        },
        "blank_token": {
            "string_value": "#"
        },
        "lm_weight": {
            "string_value": "0.2"
        },
        "vocab_file": {
            "string_value": "/data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1/riva_decoder_vocabulary.txt"
        },
        "ms_per_timestep": {
            "string_value": "80"
        },
        "streaming": {
            "string_value": "True"
        },
        "use_subword": {
            "string_value": "True"
        },
        "beam_size": {
            "string_value": "16"
        },
        "right_padding_size": {
            "string_value": "1.6"
        },
        "beam_size_token": {
            "string_value": "16"
        },
        "sil_token": {
            "string_value": "▁"
        },
        "beam_threshold": {
            "string_value": "20.0"
        },
        "language_model_file": {
            "string_value": "/data/models/citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming/1/mixed-lower.binary"
        },
        "max_execution_batch_size": {
            "string_value": "1024"
        },
        "forerunner_use_lm": {
            "string_value": "true"
        },
        "forerunner_beam_size_token": {
            "string_value": "8"
        },
        "forerunner_beam_threshold": {
            "string_value": "10.0"
        },
        "decoder_num_worker_threads": {
            "string_value": "-1"
        },
        "asr_model_delay": {
            "string_value": "-1"
        },
        "word_insertion_score": {
            "string_value": "0.2"
        },
        "left_padding_size": {
            "string_value": "1.6"
        }
    },
    "model_warmup": [],
    "model_transaction_policy": {
        "decoupled": false
    }
}
I0117 17:37:08.947884 70 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming_0 (device 0)
  > Riva waiting for Triton server to load all models...retrying in 1 second
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Unknown entry in dictionary: '221540'
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
  > Riva waiting for Triton server to load all models...retrying in 1 second
/opt/riva/bin/start-riva: line 4:    70 Aborted                 (core dumped) tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000
  > Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]

I believe the relevant portion of the error is:

I0117 17:37:08.947884 70 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming_0 (device 0)
  > Riva waiting for Triton server to load all models...retrying in 1 second
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Unknown entry in dictionary: '221540'

I’d like to know:
1. If I’m doing anything wrong, or if there’s an error in the files available on NGC.
2. If this is all that’s required to recreate the stock models. Do I need to use the rescorer?

Thanks in advance!

Just wanted to give an update: I was able to make it work using a greedy decoder:

riva-build speech_recognition \
   /data/rmir/citrinet_v3_stock.rmir:tlt_encode speechtotext_en_us_citrinet_vdeployable_v3.0/citrinet-1024-Jarvis-asrset-3_0-encrypted.riva:tlt_encode \
   --name=citrinet-v3-stock-1024-english-asr-streaming \
   --ms_per_timestep=80 \
   --featurizer.use_utterance_norm_params=False \
   --featurizer.precalc_norm_time_steps=0 \
   --featurizer.precalc_norm_params=False \
   --vad.residue_blanks_at_start=-2 \
   --chunk_size=0.8 \
   --left_padding_size=1.6 \
   --right_padding_size=1.6 \
   --decoder_type=greedy \
   --language_code=en-US -f

So I suspect this is a problem with my choice of decoder.

Also, looking further into the error, I think I found the code that generates it in flashlight.
(flashlight/Dictionary.cpp at 9193b92e86538128664c1cf22785f4f5778c8312 · flashlight/flashlight · GitHub)

int Dictionary::getIndex(const std::string& entry) const {
  auto iter = entry2idx_.find(entry);
  if (iter == entry2idx_.end()) {
    if (defaultIndex_ < 0) {
      throw std::invalid_argument(
          "Unknown entry in dictionary: '" + entry + "'");
    } else {
      return defaultIndex_;
    }
  }
  return iter->second;
}

One confusing thing is that entry is a string and the index is an integer, yet the error is complaining about an unknown entry (i.e. ‘221540’) that looks like an integer index. That seems backwards to me.

I0117 17:37:08.947884 70 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: citrinet-v3-stock-1024-english-asr-streaming-ctc-decoder-cpu-streaming_0 (device 0)
  > Riva waiting for Triton server to load all models...retrying in 1 second
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Unknown entry in dictionary: '221540'

Hope this extra info helps debug the issue!

Is there any update on my question? Perhaps a suggestion for a good dictionary I could use with --decoding_vocab? I still haven’t found the problem in the riva-build command or in my deployment of the model.

Thanks.

I0401 03:34:12.062220 70 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: riva-asr-ctc-decoder-cpu-streaming_0 (device 0)
I0401 03:34:12.139479 70 model_repository_manager.cc:1045] loading: riva-asr-voice-activity-detector-ctc-streaming:1
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Unknown entry in dictionary: '▁'
/opt/riva/bin/start-riva: line 4: 70 Aborted (core dumped) tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000

Triton server died before reaching ready state. Terminating Riva startup.

Same problem, any update?

I got this working, will update here later with the exact build command and setup I used.

Yep, this problem is also occurring for me.
Did anybody get it working with the flashlight decoder?

I got the English LM working but Mandarin doesn’t!
English LM:
Just download the English LM from NVIDIA NGC and run the riva-build/riva-deploy commands.

Do you have the exact riva-build command? The pre-trained model works for me too.

Model from v1.0.0:

LM/vocab file from v1.1:

riva-build speech_recognition \
   stt_en_citrinet_1024_gamma_0_25_with_flashlight.rmir \
   stt_en_citrinet_1024_gamma_0_25.riva \
   --offline \
   --name=stt_en_citrinet_1024_gamma_0_25_offline \
   --ms_per_timestep=80 \
   --featurizer.use_utterance_norm_params=False \
   --featurizer.precalc_norm_time_steps=0 \
   --featurizer.precalc_norm_params=False \
   --chunk_size=61 \
   --left_padding_size=0. \
   --right_padding_size=0. \
   --decoder_type=flashlight \
   --flashlight_decoder.asr_model_delay=-1 \
   --decoding_language_model_binary=mixed-lower.binary \
   --decoding_vocab=flashlight_decoder_vocab.txt \
   --flashlight_decoder.lm_weight=0.2 \
   --flashlight_decoder.word_insertion_score=0.2 \
   --flashlight_decoder.beam_threshold=20. \
   --language_code=en-US

Still does not work for me; maybe it’s because I am using streaming Citrinet.

I trained a Citrinet-512 NeMo model on a Filipino language, with tokenizer vocab size 1024 and tokenizer type BPE (Google SentencePiece tokenizer). After successful training, I tried to deploy it in Riva using a 3-gram KenLM binary and decoder type flashlight. During model load in the Triton server I got this error:

I0712 13:32:15.565105 94 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: nemo_asr_riva_pipeline_filipino-ctc-decoder-cpu-streaming_0 (device 0)
terminate called after throwing an instance of 'std::runtime_error'
  what():  [LoadWords] Invalid line: ▁
Riva waiting for Triton server to load all models...retrying in 1 second
/opt/riva/bin/start-riva: line 4: 94 Aborted (core dumped) tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000

Riva build command:

riva-build speech_recognition -f \
   /servicemaker-dev/citrinet_512_exp.rmir \
   /servicemaker-dev/citrinet_512_exp.riva \
   --streaming True \
   --name=complete_speech_service \
   --decoder_type=flashlight \
   --flashlight_decoder.language_model_file=/servicemaker-dev/lm_3.binary \
   --flashlight_decoder.vocab_file=/servicemaker-dev/vocab.txt \
   --flashlight_decoder.beam_size=16 \
   --flashlight_decoder.lm_weight=1.0 \
   --decoding_vocab=/servicemaker-dev/vocab.txt

Any suggestions would be helpful.

Instead of supplying --decoding_vocab and --flashlight_decoder.vocab_file, provide a pre-prepared lexicon via --decoding_lexicon. Detailed instructions for its preparation can be found in the Riva documentation: How to Customize Riva ASR Vocabulary and Pronunciation with Lexicon Mapping — NVIDIA Riva.
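For reference, the lexicon the decoder consumes is roughly a plain-text file with one word per line, each word followed by its sequence of sub-word units from the tokenizer (whitespace-separated). A purely illustrative excerpt (the actual pieces depend on your tokenizer model):

today	▁to da y
hello	▁he llo
speech	▁spe ech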

lexicon (2).txt (14.8 KB)

This is my lexicon file created during riva-build. So there are two questions:

  1. Does this lexicon file look OK?
  2. When I try to load the models in the Triton server, it results in this error:

what():  Unknown entry in dictionary: '▁##he
Riva waiting for Triton server to load all models...retrying in 1 second
/opt/riva/bin/start-riva: line 4: 94 Aborted (core dumped) tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000
Triton server died before reaching ready state. Terminating Riva startup.

@sachin.sachan I suggest you read the Riva documentation carefully (see link in previous reply); based on the contents of the txt file it seems that you are supplying the tokenizer vocabulary file and not your own.

The SPE tokenizer model internally has its own tokenizer vocabulary file, i.e. the actual tokens (sub-word units) the ASR recognises. The vocabulary you need to supply when running the build command is, on the other hand, the list of words that you wish your ASR to recognise. The words in this vocabulary must be ‘valid’ character sequences, i.e. ones that can be constructed by joining individual tokens from the tokenizer vocabulary (sub-word units); e.g. today as ▁to da y.
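If it helps, here is a minimal sketch of producing such a lexicon with the sentencepiece Python package. It assumes you have extracted the SentencePiece tokenizer model used to train the acoustic model and have a plain word list; the file names tokenizer.model, vocab.txt and lexicon.txt are illustrative, not fixed Riva conventions:

# Sketch: build a flashlight-style lexicon (word -> sub-word units) from a word list.
# Assumes the same SentencePiece model that was used to train the acoustic model.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")  # illustrative path

with open("vocab.txt", encoding="utf-8") as fin, \
     open("lexicon.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        word = line.strip().lower()
        if not word:
            continue
        pieces = sp.encode_as_pieces(word)  # e.g. "today" -> ["▁to", "da", "y"]
        fout.write(word + "\t" + " ".join(pieces) + "\n")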

Thanks, I used a custom vocabulary and it has been deployed perfectly. But there is another issue: when I try to check the transcripts, I get empty transcripts.

riva-build speech_recognition -f \
   /servicemaker-dev/citrinet_model_3.rmir \
   /servicemaker-dev/citrinet_model_3.riva \
   --name=online_speech_service \
   --decoder_type=flashlight \
   --decoding_language_model_binary=/servicemaker-dev/lm.binary \
   --flashlight_decoder.beam_size=16 \
   --flashlight_decoder.lm_weight=1.0 \
   --flashlight_decoder.word_insertion_score=0.5 \
   --decoding_vocab=/servicemaker-dev/vocab.txt

riva-deploy -f /servicemaker-dev/citrinet_model_3.rmir /data/models

Docker logs

I0804 10:53:20.866004 94 server.cc:549]
+-------------------+-----------------------------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------------+-----------------------------------------------------------------------------+--------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
| riva_asr_features | /opt/tritonserver/backends/riva_asr_features/libtriton_riva_asr_features.so | {} |
| riva_asr_decoder | /opt/tritonserver/backends/riva_asr_decoder/libtriton_riva_asr_decoder.so | {} |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {} |
| riva_asr_vad | /opt/tritonserver/backends/riva_asr_vad/libtriton_riva_asr_vad.so | {} |
+-------------------+-----------------------------------------------------------------------------+--------+

I0804 10:53:20.866104 94 server.cc:592]
+--------------------------------------------------------------+---------+--------+
| Model | Version | Status |
+--------------------------------------------------------------+---------+--------+
| online_speech_service | 1 | READY |
| online_speech_service-ctc-decoder-cpu-streaming | 1 | READY |
| online_speech_service-feature-extractor-streaming | 1 | READY |
| online_speech_service-voice-activity-detector-ctc-streaming | 1 | READY |
| riva-trt-online_speech_service-am-streaming | 1 | READY |
+--------------------------------------------------------------+---------+--------+

I0804 10:53:20.925554 94 metrics.cc:623] Collecting metrics for GPU 0: Tesla T4
I0804 10:53:20.926014 94 tritonserver.cc:1932]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.19.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /data/models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 1000000000 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0804 10:53:20.927486 94 grpc_server.cc:4375] Started GRPCInferenceService at 0.0.0.0:8001
I0804 10:53:20.927828 94 http_server.cc:3075] Started HTTPService at 0.0.0.0:8000
I0804 10:53:20.969651 94 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002

Triton server is ready…
I0804 10:53:21.876487 181 riva_server.cc:118] Using Insecure Server Credentials
I0804 10:53:21.880679 181 model_registry.cc:112] Successfully registered: online_speech_service for ASR
W0804 10:53:21.897004 181 grpc_riva_asr.cc:157] online_speech_service has no configured wfst normalizer model
I0804 10:53:21.897382 181 riva_server.cc:158] Riva Conversational AI Server listening on 0.0.0.0:50051
W0804 10:53:21.897398 181 stats_reporter.cc:41] No API key provided. Stats reporting disabled.

Inference Code :

channel = grpc.insecure_channel(server)
riva_client = rasr_srv.RivaSpeechRecognitionStub(channel)
config = rasr.RecognitionConfig(
    encoding=ra.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=16000,
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
    enable_word_time_offsets=True,
)
streaming_config = rasr.StreamingRecognitionConfig(config=config)

actual_transcript = []
stream = [] 
import jsonlines
n_samples = 5
test_path = "/opt/nemo-asr/Filipino/train_manifest.json"

with jsonlines.open(test_path) as f:
    for k,line in enumerate(f.iter()):
                
        if(k < n_samples):
            with open(line['audio_filepath'], 'rb') as fh:
                data = fh.read()
  
            stream.append(data)
            actual_transcript.append(line['text'])
  
        else:
            break

def build_generator(cfg, gen):
        yield rasr.StreamingRecognizeRequest(streaming_config=cfg)
        for x in gen:
            yield x
        yield cfg

def inference(stream):
    request = (rasr.StreamingRecognizeRequest(audio_content=content) for content in stream)
    responses = riva_client.StreamingRecognize(build_generator( 
                                streaming_config, request))

    return responses

predicted_transcript = []
for k,i in enumerate(actual_transcript):
    responses = inference(stream[k:k+1])
    all_alt = []
    for response in responses:
        for result in response.results:
            if result.is_final == True:
                alternatives = result.alternatives
                
                for alternative in alternatives:
                    all_alt.append(alternative.transcript)
                   
    predicted_transcript.append(all_alt)

Output:

predicted_transcript: [[], [], [], [], []]

Why are the transcripts empty?
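For comparison only: the Riva streaming sample clients send the StreamingRecognitionConfig as the first request and then stream the audio in small fixed-size chunks, rather than one file-sized message followed by the bare config object (as build_generator above does). A minimal sketch of that request pattern, reusing the rasr stubs, riva_client, streaming_config and stream defined above; the chunk size and function name are illustrative:

def request_generator(cfg, audio_bytes, chunk_size=8192):
    # First message carries only the streaming config.
    yield rasr.StreamingRecognizeRequest(streaming_config=cfg)
    # Following messages carry raw audio in fixed-size chunks.
    for offset in range(0, len(audio_bytes), chunk_size):
        yield rasr.StreamingRecognizeRequest(
            audio_content=audio_bytes[offset:offset + chunk_size])

responses = riva_client.StreamingRecognize(request_generator(streaming_config, stream[0]))
for response in responses:
    for result in response.results:
        if result.is_final:
            print(result.alternatives[0].transcript)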