Triton server died before reaching ready state. Terminating Riva startup

Please provide the following information when requesting support.

Hardware - GPU (A100)
Hardware - CPU
Operating System: Ubuntu 20.04
Riva Version: 1.6.0-beta
TLT Version

Hi,

I would like to test Riva on a workstation that has no internet access. Here is what I've done (roughly the commands sketched after the list):

  1. Downloaded the docker images to my Linux device by running bash riva_init.sh
  2. Saved these 3 docker images into tar.gz files
  3. Transferred those 3 tar.gz files to the workstation
  4. Loaded the docker images from these 3 tar.gz files on the workstation
  5. Ran bash riva_start.sh on the workstation
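
For reference, steps 2-4 correspond roughly to the following commands (a sketch; the image names are derived from config.sh for 1.6.0-beta):

# On the device with internet access: save each pulled image (steps 1-2)
docker save nvcr.io/nvidia/riva/riva-speech:1.6.0-beta-server | gzip > riva-speech-server.tar.gz
docker save nvcr.io/nvidia/riva/riva-speech:1.6.0-beta-servicemaker | gzip > riva-speech-servicemaker.tar.gz
docker save nvcr.io/nvidia/riva/riva-speech-client:1.6.0-beta | gzip > riva-speech-client.tar.gz

# On the offline workstation: load each transferred archive (step 4)
docker load < riva-speech-server.tar.gz
docker load < riva-speech-servicemaker.tar.gz
docker load < riva-speech-client.tar.gz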

Then I got an error:

Triton server died before reaching ready state. Terminating Riva startup.

here is the complete log message:

==========================
=== Riva Speech Skills ===
==========================

NVIDIA Release 21.09 (build 27567456)

Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

Riva waiting for Triton server to load all models…retrying in 1 second
I1007 07:32:54.182862 70 metrics.cc:228] Collecting metrics for GPU 0: NVIDIA A100-SXM-80GB
I1007 07:32:54.183331 70 tritonserver.cc:1658]
+----------------------------------+------------------------------------------+
| Option                           | Value                                    |
+----------------------------------+------------------------------------------+
| server_id                        | triton                                   |
| server_version                   | 2.9.0                                    |
| server_extensions                | classification sequence model_repository |
|                                  | model_repository(unload_dependents)      |
|                                  | schedule_policy model_configuration      |
|                                  | system_shared_memory cuda_shared_memory  |
|                                  | binary_tensor_data statistics            |
| model_control_mode               | MODE_NONE                                |
| strict_model_config              | 1                                        |
| pinned_memory_pool_byte_size     | 268435456                                |
| cuda_memory_pool_byte_size{0}    | 1000000000                               |
| min_supported_compute_capability | 6.0                                      |
| strict_readiness                 | 1                                        |
| exit_timeout                     | 30                                       |
+----------------------------------+------------------------------------------+

I1007 07:32:54.183346 70 server.cc:231] No server context available. Exiting immediately.
error: creating server: Invalid argument - --model-repository must be specified

Riva waiting for Triton server to load all models…retrying in 1 second
Riva waiting for Triton server to load all models…retrying in 1 second
Triton server died before reaching ready state. Terminating Riva startup.
Check Triton logs with: docker logs
/opt/riva/bin/start-riva: line 1: kill: (70) - No such process


Hi,

It looks like your Triton server is down because no models were loaded: the --model-repository option was never passed to Triton. Could you please share the docker logs and your config file/steps so we can better understand the deployment and help?
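
For context, when Riva starts successfully, the Triton server embedded in the riva-speech container is launched against the deployed model directory, conceptually like this (a sketch, not the exact start-riva command line; /data/models is where the quickstart scripts deploy models):

# Triton refuses to start without a model repository to serve
tritonserver --model-repository=/data/models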

Thank you.

Hi spolisetty,
The log message I provided earlier is the one I get by running docker logs riva-speech.

Here is the content of config.sh under the riva_quickstart_v1.6.0-beta folder:

# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

# Enable or Disable Riva Services
service_enabled_asr=true
service_enabled_nlp=true
service_enabled_tts=true

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified.
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
#
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="riva-model-repo"

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false

# Ports to expose for Riva services
riva_speech_api_port="50051"
riva_vision_api_port="60051"

# NGC orgs
riva_ngc_org="nvidia"
riva_ngc_team="riva"
riva_ngc_image_version="1.6.0-beta"
riva_ngc_model_version="1.6.0-beta"

# Pre-built models listed below will be downloaded from NGC. If models already exist in $riva-rmir
# then models can be commented out to skip download from NGC

########## ASR MODELS ##########

models_asr=(
### Punctuation model
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}"

### Citrinet-1024 Streaming w/ CPU decoder, best latency configuration
    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset3p0_streaming:${riva_ngc_model_version}"

### Citrinet-1024 Streaming w/ CPU decoder, best throughput configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset3p0_streaming_throughput:${riva_ngc_model_version}"

### Citrinet-1024 Offline w/ CPU decoder,
    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset3p0_offline:${riva_ngc_model_version}"

### Jasper Streaming w/ CPU decoder, best latency configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_streaming:${riva_ngc_model_version}"

### Jasper Streaming w/ CPU decoder, best throughput configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_streaming_throughput:${riva_ngc_model_version}"

### Jasper Offline w/ CPU decoder
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_offline:${riva_ngc_model_version}"

### Quarztnet Streaming w/ CPU decoder, best latency configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_quartznet_english_streaming:${riva_ngc_model_version}"

### Quarztnet Streaming w/ CPU decoder, best throughput configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_quartznet_english_streaming_throughput:${riva_ngc_model_version}"

### Quarztnet Offline w/ CPU decoder
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_quartznet_english_offline:${riva_ngc_model_version}"

### Jasper Streaming w/ GPU decoder, best latency configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_streaming_gpu_decoder:${riva_ngc_model_version}"

### Jasper Streaming w/ GPU decoder, best throughput configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_streaming_throughput_gpu_decoder:${riva_ngc_model_version}"

### Jasper Offline w/ GPU decoder
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_offline_gpu_decoder:${riva_ngc_model_version}"
)

########## NLP MODELS ##########

models_nlp=(
### Bert base Punctuation model
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}"

### BERT base Named Entity Recognition model fine-tuned on GMB dataset with class labels LOC, PER, ORG etc.
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_named_entity_recognition_bert_base:${riva_ngc_model_version}"

### BERT Base Intent Slot model fine-tuned on weather dataset.
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_intent_slot_bert_base:${riva_ngc_model_version}"

### BERT Base Question Answering model fine-tuned on Squad v2.
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_question_answering_bert_base:${riva_ngc_model_version}"

### Megatron345M Question Answering model fine-tuned on Squad v2.
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_question_answering_megatron:${riva_ngc_model_version}"

### Bert base Text Classification model fine-tuned on 4class (weather, meteorology, personality, nomatch) domain model.
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_text_classification_bert_base:${riva_ngc_model_version}"
)

########## TTS MODELS ##########

models_tts=(
   "${riva_ngc_org}/${riva_ngc_team}/rmir_tts_fastpitch_hifigan_ljspeech:${riva_ngc_model_version}"
#   "${riva_ngc_org}/${riva_ngc_team}/rmir_tts_tacotron_waveglow_ljspeech:${riva_ngc_model_version}"
)

NGC_TARGET=${riva_ngc_org}
if [[ ! -z ${riva_ngc_team} ]]; then
  NGC_TARGET="${NGC_TARGET}/${riva_ngc_team}"
else
  team="\"\""
fi

# define docker images required to run Riva
image_client="nvcr.io/${NGC_TARGET}/riva-speech-client:${riva_ngc_image_version}"
image_speech_api="nvcr.io/${NGC_TARGET}/riva-speech:${riva_ngc_image_version}-server"

# define docker images required to setup Riva
image_init_speech="nvcr.io/${NGC_TARGET}/riva-speech:${riva_ngc_image_version}-servicemaker"

# daemon names
riva_daemon_speech="riva-speech"
riva_daemon_client="riva-client"

BTW, here is the log from running bash riva_start.sh:

$ bash riva_start.sh
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
[the same message repeats every 10 seconds for about 5 minutes]
Health ready check failed.
Check Riva logs with: docker logs riva-speech
$ docker logs riva-speech

And here is another log, from running bash riva_init.sh on my edge device (not the workstation):

Downloading models (RMIRs) from NGC...
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing RMIRs set the location and corresponding flag in config.sh.
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=0 --compute --utility --video --require=cuda>=9.0 --pid=17492 /var/lib/docker/overlay2/7c69fdb7e53829db6a69d720d32ea7b28ab6a0f74b9174bcd1e8e49639f90b85/merged]
nvidia-container-cli: mount error: mount operation failed: /usr/src/tensorrt: no such file or directory: unknown.
Error in downloading models.

Hi,

Could you please let us know which edge device you are using here? Just for your info, please make sure you're using supported hardware. Please refer to the support matrix here: Support Matrix — NVIDIA Riva Speech Skills v1.6.0-beta documentation
For your reference, here is a similar error:
Failed to run tensorrt docker image on Jetson Nano - #3 by AastaLLL

Thank you.

Hi,
The edge device that I used is a Jetson Nano.
I know that Jetson is not supported hardware for Riva, so I only use it to download the docker images from NGC by running riva_init.sh.
Once the images are downloaded, I transfer them to the workstation, where I plan to run Riva. (P.S. There is no internet access on the workstation.)

It seems that the models (RMIRs) are downloaded from NGC when running riva_init.sh. Is there any way I could skip the model downloading process?

And how could I download these models without running riva_init.sh?

Hi,

Sorry for the delayed response. You can either comment out all the models or set all the service enabled flags to false. All of the models enumerated in config.sh correspond to paths on NGC, and you can use the NGC CLI to download them manually.
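
For example, the punctuation RMIR listed in config.sh expands to nvidia/riva/rmir_nlp_punctuation_bert_base:1.6.0-beta, which can be pulled on a machine with internet access roughly like this (a sketch using the NGC CLI's registry model download-version subcommand):

# Download one RMIR model version from NGC into the current directory
ngc registry model download-version "nvidia/riva/rmir_nlp_punctuation_bert_base:1.6.0-beta"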

Thank you.

Hi,
Thanks for your reply. I have set all the service enabled flags to false and use_existing_rmirs to true. I also commented out the portion of riva_init.sh that checks the NGC API key. I've attached my new config.sh and riva_init.sh below.

However, I still cannot initialize Riva properly, due to the following error after running riva_init.sh:

find: '/data/rmir': No such file or directory

It comes from the last docker command in riva_init.sh:

docker run --init -it --rm --gpus '"'$gpus_to_use'"' \
  -v $riva_model_loc:/data \
  -e "MODEL_DEPLOY_KEY=${MODEL_DEPLOY_KEY}" \
  --name riva-service-maker \
  $image_init_speech deploy_all_models /data/rmir /data/models
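
(For reference, /data inside the container is the $riva_model_loc volume, so the missing /data/rmir directory has to exist inside that volume before deploy_all_models runs. A sketch of one way to create it and stage RMIRs there, assuming the default named volume riva-model-repo; example.rmir is a placeholder file name:)

# Create the rmir/ and models/ directories inside the riva-model-repo volume
docker run --rm -v riva-model-repo:/data alpine mkdir -p /data/rmir /data/models

# Copy a locally downloaded RMIR into the volume
docker run --rm -v riva-model-repo:/data -v $PWD:/src alpine cp /src/example.rmir /data/rmir/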

Here are my questions:

  1. Where should I create the folder $riva_model_loc? Should it be under riva_quickstart_v1.6.0-beta?
  2. Should I download the models first, before running riva_init.sh?
  3. Where can I find these models on NGC? (e.g. config.sh lists "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}" as one of the ASR models; how can I find it on NGC? Querying "Punctuation model" gives me too many results, with no way to filter further.)
  4. And where should I store these models locally? Should I keep them in the folder $riva_model_loc/rmir?

config.sh:

# Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

# Enable or Disable Riva Services
service_enabled_asr=false
service_enabled_nlp=false
service_enabled_tts=false

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified. 
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
# 
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="riva-model-repo"

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=true

# Ports to expose for Riva services
riva_speech_api_port="50051"
riva_vision_api_port="60051"

# NGC orgs
riva_ngc_org="nvidia"
riva_ngc_team="riva"
riva_ngc_image_version="1.6.0-beta"
riva_ngc_model_version="1.6.0-beta"

# Pre-built models listed below will be downloaded from NGC. If models already exist in $riva-rmir
# then models can be commented out to skip download from NGC

########## ASR MODELS ##########

models_asr=(
### Punctuation model
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}"

### Citrinet-1024 Streaming w/ CPU decoder, best latency configuration
    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset3p0_streaming:${riva_ngc_model_version}"

### Citrinet-1024 Streaming w/ CPU decoder, best throughput configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset3p0_streaming_throughput:${riva_ngc_model_version}"

### Citrinet-1024 Offline w/ CPU decoder, 
    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_asrset3p0_offline:${riva_ngc_model_version}"

### Jasper Streaming w/ CPU decoder, best latency configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_streaming:${riva_ngc_model_version}"

### Jasper Streaming w/ CPU decoder, best throughput configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_streaming_throughput:${riva_ngc_model_version}"

###  Jasper Offline w/ CPU decoder
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_offline:${riva_ngc_model_version}"
 
### Quarztnet Streaming w/ CPU decoder, best latency configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_quartznet_english_streaming:${riva_ngc_model_version}"

### Quarztnet Streaming w/ CPU decoder, best throughput configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_quartznet_english_streaming_throughput:${riva_ngc_model_version}"

### Quarztnet Offline w/ CPU decoder
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_quartznet_english_offline:${riva_ngc_model_version}"

### Jasper Streaming w/ GPU decoder, best latency configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_streaming_gpu_decoder:${riva_ngc_model_version}"

### Jasper Streaming w/ GPU decoder, best throughput configuration
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_streaming_throughput_gpu_decoder:${riva_ngc_model_version}"

### Jasper Offline w/ GPU decoder
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_english_offline_gpu_decoder:${riva_ngc_model_version}"
)

########## NLP MODELS ##########

models_nlp=(
### Bert base Punctuation model
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base:${riva_ngc_model_version}"

### BERT base Named Entity Recognition model fine-tuned on GMB dataset with class labels LOC, PER, ORG etc.
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_named_entity_recognition_bert_base:${riva_ngc_model_version}"

### BERT Base Intent Slot model fine-tuned on weather dataset.
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_intent_slot_bert_base:${riva_ngc_model_version}"

### BERT Base Question Answering model fine-tuned on Squad v2.
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_question_answering_bert_base:${riva_ngc_model_version}"

### Megatron345M Question Answering model fine-tuned on Squad v2.
#    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_question_answering_megatron:${riva_ngc_model_version}"

### Bert base Text Classification model fine-tuned on 4class (weather, meteorology, personality, nomatch) domain model.
    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_text_classification_bert_base:${riva_ngc_model_version}"
)

########## TTS MODELS ##########

models_tts=(
   "${riva_ngc_org}/${riva_ngc_team}/rmir_tts_fastpitch_hifigan_ljspeech:${riva_ngc_model_version}"
#   "${riva_ngc_org}/${riva_ngc_team}/rmir_tts_tacotron_waveglow_ljspeech:${riva_ngc_model_version}"
)

NGC_TARGET=${riva_ngc_org}
if [[ ! -z ${riva_ngc_team} ]]; then
  NGC_TARGET="${NGC_TARGET}/${riva_ngc_team}"
else
  team="\"\""
fi

# define docker images required to run Riva
image_client="nvcr.io/${NGC_TARGET}/riva-speech-client:${riva_ngc_image_version}"
image_speech_api="nvcr.io/${NGC_TARGET}/riva-speech:${riva_ngc_image_version}-server"

# define docker images required to setup Riva
image_init_speech="nvcr.io/${NGC_TARGET}/riva-speech:${riva_ngc_image_version}-servicemaker"

# daemon names
riva_daemon_speech="riva-speech"
riva_daemon_client="riva-client"

riva_init.sh:

#!/bin/bash
# Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.


get_ngc_key_from_environment() {
    # first check the global NGC_API_KEY environment variable
    local ngc_key=$NGC_API_KEY
    # if env variable was not set, and a ~/.ngc/config exists
    # try to get it from there
    if [ -z "$ngc_key" ] && [[ -f "$HOME/.ngc/config" ]]
    then
        ngc_key=$(cat $HOME/.ngc/config | grep -m 1 -G "^\s*apikey\s*=.*" | sed 's/^\s*apikey\s*=\s*//g')
    fi
    echo $ngc_key
}

docker_pull() {
    image_exists=$(docker images --filter=reference=$1 -q | wc -l)
    if [[ $image_exists -eq 1 ]]; then
        echo "  > Image $1 exists. Skipping."
        return
    fi
    attempts=3
    echo "  > Pulling $1. This may take some time..."
    for ((i = 1 ; i <= $attempts ; i++)); do
        docker pull -q $1 &> /dev/null
        if [ $? -ne 0 ]; then
            echo "  > Attempt $i out of $attempts failed"
            if [ $i -eq $attempts ]; then
                echo "Error occurred pulling '$1'."
                docker pull $1
                echo "Exiting."
                exit 1
            else
                echo "  > Trying again..."
                continue
            fi
        else
            break
        fi
    done
}

check_docker_version() {
    version_string=$(docker version --format '{{.Server.Version}}')
    if [ $? -ne 0 ]; then
        echo "Unable to run Docker. Please check that Docker is installed and functioning."
        exit 1
    fi
    maj_ver=$(echo $version_string | awk -F. '{print $1}')
    min_ver=$(echo $version_string | awk -F. '{print $2}')
    if [ "$maj_ver" -lt "19" ] || ([ "$maj_ver" -eq "19" ] && [ "$min_ver" -lt "03" ]); then
        echo "Docker version insufficient. Please use Docker 19.03 or later"
        exit 1;
    fi
}

# BEGIN SCRIPT
#check_docker_version

# load config file
script_path="$( cd "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )"
if [ -z "$1" ]; then
    config_path="${script_path}/config.sh"
else
    config_path=$(readlink -f $1)
fi

if [[ ! -f $config_path ]]; then
    echo 'Unable to load configuration file. Override path to file with -c argument.'
    exit 1
fi
source $config_path || exit 1

# automatically get NGC_API_KEY or request from user if necessary
#NGC_API_KEY="$(get_ngc_key_from_environment)"
#if [ -z "$NGC_API_KEY" ]; then
#    read -sp 'Please enter API key for ngc.nvidia.com: ' NGC_API_KEY
#    echo
#fi

# use the API key to run docker login for the NGC registry
# exit early if the key is invalid, because we won't be able to do anything
#echo "Logging into NGC docker registry if necessary..."
#echo $NGC_API_KEY | docker login -u '$oauthtoken' --password-stdin nvcr.io &> /dev/null
#if [ $? -ne 0 ]; then
#    echo 'NGC API Key is invalid. Please check and try again.'
#    exit 1
#fi

# pull all the requisite images we're going to need
echo "Pulling required docker images if necessary..."
echo "Note: This may take some time, depending on the speed of your Internet connection."
# pull the speech server if any of asr/nlp/tts services are requested
if [ "$service_enabled_asr" = true ] || [ "$service_enabled_nlp" = true ] || [ "$service_enabled_tts" = true ]; then
    echo "> Pulling Riva Speech Server images."
    docker_pull $image_speech_api
    docker_pull $image_client
    docker_pull $image_init_speech
fi


if [ "$use_existing_rmirs" = false ]; then
    echo
    echo "Downloading models (RMIRs) from NGC..."
    echo "Note: this may take some time, depending on the speed of your Internet connection."
    echo "To skip this process and use existing RMIRs set the location and corresponding flag in config.sh."

    # build up commands to download from NGC
    if [ "$service_enabled_asr" = true ] || [ "$service_enabled_nlp" = true ] || [ "$service_enabled_tts" = true ]; then
        gmr_speech_models=""
        if [ "$service_enabled_asr" = true ]; then
            for model in ${models_asr[@]}; do
                gmr_speech_models+=" $model"
            done
        fi
        if [ "$service_enabled_nlp" = true ]; then
            for model in ${models_nlp[@]}; do
                gmr_speech_models+=" $model"
            done
        fi
        if [ "$service_enabled_tts" = true ]; then
            for model in ${models_tts[@]}; do
                gmr_speech_models+=" $model"
            done
        fi

        # download required images
        docker run --init -it --rm --gpus '"'$gpus_to_use'"'  \
          -v $riva_model_loc:/data \
          -e "NGC_CLI_API_KEY=$NGC_API_KEY" \
          -e "NGC_CLI_ORG=nvidia" \
          --name riva-service-maker \
          $image_init_speech download_ngc_models $gmr_speech_models

        if [ $? -ne 0 ]; then
            echo "Error in downloading models."
            exit 1
        fi
    fi
fi

# convert all rmirs
echo
echo "Converting RMIRs at $riva_model_loc/rmir to Riva Model repository."

set -x
docker run --init -it --rm --gpus '"'$gpus_to_use'"' \
  -v $riva_model_loc:/data \
  -e "MODEL_DEPLOY_KEY=${MODEL_DEPLOY_KEY}" \
          --name riva-service-maker \
  $image_init_speech deploy_all_models /data/rmir /data/models

echo
echo "Riva initialization complete. Run ./riva_start.sh to launch services."

Hi,

Please refer to the following doc for more details.
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/custom-model-deployment.html?highlight=model_loc#option-1-using-quick-start-scripts-to-deploy-your-models-recommended-path
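
In short, per that doc and the comments in config.sh: point riva_model_loc at a location that contains your RMIRs and redeploy. A sketch of the offline flow under those assumptions (the path below is an example):

# In config.sh: an absolute path makes Riva use a host directory instead of a docker volume
riva_model_loc="/opt/riva-model-repo"
use_existing_rmirs=true

# Stage the manually downloaded RMIRs where deploy_all_models expects them
mkdir -p /opt/riva-model-repo/rmir
cp *.rmir /opt/riva-model-repo/rmir/

# Re-run initialization to convert the RMIRs into the optimized model repository
bash riva_init.sh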

Thank you.

Thanks, @spolisetty
I'm planning to host Riva on another server with internet access to ease the setup process; however, its GPU spec is

NVIDIA Tesla T4 16GB x1

and I'm wondering whether this kind of GPU is powerful enough to host Riva.

According to the support matrix, only the following GPUs support Riva:

  • NVIDIA A100
  • NVIDIA Volta V100
  • NVIDIA Turing T4

Is the NVIDIA Turing T4 the same as the NVIDIA Tesla T4?
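
(For what it's worth, the Tesla T4 is the Turing-generation T4, so they are the same card; "Tesla" is the data-center product branding. A quick way to confirm which GPU a server actually has:)

# Print the GPU model name as reported by the driver
nvidia-smi --query-gpu=name --format=csv,noheader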

I am having the same issue. I run:

/riva_quickstart_v2.3.0# bash riva_start.sh

…then:

/riva_quickstart_v2.3.0# docker logs riva-speech

I0818 17:06:19.992784 105 model_repository_manager.cc:1132] successfully unloaded ‘citrinet-1024-en-US-asr-offline-ctc-decoder-cpu-offline’ version 1

Riva waiting for Triton server to load all models…retrying in 1 second

I0818 17:06:20.434286 105 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests

error: creating server: Internal - failed to load all models

W0818 17:06:20.453564 105 metrics.cc:401] Unable to get power limit for GPU 0. Status:Success, value:0.000000

W0818 17:06:20.453699 105 metrics.cc:419] Unable to get power usage for GPU 0. Status:Success, value:0.000000

W0818 17:06:20.453771 105 metrics.cc:443] Unable to get energy consumption for GPU 0. Status:Success, value:0

Riva waiting for Triton server to load all models…retrying in 1 second

W0818 17:06:21.453974 105 metrics.cc:401] Unable to get power limit for GPU 0. Status:Success, value:0.000000

W0818 17:06:21.454103 105 metrics.cc:419] Unable to get power usage for GPU 0. Status:Success, value:0.000000

W0818 17:06:21.454173 105 metrics.cc:443] Unable to get energy consumption for GPU 0. Status:Success, value:0

Riva waiting for Triton server to load all models…retrying in 1 second

Triton server died before reaching ready state. Terminating Riva startup.

Check Triton logs with: docker logs

/opt/riva/bin/start-riva: line 1: kill: (105) - No such process

My config.sh:

# Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

# Architecture of target platform. Supported architectures: amd64, arm64
riva_target_arch="amd64"

# Legacy arm64 platform to be enabled. Supported legacy platforms: xavier
riva_arm64_legacy_platform=""

# Enable or Disable Riva Services
service_enabled_asr=true
service_enabled_nlp=false
service_enabled_tts=false

# Enable Riva Enterprise
# If enrolled in Enterprise, enable Riva Enterprise by setting configuration
# here. You must explicitly acknowledge you have read and agree to the EULA.
# RIVA_API_KEY=<ngc api key>
# RIVA_API_NGC_ORG=<ngc organization>
# RIVA_EULA=accept

# Language code to fetch models of a specify language
# Currently only ASR supports languages other than English
# Supported language codes: en-US, de-DE, es-US, ru-RU, zh-CN, hi-IN
# for any language other than English, set service_enabled_nlp and service_enabled_tts to False
# for multiple languages enter space separated language codes.
language_code=("en-US")

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified.
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
#
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="riva-model-repo"

if [[ $riva_target_arch == "arm64" ]]; then
    riva_model_loc="`pwd`/model_repository"
fi

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false

# Ports to expose for Riva services
riva_speech_api_port="50051"

# NGC orgs
riva_ngc_org="nvidia"
riva_ngc_team="riva"
riva_ngc_image_version="2.3.0"
riva_ngc_model_version="2.3.0"

# Pre-built models listed below will be downloaded from NGC. If models already exist in $riva-rmir
# then models can be commented out to skip download from NGC

########## ASR MODELS ##########

models_asr=()

### Citrinet-1024 models
for lang_code in ${language_code[@]}; do
    modified_lang_code="${lang_code/-/_}"
    modified_lang_code=${modified_lang_code,,}
    if [[ $riva_target_arch == "arm64" ]]; then
      models_asr+=(
      ### Citrinet-1024 Streaming w/ CPU decoder, best latency configuration
          "${riva_ngc_org}/${riva_ngc_team}/models_asr_citrinet_1024_${modified_lang_code}_str:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"
      )
    else
      models_asr+=(
      ### Citrinet-1024 Streaming w/ CPU decoder, best latency configuration
          "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_${modified_lang_code}_str:${riva_ngc_model_version}"

      ### Citrinet-1024 Streaming w/ CPU decoder, best throughput configuration
      #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_${modified_lang_code}_str_thr:${riva_ngc_model_version}"

      ### Citrinet-1024 Offline w/ CPU decoder,
          "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_citrinet_1024_${modified_lang_code}_ofl:${riva_ngc_model_version}"
      )
    fi

    ### Punctuation model
    if [[ "${lang_code}"  == "en-US" || "${lang_code}" == "de-DE" || "${lang_code}" == "es-US" || "${lang_code}" == "zh-CN" ]]; then
      if [[ $riva_target_arch == "arm64" ]]; then
        models_asr+=(
            "${riva_ngc_org}/${riva_ngc_team}/models_nlp_punctuation_bert_base_${modified_lang_code}:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"
        )
      else
        models_asr+=(
            "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base_${modified_lang_code}:${riva_ngc_model_version}"
        )
      fi
    fi

done

#Other ASR models
if [[ $riva_target_arch == "arm64" ]]; then
  models_asr+=(
  ### Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/models_asr_conformer_en_us_str:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"

  ### German Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/models_asr_conformer_de_de_str:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"

  ### Spanish Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/models_asr_conformer_es_us_str:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"

  ### Hindi Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/models_asr_conformer_hi_in_str:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"

  ### Russian Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/models_asr_conformer_ru_ru_str:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"

  ### Citrinet-256 Streaming w/ CPU decoder, best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/models_asr_citrinet_256_en_us_streaming:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"
  )
else
  models_asr+=(
  ### Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_en_us_str:${riva_ngc_model_version}"

  ### Conformer acoustic model, CPU decoder, streaming best throughput configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_en_us_str_thr:${riva_ngc_model_version}"

  ### Conformer acoustic model, CPU decoder, offline configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_en_us_ofl:${riva_ngc_model_version}"

  ### German Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_de_de_str:${riva_ngc_model_version}"

  ### German Conformer acoustic model, CPU decoder, streaming best throughput configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_de_de_str_thr:${riva_ngc_model_version}"

  ### German Conformer acoustic model, CPU decoder, offline configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_de_de_ofl:${riva_ngc_model_version}"

  ### Spanish Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_es_us_str:${riva_ngc_model_version}"

  ### Spanish Conformer acoustic model, CPU decoder, streaming best throughput configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_es_us_str_thr:${riva_ngc_model_version}"

  ### Spanish Conformer acoustic model, CPU decoder, offline configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_es_us_ofl:${riva_ngc_model_version}"

  ### Hindi Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_hi_in_str:${riva_ngc_model_version}"

  ### Hindi Conformer acoustic model, CPU decoder, streaming best throughput configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_hi_in_str_thr:${riva_ngc_model_version}"

  ### Hindi Conformer acoustic model, CPU decoder, offline configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_hi_in_ofl:${riva_ngc_model_version}"
  
  ### Russian Conformer acoustic model, CPU decoder, streaming best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_ru_ru_str:${riva_ngc_model_version}"

  ### Russian Conformer acoustic model, CPU decoder, streaming best throughput configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_ru_ru_str_thr:${riva_ngc_model_version}"

  ### Russian Conformer acoustic model, CPU decoder, offline configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_conformer_ru_ru_ofl:${riva_ngc_model_version}"

  ### Jasper Streaming w/ CPU decoder, best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_en_us_str:${riva_ngc_model_version}"

  ### Jasper Streaming w/ CPU decoder, best throughput configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_en_us_str_thr:${riva_ngc_model_version}"

  ###  Jasper Offline w/ CPU decoder
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_en_us_ofl:${riva_ngc_model_version}"

  ### Quarztnet Streaming w/ CPU decoder, best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_quartznet_en_us_str:${riva_ngc_model_version}"

  ### Quarztnet Streaming w/ CPU decoder, best throughput configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_quartznet_en_us_str_thr:${riva_ngc_model_version}"

  ### Quarztnet Offline w/ CPU decoder
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_quartznet_en_us_ofl:${riva_ngc_model_version}"

  ### Jasper Streaming w/ GPU decoder, best latency configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_en_us_str_gpu_decoder:${riva_ngc_model_version}"

  ### Jasper Streaming w/ GPU decoder, best throughput configuration
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_en_us_str_thr_gpu_decoder:${riva_ngc_model_version}"

  ### Jasper Offline w/ GPU decoder
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_asr_jasper_en_us_ofl_gpu_decoder:${riva_ngc_model_version}"
  )
fi

########## NLP MODELS ##########

if [[ $riva_target_arch == "arm64" ]]; then
  models_nlp=(
  ### BERT Base Intent Slot model for misty domain fine-tuned on weather, smalltalk/personality, poi/map datasets.
      "${riva_ngc_org}/${riva_ngc_team}/models_nlp_intent_slot_misty_bert_base:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"

  ### DistilBERT Intent Slot model for misty domain fine-tuned on weather, smalltalk/personality, poi/map datasets.
  #    "${riva_ngc_org}/${riva_ngc_team}/models_nlp_intent_slot_misty_distilbert:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"
  )
else
  models_nlp=(
  ### Bert base Punctuation model
      "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_punctuation_bert_base_en_us:${riva_ngc_model_version}"

  ### BERT base Named Entity Recognition model fine-tuned on GMB dataset with class labels LOC, PER, ORG etc.
      "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_named_entity_recognition_bert_base:${riva_ngc_model_version}"

  ### BERT Base Intent Slot model fine-tuned on weather dataset.
      "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_intent_slot_bert_base:${riva_ngc_model_version}"

  ### BERT Base Question Answering model fine-tuned on Squad v2.
      "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_question_answering_bert_base:${riva_ngc_model_version}"

  ### Megatron345M Question Answering model fine-tuned on Squad v2.
  #    "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_question_answering_megatron:${riva_ngc_model_version}"

  ### Bert base Text Classification model fine-tuned on 4class (weather, meteorology, personality, nomatch) domain model.
      "${riva_ngc_org}/${riva_ngc_team}/rmir_nlp_text_classification_bert_base:${riva_ngc_model_version}"
  )
fi

########## TTS MODELS ##########

if [[ $riva_target_arch == "arm64" ]]; then
  models_tts=(
     "${riva_ngc_org}/${riva_ngc_team}/models_tts_fastpitch_hifigan_en_us_female_1:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"
  #   "${riva_ngc_org}/${riva_ngc_team}/models_tts_fastpitch_hifigan_en_us_male_1:${riva_ngc_model_version}-${riva_target_arch}${riva_arm64_legacy_platform}"
  )
else
  models_tts=(
     "${riva_ngc_org}/${riva_ngc_team}/rmir_tts_fastpitch_hifigan_en_us_female_1:${riva_ngc_model_version}"
  #   "${riva_ngc_org}/${riva_ngc_team}/rmir_tts_fastpitch_hifigan_en_us_male_1:${riva_ngc_model_version}"
  )
fi

NGC_TARGET=${riva_ngc_org}
if [[ ! -z ${riva_ngc_team} ]]; then
  NGC_TARGET="${NGC_TARGET}/${riva_ngc_team}"
else
  team="\"\""
fi

# Specify paths to SSL Key and Certificate files to use TLS/SSL Credentials for a secured connection.
# If either are empty, an insecure connection will be used.
# Stored within container at /ssl/servert.crt and /ssl/server.key
# Optional, one can also specify a root certificate, stored within container at /ssl/root_server.crt
ssl_server_cert=""
ssl_server_key=""
ssl_root_cert=""

# define docker images required to run Riva
image_client="nvcr.io/${NGC_TARGET}/riva-speech-client:${riva_ngc_image_version}"
image_speech_api="nvcr.io/${NGC_TARGET}/riva-speech:${riva_ngc_image_version}-server"

# define docker images required to setup Riva
image_init_speech="nvcr.io/${NGC_TARGET}/riva-speech:${riva_ngc_image_version}-servicemaker"

# daemon names
riva_daemon_speech="riva-speech"
if [[ $riva_target_arch != "arm64" ]]; then
    riva_daemon_client="riva-client"
fi

As other posts suggested, I ran clean-up:

bash riva_clean.sh

The result, via docker logs riva-speech, is again:

error: creating server: Invalid argument - --model-repository must be specified
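
(For reference, riva_clean.sh can remove the deployed model repository, which would leave Triton with nothing to serve until riva_init.sh is re-run. A couple of hedged checks, assuming the default volume name riva-model-repo from config.sh:)

# Check whether the model repository volume still exists
docker volume inspect riva-model-repo

# List what is actually inside it (models/ should contain the deployed models)
docker run --rm -v riva-model-repo:/data alpine ls /data /data/models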

Environment details:
Ubuntu 22.04.1 LTS
NVIDIA Driver Version: 510.73.08 (GPU A100)
CUDA Version: 11.6
Docker: 20.10.17

Any thoughts?

Thank you.