Riva_server fails if not all Triton models are loaded

Hardware - GPU: A5000
Hardware - CPU: AMD
Operating System: Debian (riva-api docker image)
Riva Version: 2.7.0

I have many model configurations that I want to test and load/unload as needed and which won’t fit on a single GPU.
If my Triton server has multiple models available on disk but not all of them are loaded, riva_server fails with the following error:

E1222 18:45:51.512120   182 model_registry.cc:102] error: cannot get model config Request for unknown model: 'mymodelname-streaming' is not found

Here mymodelname is a placeholder for the actual model name.

If I instead load and then unload all the models, so that their state is at least set properly, I get:

error: cannot get model config Request for unknown model: 'mymodelname-streaming' version 1 is not at ready state

Is it not possible to use riva without loading all the models?

Hi @pineapple9011

Apologies for the delay.

Can you please share:

  1. the config.sh you used
  2. the complete docker logs riva-speech output

I will check further with the team on this request.

Thanks

I am not using the Riva quickstart setup (i.e. config.sh). I have my own setup that launches Riva and Triton inside the Docker container, since we build our models ourselves.
There is nothing special in the logs, no warnings or errors: just the typical Triton startup logs followed by Riva failing with the above error.

Example startup logs:

...
I0110 12:51:30.278211 111 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +327, now: CPU 2, GPU 2212 (MiB)
W0110 12:51:30.278233 111 logging.cc:46] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
I0110 12:51:30.722238 111 tensorrt.cc:1547] Created instance mymodelname-streaming-am-streaming_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0110 12:51:30.722291 111 endpointing_library.cc:22] TRITONBACKEND_ModelInstanceInitialize: mymodelname-streaming-endpointing-streaming_0 (device 0)
I0110 12:51:30.759757 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming-am-streaming' version 1
I0110 12:51:30.777475 111 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: mymodelname-streaming-ctc-decoder-cpu-streaming_0 (device 0)
I0110 12:51:30.777575   129 ctc-decoder.cc:174] Beam Decoder initialized successfully!
I0110 12:51:30.777602 111 feature-extractor.cc:402] TRITONBACKEND_ModelInstanceInitialize: mymodelname-streaming-feature-extractor-streaming_0 (device 0)
I0110 12:51:30.777780 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming-endpointing-streaming' version 1
I0110 12:51:30.778117 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming-ctc-decoder-cpu-streaming' version 1
I0110 12:51:30.782237 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming-feature-extractor-streaming' version 1
I0110 12:51:30.782509 111 model_lifecycle.cc:459] loading: mymodelname-streaming:1
I0110 12:51:30.782726 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming' version 1
...

Hi @pineapple9011

Can you please share the detailed steps for how you launch Riva and Triton from the Docker container.

Please also share the complete docker logs riva-speech output.

First I start Triton:

tritonserver --disable-auto-complete-config \
  --strict-model-config=true \
  --model-repository /data/models \
  --model-control-mode=explicit \
  --cuda-memory-pool-byte-size=0:1000000000 &
triton_pid=$!
block_until_server_alive $triton_pid
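
block_until_server_alive is a small helper in my scripts; a minimal sketch of what it does, assuming Triton's default HTTP port 8000 and its standard /v2/health/ready readiness endpoint (the real helper may differ slightly):

block_until_server_alive() {
  local pid=$1
  # Poll Triton's readiness endpoint until it returns success.
  until curl -sf localhost:8000/v2/health/ready > /dev/null; do
    # Bail out if the server process has already exited.
    kill -0 "$pid" 2>/dev/null || { echo "tritonserver exited" >&2; return 1; }
    sleep 1
  done
}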

I then run a script (prepare_triton.py, shared on Pastebin) to load the models I want into Triton.

python3 prepare_triton.py --model_names "$MODEL_NAMES"

Here MODEL_NAMES is an environment variable containing a comma-delimited string of model names to load.
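
In essence the script issues an explicit load request per model name. A minimal shell equivalent, using Triton's model repository API over HTTP on the default port 8000 (the actual prepare_triton.py may differ):

IFS=',' read -ra names <<< "$MODEL_NAMES"
for name in "${names[@]}"; do
  # Explicit loads are only honored because tritonserver runs with
  # --model-control-mode=explicit.
  curl -sf -X POST "localhost:8000/v2/repository/models/${name}/load" \
    || { echo "failed to load ${name}" >&2; exit 1; }
done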

Then I run Riva:

riva_server --triton_uri=localhost:8001 "$@" &

This all shouldn’t matter though. I just need a yes/no answer to my original question: Is it possible to use Riva without loading all the models?

Thanks @pineapple9011

Apologies for the delay; I will check with the team on this and get back to you.

Thanks

Hi @pineapple9011

I have the following inputs from the internal team:

  1. Running Triton directly is not currently supported.
  2. The Riva server does not support --model-control-mode=explicit, which you are using. In Riva, --model-control-mode is NONE.
  3. riva_server will load everything in /data/models.
  4. To load only particular models, set up different Docker volumes for your configs.
    In config.sh:

    Models ($riva_model_loc/models)
    During the riva_init process, the RMIR files in $riva_model_loc/rmir are inspected and optimized for deployment. The optimized versions are stored in $riva_model_loc/models. The Riva server exclusively uses these optimized versions.

    riva_model_loc="riva-model-repo"

    riva_model_loc controls the name of the volume; if the value starts with a /, it is used as a host path.

Docker volumes will also help in tackling the memory-related challenge: with Docker volumes, you can divide your models however you prefer (see the sketch after the list below).

  5. The best advice is to follow the guidelines in the quickstart guide and use the Riva scripts provided.
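
For illustration only, a sketch of point 4 with placeholder names: keep each model set in its own volume (or host path) and point config.sh at the one you want before running the quickstart scripts.

# In one copy of config.sh: deploy only the models staged in this volume.
riva_model_loc="riva-model-repo-set-a"

# In another copy of config.sh: a value starting with / is used as a host path.
riva_model_loc="/data/riva/models-set-b"

Then run bash riva_init.sh and bash riva_start.sh with whichever config.sh is in place.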