Hardware - GPU: A5000
Hardware - CPU: AMD
Operating System: Debian (riva-api docker image)
Riva Version: 2.7.0
I have many model configurations that I want to test and load/unload as needed, and they won't all fit on a single GPU.
If my Triton server has multiple models available on disk but not all of them loaded,
riva_server fails with the following error:
E1222 18:45:51.512120 182 model_registry.cc:102] error: cannot get model config Request for unknown model: 'mymodelname-streaming' is not found
(mymodelname is a placeholder for the actual model name, of course.)
If I instead try to load and then unload all models, so that their
state is at least set properly, I get:
error: cannot get model config Request for unknown model: 'mymodelname-streaming' version 1 is not at ready state
Is it not possible to use Riva without loading all the models?
Apologies for the delay.
Can you please share the complete output of
docker logs riva-speech
Will check further with the team on this request.
I am not using the Riva quickstart setup (i.e.
config.sh). I have my own setup that launches Riva and Triton from the docker container, since we build our models ourselves.
There is nothing special in the logs: no warnings or errors, just the typical Triton startup logs followed by Riva failing with the above error.
Example startup logs:
I0110 12:51:30.278211 111 logging.cc:49] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +327, now: CPU 2, GPU 2212 (MiB)
W0110 12:51:30.278233 111 logging.cc:46] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
I0110 12:51:30.722238 111 tensorrt.cc:1547] Created instance mymodelname-streaming-am-streaming_0 on GPU 0 with stream priority 0 and optimization profile default;
I0110 12:51:30.722291 111 endpointing_library.cc:22] TRITONBACKEND_ModelInstanceInitialize: mymodelname-streaming-endpointing-streaming_0 (device 0)
I0110 12:51:30.759757 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming-am-streaming' version 1
I0110 12:51:30.777475 111 ctc-decoder-library.cc:23] TRITONBACKEND_ModelInstanceInitialize: mymodelname-streaming-ctc-decoder-cpu-streaming_0 (device 0)
I0110 12:51:30.777575 129 ctc-decoder.cc:174] Beam Decoder initialized successfully!
I0110 12:51:30.777602 111 feature-extractor.cc:402] TRITONBACKEND_ModelInstanceInitialize: mymodelname-streaming-feature-extractor-streaming_0 (device 0)
I0110 12:51:30.777780 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming-endpointing-streaming' version 1
I0110 12:51:30.778117 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming-ctc-decoder-cpu-streaming' version 1
I0110 12:51:30.782237 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming-feature-extractor-streaming' version 1
I0110 12:51:30.782509 111 model_lifecycle.cc:459] loading: mymodelname-streaming:1
I0110 12:51:30.782726 111 model_lifecycle.cc:693] successfully loaded 'mymodelname-streaming' version 1
Can you please share the detailed steps on how you launch Riva and Triton from the docker container.
Please also share the complete output of
docker logs riva-speech
First I start Triton:
tritonserver --disable-auto-complete-config \
    --model-repository /data/models \
    --model-control-mode=explicit
I then run a script (prepare_triton.py, shared on Pastebin) to load the models I want into Triton:
python3 prepare_triton.py --model_names "$MODEL_NAMES"
MODEL_NAMES is an environment variable containing a comma-delimited string of model names to load.
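For readers without access to the Pastebin link: a minimal sketch of what a loader like prepare_triton.py might do, assuming Triton was started with --model-control-mode=explicit and its HTTP endpoint is on the default port 8000 (the endpoint path is Triton's standard model-repository API; everything else here is illustrative, not the poster's actual script):

```python
# Hypothetical sketch of a prepare_triton.py-style loader. Assumes
# Triton's HTTP endpoint on localhost:8000 and explicit model control.
import os
import urllib.request


TRITON_HTTP = "http://localhost:8000"  # assumed Triton HTTP endpoint


def parse_model_names(raw):
    """Split the comma-delimited MODEL_NAMES string into clean names."""
    return [n.strip() for n in raw.split(",") if n.strip()]


def load_model(name, base_url=TRITON_HTTP):
    """Ask Triton to load one model via POST /v2/repository/models/<name>/load."""
    req = urllib.request.Request(
        f"{base_url}/v2/repository/models/{name}/load",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200


if __name__ == "__main__":
    # Load each model named in MODEL_NAMES; does nothing if it is unset.
    for model in parse_model_names(os.environ.get("MODEL_NAMES", "")):
        load_model(model)
```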
Then I run Riva
riva_server --triton_uri=localhost:8001 "$@" &
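The three steps above could be tied together in one launcher. A rough sketch: the tritonserver and riva_server flags are the ones quoted in this thread, while the ports, timeout, and readiness URL are assumptions:

```python
# Rough sketch of a launcher for the sequence described above:
# start Triton, wait for readiness, load models, then start Riva.
import os
import subprocess
import sys
import time
import urllib.request


def triton_cmd(model_repo="/data/models"):
    return [
        "tritonserver",
        "--disable-auto-complete-config",
        "--model-repository", model_repo,
        "--model-control-mode=explicit",  # models are loaded by prepare_triton.py
    ]


def riva_cmd(triton_uri="localhost:8001", extra=()):
    return ["riva_server", f"--triton_uri={triton_uri}", *extra]


def wait_until_ready(url="http://localhost:8000/v2/health/ready", timeout=120):
    """Poll Triton's readiness endpoint before loading any models."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if urllib.request.urlopen(url).status == 200:
                return True
        except OSError:
            pass
        time.sleep(1)
    return False


def main():
    triton = subprocess.Popen(triton_cmd())
    if not wait_until_ready():
        sys.exit("Triton did not become ready in time")
    subprocess.run(
        [sys.executable, "prepare_triton.py",
         "--model_names", os.environ["MODEL_NAMES"]],
        check=True,
    )
    subprocess.Popen(riva_cmd(extra=sys.argv[1:]))
    triton.wait()


# Call main() from your container entrypoint; it is not invoked here so the
# sketch can be imported without starting any servers.
```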
This all shouldn't matter, though. I just need a yes/no answer to my original question: is it possible to use Riva without loading all the models?
Apologies for the delay; we will check with the team on this and get back to you.
I have inputs from the internal team:
- Running Triton directly is not supported currently.
- The Riva server doesn't support --model-control-mode=explicit, which you are using. In Riva, --model-control-mode is NONE.
- riva_server will load everything in data/models.
- To load only particular models, set up a different Docker volume for each configuration.
During the riva_init process, the RMIR files in $riva_model_loc/rmir
are inspected and optimized for deployment. The optimized versions are
stored in $riva_model_loc/models, and the Riva server exclusively uses these optimized models.
riva_model_loc controls the name of the volume; if it begins with a /, it is used as a host path instead.
Docker volumes will also help in tackling the memory-related challenge:
with volumes, you can divide your models however you prefer.
- The best advice is to follow the guidelines in the quickstart guide and use the Riva scripts provided.
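The volume suggestion above could look like this in the quickstart's config.sh; the volume names and host path below are made up for illustration:

```shell
# Illustrative config.sh fragments, one per deployment, assuming the
# quickstart scripts. Only riva_model_loc changes; names are made up.

# Deployment A: named Docker volume holding one subset of models
riva_model_loc="riva-models-streaming"

# Deployment B: host path (leading "/") holding a different subset
riva_model_loc="/srv/riva/models-offline"
```

Running riva_init.sh against each config deploys that subset into its own volume, and riva_start.sh then serves whichever volume the active config points at.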